rollout: config-driven viz, wristframe revert, HPT annotation support #482
rl2aloha wants to merge 5 commits into
Read transform_list mode from .hydra/config.yaml instead of hardcoding cartesian_wristframe_ypr. The rollout now matches whatever frame the model was trained on (e.g. mode: cartesian for QWEN-HPT, mode: cartesian_wristframe_ypr for pi05_eva_aria_obj_gen_lang_wristframe).

Add config-driven prediction visualization. _build_viz_func_from_config reads evaluator.viz_func from .hydra/config.yaml and uses hydra.utils.instantiate to bind image_key / action_key / mode. A runtime 'v' toggle in the intervention menu enables/disables saving viz_<step>.png to debug/ on every inference. The live camera image is resized to 640x480 inside _save_viz so the projection matches ARIA_INTRINSICS (the Aria camera publishes at 960x720 per configs.yaml, but the intrinsics in INTRINSICS['base'] are calibrated for 640x480). The image is converted to BGR before cv2.imwrite to avoid the inverted-color save bug.

Add wrist-frame revert. _build_revert_transform_from_config reads evaluator.transform_lists.<embodiment> from config and instantiates the revert transform that converts wrist-frame predictions back to cam frame using the current obs ee_pose as the reference. rollout_step applies this to preds and the GT batch BEFORE both the viz call and the cam_frame_to_base_frame post-processing. Wrist-frame models now execute correctly on the robot AND project to the right pixels. For cam-frame models the revert is None and the step is skipped, so QWEN-HPT and old PI cartesian checkpoints continue to work unchanged.

Add HPT (QWEN) annotation support. _apply_annotation_to_algo now sets annotation_sampling_mode='first' in addition to PI's sampling_mode='first', so --annotation-path works for both algos without subclass-specific code paths.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
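The ordering described above (optional revert first, then viz, then base-frame post-processing) can be sketched as follows. This is a minimal, position-only illustration and not the PR's code: make_wrist_to_cam_revert, revert_if_needed and rollout_step_sketch are hypothetical names, and it assumes the current ee pose is available in the cam frame as a rotation matrix R and translation t, so that p_cam = R @ p_wrist + t.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]
Revert = Callable[[Sequence[Vec3]], List[Vec3]]


def make_wrist_to_cam_revert(ee_rotation: Mat3, ee_translation: Vec3) -> Revert:
    """Hypothetical revert builder: maps wrist-frame points into the cam frame.

    Assumes the ee pose is expressed in the cam frame as (R, t), so that
    p_cam = R @ p_wrist + t. Orientation handling is omitted in this sketch.
    """
    def revert(points: Sequence[Vec3]) -> List[Vec3]:
        out = []
        for p in points:
            out.append(tuple(
                sum(ee_rotation[i][j] * p[j] for j in range(3)) + ee_translation[i]
                for i in range(3)
            ))
        return out
    return revert


def revert_if_needed(points: Sequence[Vec3], revert: Optional[Revert]) -> List[Vec3]:
    """Cam-frame models pass revert=None, which leaves the points untouched."""
    if revert is None:
        return list(points)
    return revert(points)


def rollout_step_sketch(preds, gt, revert, viz_func, to_base_frame):
    """Apply the revert to preds and GT before both viz and post-processing."""
    preds_cam = revert_if_needed(preds, revert)
    gt_cam = revert_if_needed(gt, revert)
    viz_func(preds_cam, gt_cam)       # projection uses cam-frame points
    return to_base_frame(preds_cam)   # execution also starts from cam frame
```

Because both consumers receive the same cam-frame points, a wrist-frame model and a cam-frame model share one code path after the revert.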
Add 'm' intervention command (independent of 'v') that saves the prediction viz at the model's input resolution (224x224 with padding mirroring resize_with_pad_torch), written to debug/viz_model_<step>.png. Intrinsics are scaled-and-padded to match (base intrinsics calibrated for 640x480 are first scaled to the live camera's resolution, then by the resize ratio, with padding added to cx/cy) so projections still land on the correct pixels. Pred is hidden (pred_alpha=0.0) so the saved image shows what the model actually sees, with only the green GT cluster overlaid.

Fix the 'annotations' KeyError that crashed both viz modes when no --annotation-path was supplied. The viz_func partial bakes in annotation_key from the config, so viz_gt_preds does an unconditional batch[annotation_key] lookup. _build_viz_func_from_config now captures that key, and both _save_viz / _save_viz_model_res inject a default (self.annotation or empty string) into the batch before calling viz_func.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
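The scale-then-pad adjustment of the intrinsics can be worked through as below. This is a sketch under stated assumptions, not the PR's implementation: scale_and_pad_intrinsics is a hypothetical helper, the resize is assumed to preserve aspect ratio, and the padding is assumed to be split symmetrically with the smaller half on the left/top.

```python
from typing import Tuple


def scale_and_pad_intrinsics(
    fx: float, fy: float, cx: float, cy: float,
    calib_size: Tuple[int, int] = (640, 480),   # resolution the intrinsics were calibrated at
    live_size: Tuple[int, int] = (960, 720),    # resolution the camera publishes at
    target_size: Tuple[int, int] = (224, 224),  # model input resolution
) -> Tuple[float, float, float, float]:
    """Return (fx, fy, cx, cy) valid for the resized-and-padded image."""
    calib_w, calib_h = calib_size
    live_w, live_h = live_size
    target_w, target_h = target_size

    # Step 1: calibration resolution -> live camera resolution.
    sx, sy = live_w / calib_w, live_h / calib_h
    fx, cx = fx * sx, cx * sx
    fy, cy = fy * sy, cy * sy

    # Step 2: aspect-preserving resize so the image fits inside the target.
    if live_w * target_h >= live_h * target_w:   # width is the limiting side
        resized_w = target_w
        resized_h = live_h * target_w // live_w
        scale = target_w / live_w
    else:                                        # height is the limiting side
        resized_h = target_h
        resized_w = live_w * target_h // live_h
        scale = target_h / live_h
    fx, fy, cx, cy = fx * scale, fy * scale, cx * scale, cy * scale

    # Step 3: padding shifts the principal point, focal lengths are unchanged.
    pad_left = (target_w - resized_w) // 2
    pad_top = (target_h - resized_h) // 2
    return fx, fy, cx + pad_left, cy + pad_top
```

With the sizes named in the description, a 960x720 frame is resized to 224x168 and padded by 28 rows at the top, so a principal point at the center of the 640x480 calibration image ends up at the center of the 224x224 model input.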
Cherry-picked file-scoped from commit 2c53ef8 ("6D pi data transform"):
- egomimic/utils/pose_utils.py: adds _ypr_to_rot6d / _rot6d_to_ypr helpers
- egomimic/rldb/zarr/action_chunk_transforms.py: adds CartesianYPRToRot6D and CartesianRot6DToYPR transform classes (xyz+ypr <-> xyz+rot6d per arm, 14<->20 or 12<->18 dims; numpy/tensor passthrough)
- egomimic/rldb/embodiment/eva.py: adds 'cartesian_6d' and 'cartesian_wristframe_6d' modes to Eva.get_transform_list, plus the two cam-frame revert builders the evaluator config references (_build_eva_cartesian_revert_6d_transform_list, _build_eva_cartesian_revert_6d_wristframe_transform_list).

Unblocks rollout of pi05_eva_aria_wristframe_6d checkpoints, which the rollout branch was crashing on with 'NoneType is not iterable' (unknown mode falling through Eva.get_transform_list) and 'Error locating target ... _build_eva_cartesian_revert_6d_wristframe_transform_list' (missing revert builder).

Skipped the training-side changes in 2c53ef8 (pi.py, hydra configs, action_utils, human.py, test files); they are not needed for rollout and would conflict with rollout-branch edits to pi.py.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
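For readers unfamiliar with the 6D rotation representation, the 14<->20 conversion can be illustrated roughly as below. This is not the code from pose_utils.py: the function names are hypothetical, the ZYX (yaw, pitch, roll) convention, the use of the first two rotation-matrix columns as the 6D vector, and the per-arm layout [xyz, ypr, gripper] are all assumptions made for the sketch.

```python
import math
from typing import List, Sequence


def ypr_to_rot6d(yaw: float, pitch: float, roll: float) -> List[float]:
    """First two columns of R = Rz(yaw) @ Ry(pitch) @ Rx(roll) (assumed convention)."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    col0 = [cy * cp, sy * cp, -sp]
    col1 = [cy * sp * sr - sy * cr, sy * sp * sr + cy * cr, cp * sr]
    return col0 + col1


def rot6d_to_ypr(r6: Sequence[float]) -> List[float]:
    """Invert ypr_to_rot6d; Gram-Schmidt makes it tolerant of non-orthonormal input."""
    a, b = list(r6[:3]), list(r6[3:6])
    na = math.sqrt(sum(v * v for v in a))
    x = [v / na for v in a]
    d = sum(xi * bi for xi, bi in zip(x, b))
    y = [bi - d * xi for xi, bi in zip(x, b)]
    ny = math.sqrt(sum(v * v for v in y))
    y = [v / ny for v in y]
    z2 = x[0] * y[1] - x[1] * y[0]               # R[2][2], from the cross product x × y
    yaw = math.atan2(x[1], x[0])
    pitch = math.asin(max(-1.0, min(1.0, -x[2])))
    roll = math.atan2(y[2], z2)
    return [yaw, pitch, roll]


def arm_ypr_to_rot6d(arm: Sequence[float]) -> List[float]:
    """7-dim [xyz, ypr, gripper] -> 10-dim [xyz, rot6d, gripper] (assumed layout)."""
    if len(arm) != 7:
        raise ValueError(f"expected 7-dim arm action, got {len(arm)}")
    return list(arm[:3]) + ypr_to_rot6d(arm[3], arm[4], arm[5]) + [arm[6]]


def bimanual_ypr_to_rot6d(action: Sequence[float]) -> List[float]:
    """14-dim bimanual action -> 20-dim, converting each arm independently."""
    if len(action) != 14:
        raise ValueError(f"expected 14-dim action, got {len(action)}")
    return arm_ypr_to_rot6d(action[:7]) + arm_ypr_to_rot6d(action[7:])
```

The round trip is exact away from the pitch = ±90° singularity, which is one reason a continuous 6D target is often preferred over Euler angles for regression.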
…on format

Followup to 491496f (which only brought the data-side eva.py / transforms / pose_utils). The 6D checkpoint outputs 20-dim actions (xyz + rot6d + gripper for each of the two arms) but RobotBimanualCartesianEuler.to32 still expected 14-dim, causing:

ValueError: RobotBimanual: expected 14-dim, got 20

File-checkout from 2c53ef8 brings in:
- The full raw-rotation refactor that landed on aidan/pi-6d before 2c53ef8 (1ddf896, 53e69ea, 59729c4): PI05_CARTESIAN_ACTION_ENCODING_{RAW_ROT_6D, LEGACY, NORM_ROT_6D} constants, to32_raw_rotation/from32_raw_rotation, and raw/normalized rotation pipeline in PI.forward_eval / PI._unnormalize_action.
- The new to32_norm_6d / from32_norm_6d converter methods on RobotBimanualCartesianEuler and HumanBimanualCartesianEuler.
- The NORM_ROT_6D dispatch branches in PI.forward_eval (extract 6D, then unnormalize) and PI's action-encoding setup.

Safe checkout: HEAD's pi.py and action_utils.py were byte-identical to the merge-base with 2c53ef8, so file-checkout doesn't overwrite any rollout-branch edits to these files. Purely EgoVerse-side; no openpi changes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
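The failure mode above, a converter that validates a single hard-coded width, can be illustrated with an encoding-aware check. Everything here is hypothetical: the string values of the encoding constants, the EXPECTED_DIMS table, and the assumption that "to32" means zero-padding the action to a 32-dim vector are illustrative guesses, not the behavior of RobotBimanualCartesianEuler.

```python
from typing import Dict, List, Sequence

# Hypothetical encoding names; the real constants live in the PR's pi.py.
ENCODING_LEGACY = "legacy"            # xyz + ypr + gripper per arm -> 14 dims
ENCODING_NORM_ROT_6D = "norm_rot_6d"  # xyz + rot6d + gripper per arm -> 20 dims

EXPECTED_DIMS: Dict[str, int] = {
    ENCODING_LEGACY: 14,
    ENCODING_NORM_ROT_6D: 20,
}
PADDED_DIM = 32  # assumed width of the padded action vector


def to32(action: Sequence[float], encoding: str) -> List[float]:
    """Validate the width for this encoding, then zero-pad to PADDED_DIM."""
    expected = EXPECTED_DIMS[encoding]
    if len(action) != expected:
        raise ValueError(f"expected {expected}-dim, got {len(action)}")
    return list(action) + [0.0] * (PADDED_DIM - expected)


def from32(padded: Sequence[float], encoding: str) -> List[float]:
    """Drop the padding again, keeping only the dims this encoding uses."""
    if len(padded) != PADDED_DIM:
        raise ValueError(f"expected {PADDED_DIM}-dim, got {len(padded)}")
    return list(padded[:EXPECTED_DIMS[encoding]])
```

Keying the expected width on the encoding means a 20-dim 6D action passes validation under the 6D encoding while still being rejected under the legacy one.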
Got rolled back by an earlier gt sync. The 20GB pi05 checkpoint OOM-killed the rollout between the patch-save and the main ModelWrapper.load_from_checkpoint: both held the full checkpoint simultaneously (~40GB peak on a host with less memory). Two mitigations:
- If <ckpt>.patched already exists on disk, return its path immediately and skip both torch.load and torch.save. The first run still pays the double-load cost; every subsequent run is single-load.
- del + gc.collect() between _patch_checkpoint_paths and the main load, so the copy loaded for patching is freed before the main load runs, even on first launch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
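The two mitigations can be sketched as below. To keep the example self-contained, pickle stands in for torch.load / torch.save, and patch_checkpoint_paths, load_model and patch_fn are hypothetical names rather than the PR's functions.

```python
import gc
import os
import pickle
from typing import Any, Callable


def patch_checkpoint_paths(ckpt_path: str, patch_fn: Callable[[Any], Any]) -> str:
    """Write a patched copy next to the checkpoint, reusing it if it already exists."""
    patched_path = ckpt_path + ".patched"
    if os.path.exists(patched_path):
        # Mitigation 1: skip both the load and the save on every later run.
        return patched_path

    with open(ckpt_path, "rb") as f:          # stand-in for torch.load
        state = pickle.load(f)
    state = patch_fn(state)
    with open(patched_path, "wb") as f:       # stand-in for torch.save
        pickle.dump(state, f)

    # Mitigation 2: drop the patching copy before the caller loads again.
    del state
    gc.collect()
    return patched_path


def load_model(ckpt_path: str, patch_fn: Callable[[Any], Any]) -> Any:
    """Patch (or reuse the patched file), then do the single main load."""
    patched_path = patch_checkpoint_paths(ckpt_path, patch_fn)
    gc.collect()  # make sure nothing from patching is still held
    with open(patched_path, "rb") as f:
        return pickle.load(f)
```

One trade-off of the early return is that a stale .patched file is never refreshed, so it has to be deleted by hand if the original checkpoint or the patch logic changes.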
Claude Code Review
Review of PR #482
Summary
Adds config-driven rollout behavior (transform_list mode, viz_func, revert transform) plus three pi0.5 action encoding modes (legacy, raw-rot6d, normalized-rot6d) with supporting YPR↔6D converters. The scope is broader than the title suggests: there's a substantial new training-side action encoding pathway bundled in.
Key concerns
Suggestions
Verdict: Request Changes
Primarily because of the scope (please split) and the absence of tests for the new action encoding code, which touches training. The rollout-only pieces look solid and I'd approve them on their own.
Reviewed by Claude
