rollout: config-driven viz, wristframe revert, HPT annotation support by rl2aloha · Pull Request #482 · GaTech-RL2/EgoVerse

rl2aloha · 2026-06-02T02:10:34Z

Read transform_list mode from .hydra/config.yaml instead of hardcoding
cartesian_wristframe_ypr. The rollout now matches whatever frame the
model was trained on (e.g. mode: cartesian for QWEN-HPT,
mode: cartesian_wristframe_ypr for pi05_eva_aria_obj_gen_lang_wristframe).

Add config-driven prediction visualization. build_viz_func_from_config
reads evaluator.viz_func from .hydra/config.yaml and uses hydra.utils.instantiate
to bind image_key / action_key / mode. A runtime 'v' toggle in the
intervention menu enables/disables saving viz.png to debug/
on every inference. The live camera image is resized to 640x480
inside _save_viz so the projection matches ARIA_INTRINSICS (the
Aria camera publishes at 960x720 per configs.yaml, but the intrinsics
in INTRINSICS['base'] are calibrated for 640x480). cv2.imwrite gets
BGR conversion to avoid the inverted-color save bug.

Add wrist-frame revert. _build_revert_transform_from_config reads
evaluator.transform_lists. from config and instantiates
the revert transform that converts wrist-frame predictions back to
cam frame using the current obs ee_pose as the reference. rollout_step
applies this to preds and the GT batch BEFORE both the viz call and
the cam_frame_to_base_frame post-processing. Wrist-frame models now
execute correctly on the robot AND project to the right pixels.
Cam-frame models (revert is None) skip the revert as a no-op, so
QWEN-HPT and old PI cartesian checkpoints continue to work unchanged.

Add HPT (QWEN) annotation support. _apply_annotation_to_algo now
sets annotation_sampling_mode='first' in addition to PI's
sampling_mode='first', so --annotation-path works for both algos
without subclass-specific code paths.

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

Read transform_list mode from .hydra/config.yaml instead of hardcoding cartesian_wristframe_ypr. The rollout now matches whatever frame the model was trained on (e.g. mode: cartesian for QWEN-HPT, mode: cartesian_wristframe_ypr for pi05_eva_aria_obj_gen_lang_wristframe). Add config-driven prediction visualization. _build_viz_func_from_config reads evaluator.viz_func from .hydra/config.yaml and uses hydra.utils.instantiate to bind image_key / action_key / mode. A runtime 'v' toggle in the intervention menu enables/disables saving viz_<step>.png to debug/ on every inference. The live camera image is resized to 640x480 inside _save_viz so the projection matches ARIA_INTRINSICS (the Aria camera publishes at 960x720 per configs.yaml, but the intrinsics in INTRINSICS['base'] are calibrated for 640x480). cv2.imwrite gets BGR conversion to avoid the inverted-color save bug. Add wrist-frame revert. _build_revert_transform_from_config reads evaluator.transform_lists.<embodiment> from config and instantiates the revert transform that converts wrist-frame predictions back to cam frame using the current obs ee_pose as the reference. rollout_step applies this to preds and the GT batch BEFORE both the viz call and the cam_frame_to_base_frame post-processing. Wrist-frame models now execute correctly on the robot AND project to the right pixels. Cam-frame models (revert is None) skip the revert as a no-op, so QWEN-HPT and old PI cartesian checkpoints continue to work unchanged. Add HPT (QWEN) annotation support. _apply_annotation_to_algo now sets annotation_sampling_mode='first' in addition to PI's sampling_mode='first', so --annotation-path works for both algos without subclass-specific code paths. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

rl2aloha · 2026-06-02T02:10:45Z

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

Add 'm' intervention command (independent of 'v') that saves the prediction viz at the model's input resolution — 224x224 with padding mirroring resize_with_pad_torch — written to debug/viz_model_<step>.png. Intrinsics are scaled-and-padded to match (base intrinsics calibrated for 640x480 are first scaled to the live camera's resolution, then by the resize ratio, with padding added to cx/cy) so projections still land on the correct pixels. Pred is hidden (pred_alpha=0.0) so the saved image shows what the model actually sees, with only the green GT cluster overlaid. Fix the 'annotations' KeyError that crashed both viz modes when no --annotation-path was supplied. The viz_func partial bakes in annotation_key from the config, so viz_gt_preds does an unconditional batch[annotation_key] lookup. _build_viz_func_from_config now captures that key, and both _save_viz / _save_viz_model_res inject a default (self.annotation or empty string) into the batch before calling viz_func. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Cherry-picked file-scoped from commit 2c53ef8 ("6D pi data transform"): - egomimic/utils/pose_utils.py: adds _ypr_to_rot6d / _rot6d_to_ypr helpers - egomimic/rldb/zarr/action_chunk_transforms.py: adds CartesianYPRToRot6D and CartesianRot6DToYPR transform classes (xyz+ypr <-> xyz+rot6d per arm, 14<->20 or 12<->18 dims; numpy/tensor passthrough) - egomimic/rldb/embodiment/eva.py: adds 'cartesian_6d' and 'cartesian_wristframe_6d' modes to Eva.get_transform_list, plus the two cam-frame revert builders the evaluator config references (_build_eva_cartesian_revert_6d_transform_list, _build_eva_cartesian_revert_6d_wristframe_transform_list). Unblocks rollout of pi05_eva_aria_wristframe_6d checkpoints, which the rollout branch was crashing on with 'NoneType is not iterable' (unknown mode falling through Eva.get_transform_list) and 'Error locating target ... _build_eva_cartesian_revert_6d_wristframe_transform_list' (missing revert builder). Skipped the training-side changes in 2c53ef8 (pi.py, hydra configs, action_utils, human.py, test files) — not needed for rollout and would conflict with rollout-branch edits to pi.py. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…on format Followup to 491496f (which only brought the data-side eva.py / transforms / pose_utils). The 6D checkpoint outputs 20-dim per-arm actions (xyz + rot6d + gripper) but RobotBimanualCartesianEuler.to32 still expected 14-dim, causing: ValueError: RobotBimanual: expected 14-dim, got 20 File-checkout from 2c53ef8 brings in: - The full raw-rotation refactor that landed on aidan/pi-6d before 2c53ef8 (1ddf896, 53e69ea, 59729c4) — PI05_CARTESIAN_ACTION_ENCODING_{RAW_ROT_6D, LEGACY, NORM_ROT_6D} constants, to32_raw_rotation/from32_raw_rotation, and raw/normalized rotation pipeline in PI.forward_eval / PI._unnormalize_action. - The new to32_norm_6d / from32_norm_6d converter methods on RobotBimanualCartesianEuler and HumanBimanualCartesianEuler. - The NORM_ROT_6D dispatch branches in PI.forward_eval (extract 6D, then unnormalize) and PI's action-encoding setup. Safe checkout: HEAD's pi.py and action_utils.py were byte-identical to the merge-base with 2c53ef8, so file-checkout doesn't overwrite any rollout-branch edits to these files. Purely EgoVerse-side; no openpi changes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Got rolled back by an earlier gt sync. The 20GB pi05 checkpoint OOM-killed the rollout between the patch-save and the main ModelWrapper.load_from_checkpoint: both held the full checkpoint simultaneously (~40GB peak on a host with less). Two mitigations: - If <ckpt>.patched already exists on disk, return its path immediately and skip both torch.load and torch.save. The first run still pays the double-load cost; every subsequent run is single-load. - del + gc.collect() between _patch_checkpoint_paths and the main load so the patching checkpoint is freed before the load load runs even on first-launch. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

github-actions · 2026-06-15T21:12:46Z

Claude Code Review

Review of PR #482

Summary

Adds config-driven rollout behavior (transform_list mode, viz_func, revert transform) plus three pi0.5 action encoding modes (legacy, raw-rot6d, normalized-rot6d) with supporting YPR↔6D converters. The scope is broader than the title suggests — there's a substantial new training-side action encoding pathway bundled in.

Key concerns

Scope creep — title vs. diff. The PR title is "rollout: config-driven viz, wristframe revert, HPT annotation support" but ~60% of the diff is a new pi0.5 action encoding system (PI05_CARTESIAN_ACTION_ENCODING_*, to32_raw_rotation, to32_norm_6d, CartesianYPRToRot6D, new cartesian_6d modes in Eva). This is a training-path change that affects norm stats and checkpoint compatibility, and it deserves its own PR with its own review. Please split.
Default action_encoding is LEGACY — but does every existing trained checkpoint set this field? The new action_encoding kwarg defaults to PI05_CARTESIAN_ACTION_ENCODING_LEGACY, which preserves old behavior at construction time. However, Hydra-instantiated PI configs that don't include action_encoding will silently get LEGACY. Confirm no in-flight configs were written assuming the new path is the default.
Checkpoint patching memory optimization changes return contract silently. _patch_checkpoint_paths now returns (patched_path, None) when reusing an existing .patched file or after patching. Previously it returned (path, cfg) on the no-patch path. The only caller (_load_policy) discards cfg, so this is fine today, but the docstring/return type isn't updated. Also: the cached .patched file is reused without validating it matches the current ckpt_path — if someone re-trains and overwrites the original ckpt but the .patched sidecar is stale, you'll silently load old weights. Add an mtime check or hash.
YAML config parsing is brittle. _build_transform_list_from_config, _build_viz_func_from_config, and _build_revert_transform_from_config each independently re-open .hydra/config.yaml, walk into data.train_datasets.eva_*, and pattern-match on _target_ substrings ("Eva.get_transform_list" in target). This will break the day someone renames or refactors the Eva builder. Consolidate into one config-load helper and prefer structural checks over substring matching on _target_.
_apply_revert_to_actions runs the revert on GT and preds via Embodiment.apply_transform — confirm that this transform pipeline doesn't mutate observations.state.ee_pose in place. The wrist-frame revert reads ee_pose as the reference; if the same dict is passed twice (once for preds, once for GT) and ee_pose gets normalized/transformed on the first pass, the second pass uses corrupted data. Worth a unit test.
INTRINSICS["base"] monkey-patch in _save_viz_model_res is not exception-safe across threads. It's wrapped in try/finally (good), but if viz_func is ever called concurrently (e.g. async logging), the global swap races. Document that viz_func is single-threaded or pass intrinsics through.
No tests. This adds ~500 lines of action encoding logic with hand-rolled normalization (_apply_norm_one, _apply_unnorm_one) duplicating logic that presumably lives in norm_stats.unnormalize. There must be a round-trip test: from32(to32(x)) == x and from32_raw_rotation(to32_raw_rotation(x, stats), stats, unnormalize_non_rotation=True) == x for both Robot and Human converters across all three encodings. Same for _ypr_to_rot6d ↔ _rot6d_to_ypr.
Hardcoded ARIA 640×480 intrinsics assumption. _save_viz resizes to (480, 640) unconditionally because INTRINSICS["base"] is calibrated for Aria at that resolution. If anyone runs this rollout with a non-Aria embodiment or different intrinsics, the viz silently lies. Gate this on embodiment or read the target resolution from INTRINSICS["base"] (you already compute cx*2, cy*2 for that purpose in _save_viz_model_res).

Suggestions

Split this PR. Rollout changes → one PR. Action encoding + 6D transforms + Eva mode additions → separate PR with norm-stat regression tests.
Add unit tests for the YPR↔6D round trip and all four to32*/from32* round trips per converter.
Add an mtime/hash check when reusing the cached .patched checkpoint.
Replace substring matching on _target_ with a documented schema or a single shared parser.
Confirm Embodiment.apply_transform is non-mutating on ee_pose, or deep-copy before the second call.
Update the _patch_checkpoint_paths docstring to reflect the new (path, None) return.
Consider extracting the three _build_*_from_config methods' YAML-loading boilerplate into one helper.
The viz debug print block (_viz_debug_printed) is ~30 lines of inline f-strings — fine for a research codebase but consider moving behind a --debug flag.

Verdict: Request Changes

Primarily because of the scope (please split) and the absence of tests for the new action encoding code, which touches training. The rollout-only pieces look solid and I'd approve them on their own.

Reviewed by Claude · Review workflow

rl2aloha and others added 4 commits June 2, 2026 02:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rollout: config-driven viz, wristframe revert, HPT annotation support#482

rollout: config-driven viz, wristframe revert, HPT annotation support#482
rl2aloha wants to merge 5 commits into
elmo/rollout-proprio-norm-fixfrom
elmo/rollout-viz-and-wristframe

rl2aloha commented Jun 2, 2026

Uh oh!

rl2aloha commented Jun 2, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

rl2aloha commented Jun 2, 2026

Uh oh!

rl2aloha commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 15, 2026

Claude Code Review

Review of PR #482

Summary

Key concerns

Suggestions

Verdict: Request Changes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

rl2aloha commented Jun 2, 2026 •

edited

Loading