Skip to content

rollout: config-driven viz, wristframe revert, HPT annotation support#482

Open
rl2aloha wants to merge 5 commits into
elmo/rollout-proprio-norm-fixfrom
elmo/rollout-viz-and-wristframe
Open

rollout: config-driven viz, wristframe revert, HPT annotation support#482
rl2aloha wants to merge 5 commits into
elmo/rollout-proprio-norm-fixfrom
elmo/rollout-viz-and-wristframe

Conversation

@rl2aloha

@rl2aloha rl2aloha commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Read transform_list mode from .hydra/config.yaml instead of hardcoding
cartesian_wristframe_ypr. The rollout now matches whatever frame the
model was trained on (e.g. mode: cartesian for QWEN-HPT,
mode: cartesian_wristframe_ypr for pi05_eva_aria_obj_gen_lang_wristframe).

Add config-driven prediction visualization. build_viz_func_from_config
reads evaluator.viz_func from .hydra/config.yaml and uses hydra.utils.instantiate
to bind image_key / action_key / mode. A runtime 'v' toggle in the
intervention menu enables/disables saving viz
.png to debug/
on every inference. The live camera image is resized to 640x480
inside _save_viz so the projection matches ARIA_INTRINSICS (the
Aria camera publishes at 960x720 per configs.yaml, but the intrinsics
in INTRINSICS['base'] are calibrated for 640x480). cv2.imwrite gets
BGR conversion to avoid the inverted-color save bug.

Add wrist-frame revert. _build_revert_transform_from_config reads
evaluator.transform_lists. from config and instantiates
the revert transform that converts wrist-frame predictions back to
cam frame using the current obs ee_pose as the reference. rollout_step
applies this to preds and the GT batch BEFORE both the viz call and
the cam_frame_to_base_frame post-processing. Wrist-frame models now
execute correctly on the robot AND project to the right pixels.
Cam-frame models (revert is None) skip the revert as a no-op, so
QWEN-HPT and old PI cartesian checkpoints continue to work unchanged.

Add HPT (QWEN) annotation support. _apply_annotation_to_algo now
sets annotation_sampling_mode='first' in addition to PI's
sampling_mode='first', so --annotation-path works for both algos
without subclass-specific code paths.

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

Read transform_list mode from .hydra/config.yaml instead of hardcoding
cartesian_wristframe_ypr. The rollout now matches whatever frame the
model was trained on (e.g. mode: cartesian for QWEN-HPT,
mode: cartesian_wristframe_ypr for pi05_eva_aria_obj_gen_lang_wristframe).

Add config-driven prediction visualization. _build_viz_func_from_config
reads evaluator.viz_func from .hydra/config.yaml and uses hydra.utils.instantiate
to bind image_key / action_key / mode. A runtime 'v' toggle in the
intervention menu enables/disables saving viz_<step>.png to debug/
on every inference. The live camera image is resized to 640x480
inside _save_viz so the projection matches ARIA_INTRINSICS (the
Aria camera publishes at 960x720 per configs.yaml, but the intrinsics
in INTRINSICS['base'] are calibrated for 640x480). cv2.imwrite gets
BGR conversion to avoid the inverted-color save bug.

Add wrist-frame revert. _build_revert_transform_from_config reads
evaluator.transform_lists.<embodiment> from config and instantiates
the revert transform that converts wrist-frame predictions back to
cam frame using the current obs ee_pose as the reference. rollout_step
applies this to preds and the GT batch BEFORE both the viz call and
the cam_frame_to_base_frame post-processing. Wrist-frame models now
execute correctly on the robot AND project to the right pixels.
Cam-frame models (revert is None) skip the revert as a no-op, so
QWEN-HPT and old PI cartesian checkpoints continue to work unchanged.

Add HPT (QWEN) annotation support. _apply_annotation_to_algo now
sets annotation_sampling_mode='first' in addition to PI's
sampling_mode='first', so --annotation-path works for both algos
without subclass-specific code paths.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

rl2aloha commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

rl2aloha and others added 4 commits June 2, 2026 02:46
Add 'm' intervention command (independent of 'v') that saves the
prediction viz at the model's input resolution — 224x224 with padding
mirroring resize_with_pad_torch — written to debug/viz_model_<step>.png.
Intrinsics are scaled-and-padded to match (base intrinsics calibrated
for 640x480 are first scaled to the live camera's resolution, then by
the resize ratio, with padding added to cx/cy) so projections still
land on the correct pixels. Pred is hidden (pred_alpha=0.0) so the
saved image shows what the model actually sees, with only the green
GT cluster overlaid.

Fix the 'annotations' KeyError that crashed both viz modes when no
--annotation-path was supplied. The viz_func partial bakes in
annotation_key from the config, so viz_gt_preds does an unconditional
batch[annotation_key] lookup. _build_viz_func_from_config now captures
that key, and both _save_viz / _save_viz_model_res inject a default
(self.annotation or empty string) into the batch before calling viz_func.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Cherry-picked file-scoped from commit 2c53ef8 ("6D pi data transform"):
- egomimic/utils/pose_utils.py: adds _ypr_to_rot6d / _rot6d_to_ypr helpers
- egomimic/rldb/zarr/action_chunk_transforms.py: adds CartesianYPRToRot6D and
  CartesianRot6DToYPR transform classes (xyz+ypr <-> xyz+rot6d per arm, 14<->20
  or 12<->18 dims; numpy/tensor passthrough)
- egomimic/rldb/embodiment/eva.py: adds 'cartesian_6d' and
  'cartesian_wristframe_6d' modes to Eva.get_transform_list, plus the two
  cam-frame revert builders the evaluator config references
  (_build_eva_cartesian_revert_6d_transform_list,
  _build_eva_cartesian_revert_6d_wristframe_transform_list).

Unblocks rollout of pi05_eva_aria_wristframe_6d checkpoints, which the rollout
branch was crashing on with 'NoneType is not iterable' (unknown mode falling
through Eva.get_transform_list) and 'Error locating target ...
_build_eva_cartesian_revert_6d_wristframe_transform_list' (missing revert
builder). Skipped the training-side changes in 2c53ef8 (pi.py, hydra configs,
action_utils, human.py, test files) — not needed for rollout and would
conflict with rollout-branch edits to pi.py.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…on format

Followup to 491496f (which only brought the data-side eva.py / transforms /
pose_utils). The 6D checkpoint outputs 20-dim per-arm actions (xyz + rot6d +
gripper) but RobotBimanualCartesianEuler.to32 still expected 14-dim, causing:
  ValueError: RobotBimanual: expected 14-dim, got 20

File-checkout from 2c53ef8 brings in:
- The full raw-rotation refactor that landed on aidan/pi-6d before 2c53ef8
  (1ddf896, 53e69ea, 59729c4) — PI05_CARTESIAN_ACTION_ENCODING_{RAW_ROT_6D,
  LEGACY, NORM_ROT_6D} constants, to32_raw_rotation/from32_raw_rotation, and
  raw/normalized rotation pipeline in PI.forward_eval / PI._unnormalize_action.
- The new to32_norm_6d / from32_norm_6d converter methods on
  RobotBimanualCartesianEuler and HumanBimanualCartesianEuler.
- The NORM_ROT_6D dispatch branches in PI.forward_eval (extract 6D, then
  unnormalize) and PI's action-encoding setup.

Safe checkout: HEAD's pi.py and action_utils.py were byte-identical to the
merge-base with 2c53ef8, so file-checkout doesn't overwrite any rollout-branch
edits to these files. Purely EgoVerse-side; no openpi changes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Got rolled back by an earlier gt sync. The 20GB pi05 checkpoint OOM-killed
the rollout between the patch-save and the main ModelWrapper.load_from_checkpoint:
both held the full checkpoint simultaneously (~40GB peak on a host with less).

Two mitigations:
- If <ckpt>.patched already exists on disk, return its path immediately
  and skip both torch.load and torch.save. The first run still pays the
  double-load cost; every subsequent run is single-load.
- del + gc.collect() between _patch_checkpoint_paths and the main load so
  the patching checkpoint is freed before the load load runs even on
  first-launch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

Claude Code Review

Review of PR #482

Summary

Adds config-driven rollout behavior (transform_list mode, viz_func, revert transform) plus three pi0.5 action encoding modes (legacy, raw-rot6d, normalized-rot6d) with supporting YPR↔6D converters. The scope is broader than the title suggests — there's a substantial new training-side action encoding pathway bundled in.

Key concerns

  1. Scope creep — title vs. diff. The PR title is "rollout: config-driven viz, wristframe revert, HPT annotation support" but ~60% of the diff is a new pi0.5 action encoding system (PI05_CARTESIAN_ACTION_ENCODING_*, to32_raw_rotation, to32_norm_6d, CartesianYPRToRot6D, new cartesian_6d modes in Eva). This is a training-path change that affects norm stats and checkpoint compatibility, and it deserves its own PR with its own review. Please split.

  2. Default action_encoding is LEGACY — but does every existing trained checkpoint set this field? The new action_encoding kwarg defaults to PI05_CARTESIAN_ACTION_ENCODING_LEGACY, which preserves old behavior at construction time. However, Hydra-instantiated PI configs that don't include action_encoding will silently get LEGACY. Confirm no in-flight configs were written assuming the new path is the default.

  3. Checkpoint patching memory optimization changes return contract silently. _patch_checkpoint_paths now returns (patched_path, None) when reusing an existing .patched file or after patching. Previously it returned (path, cfg) on the no-patch path. The only caller (_load_policy) discards cfg, so this is fine today, but the docstring/return type isn't updated. Also: the cached .patched file is reused without validating it matches the current ckpt_path — if someone re-trains and overwrites the original ckpt but the .patched sidecar is stale, you'll silently load old weights. Add an mtime check or hash.

  4. YAML config parsing is brittle. _build_transform_list_from_config, _build_viz_func_from_config, and _build_revert_transform_from_config each independently re-open .hydra/config.yaml, walk into data.train_datasets.eva_*, and pattern-match on _target_ substrings ("Eva.get_transform_list" in target). This will break the day someone renames or refactors the Eva builder. Consolidate into one config-load helper and prefer structural checks over substring matching on _target_.

  5. _apply_revert_to_actions runs the revert on GT and preds via Embodiment.apply_transform — confirm that this transform pipeline doesn't mutate observations.state.ee_pose in place. The wrist-frame revert reads ee_pose as the reference; if the same dict is passed twice (once for preds, once for GT) and ee_pose gets normalized/transformed on the first pass, the second pass uses corrupted data. Worth a unit test.

  6. INTRINSICS["base"] monkey-patch in _save_viz_model_res is not exception-safe across threads. It's wrapped in try/finally (good), but if viz_func is ever called concurrently (e.g. async logging), the global swap races. Document that viz_func is single-threaded or pass intrinsics through.

  7. No tests. This adds ~500 lines of action encoding logic with hand-rolled normalization (_apply_norm_one, _apply_unnorm_one) duplicating logic that presumably lives in norm_stats.unnormalize. There must be a round-trip test: from32(to32(x)) == x and from32_raw_rotation(to32_raw_rotation(x, stats), stats, unnormalize_non_rotation=True) == x for both Robot and Human converters across all three encodings. Same for _ypr_to_rot6d_rot6d_to_ypr.

  8. Hardcoded ARIA 640×480 intrinsics assumption. _save_viz resizes to (480, 640) unconditionally because INTRINSICS["base"] is calibrated for Aria at that resolution. If anyone runs this rollout with a non-Aria embodiment or different intrinsics, the viz silently lies. Gate this on embodiment or read the target resolution from INTRINSICS["base"] (you already compute cx*2, cy*2 for that purpose in _save_viz_model_res).

Suggestions

  • Split this PR. Rollout changes → one PR. Action encoding + 6D transforms + Eva mode additions → separate PR with norm-stat regression tests.
  • Add unit tests for the YPR↔6D round trip and all four to32*/from32* round trips per converter.
  • Add an mtime/hash check when reusing the cached .patched checkpoint.
  • Replace substring matching on _target_ with a documented schema or a single shared parser.
  • Confirm Embodiment.apply_transform is non-mutating on ee_pose, or deep-copy before the second call.
  • Update the _patch_checkpoint_paths docstring to reflect the new (path, None) return.
  • Consider extracting the three _build_*_from_config methods' YAML-loading boilerplate into one helper.
  • The viz debug print block (_viz_debug_printed) is ~30 lines of inline f-strings — fine for a research codebase but consider moving behind a --debug flag.

Verdict: Request Changes

Primarily because of the scope (please split) and the absence of tests for the new action encoding code, which touches training. The rollout-only pieces look solid and I'd approve them on their own.


Reviewed by Claude · Review workflow

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant