add robotwin sim for train and eval #503

Draft
RyanPCo wants to merge 9 commits into ryanco/in-context-learning from ryanco/robotwin-sim



RyanPCo commented Jun 16, 2026

Contributor

No description provided.


RyanPCo force-pushed the ryanco/robotwin-sim branch 3 times, most recently from e0d067e to 950a5f8 on June 16, 2026 20:44
RyanPCo changed the title from "add robocasa sim for train and eval" to "add robotwin sim for train and eval" on Jun 17, 2026
RyanPCo force-pushed the ryanco/robotwin-sim branch from 98c08e7 to 41a1e0a on June 18, 2026 16:16
RyanPCo and others added 9 commits June 18, 2026 12:58
- train_robotwin_ricl: CKPT_DIR -> repo-relative pi05_base_pytorch
  (the old /storage/project/r-dxu345-0 default was from a different cluster
  and does not exist here)
- drop accidental external/robocasa + external/robosuite gitlinks (no
  .gitmodules entries, empty; RoboTwin is the only sim we need)
- CLAUDE.md: Environment section MacBook -> Georgia Tech SLURM cluster

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- build_robotwin_bank_index.py: consolidated DINOv2 kNN index over a RoboTwinCorpus
  (HDF5 bank), built with the same .embed as eval's OnlineRetriever; output format
  matches build_embedding_index.build_retrieval_index. + CPU unit test.
- train_robotwin_ricl.stage_full: save the train corpus's quantiles.json beside the
  checkpoints (eval normalizes identically via robotwin_policy).
- robotwin_policy: wire PIRicl tokenizer to the vendored pg_tokenizer (was None) so
  forward_eval can tokenize the spliced prompt; overridable via usr_args["tokenizer"].
- train_robotwin_ricl.sbatch: this-cluster launcher (hoffman-lab a40); the script
  uses a Lightning Trainer directly, not submitit.
- docs: refresh ricl/CLAUDE.md + robotwin_setup.md (cluster runbook, eval artifacts,
  selective sim install to avoid the torch==2.4.1 downgrade).
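
The quantiles.json bullet above means training and eval share one set of normalization statistics. A minimal sketch of that round trip follows. The `q01`/`q99` key names, the mapping to [-1, 1], and both helper names are assumptions for illustration, not the actual format written by train_robotwin_ricl.stage_full.

```python
import json
import tempfile
from pathlib import Path


def save_quantiles(ckpt_dir: Path, q01: list, q99: list) -> Path:
    """Write per-dimension quantiles beside the checkpoints (hypothetical layout)."""
    path = ckpt_dir / "quantiles.json"
    path.write_text(json.dumps({"q01": q01, "q99": q99}))
    return path


def normalize(x: list, stats: dict) -> list:
    """Map each dimension from [q01, q99] to [-1, 1] (assumed convention)."""
    return [
        2.0 * (v - lo) / (hi - lo) - 1.0
        for v, lo, hi in zip(x, stats["q01"], stats["q99"])
    ]


with tempfile.TemporaryDirectory() as tmp:
    # Training side: persist the corpus statistics next to the checkpoints.
    path = save_quantiles(Path(tmp), q01=[0.0, -1.0], q99=[2.0, 1.0])
    # Eval side: load the same file so inputs are scaled identically.
    stats = json.loads(path.read_text())
    normalized = normalize([1.0, 1.0], stats)
```

Loading the file written at train time, rather than recomputing statistics from the eval data, is what keeps the two sides consistent.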

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Full fp32-AdamW fine-tune of the 3.6B pi0.5 model OOMs on this cluster's largest
GPU (a40 44GB / l40s 48GB; no A100/H100/H200) — optimizer state alone is ~29GB.

- train_robotwin_ricl.py: add --adam8bit (bitsandbytes AdamW8bit) — optimizer state
  ~29GB -> ~7GB, so the run fits (~22GB + activations). Default off (H200 keeps AdamW).
- train_robotwin_ricl.sbatch: --adam8bit, batch 2,
  PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True, a40 (l40s is GRES-restricted for
  interactive jobs).
- CLAUDE.md: venv is uv-managed (no pip -> `uv pip install`); GPU mem ceiling +
  8-bit-AdamW workaround + overcap partition for extra/idle GPUs.

Verified: job steps with bf16-mixed AMP, action_loss ~1.5 and falling, no OOM.
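
The memory figures in this commit can be reproduced with back-of-the-envelope arithmetic. The sketch below assumes decimal gigabytes, two AdamW moment tensors per parameter, and fp32 master weights. It ignores gradients, activations, and the small per-block quantization metadata an 8-bit optimizer keeps, so it is an estimate rather than a measurement.

```python
PARAMS = 3.6e9  # parameter count quoted in the commit message
GB = 1e9        # decimal gigabytes (assumption)


def adam_state_gb(params: float, bytes_per_moment: float) -> float:
    """AdamW stores two moment tensors (first and second) per parameter."""
    return params * 2 * bytes_per_moment / GB


fp32_state = adam_state_gb(PARAMS, 4)  # fp32 moments: 4 bytes each
int8_state = adam_state_gb(PARAMS, 1)  # 8-bit moments: 1 byte each
fp32_weights = PARAMS * 4 / GB         # fp32 weights (assumed master copy)

print(f"fp32 AdamW state:      {fp32_state:.1f} GB")
print(f"8-bit AdamW state:     {int8_state:.1f} GB")
print(f"weights + 8-bit state: {fp32_weights + int8_state:.1f} GB")
```

Under these assumptions the estimates land close to the ~29GB, ~7GB, and ~22GB figures quoted above.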

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Surfaced by the first real RoboTwin closed-loop runs:
- robotwin_policy._load_algo: PIRicl is not an nn.Module — set eval mode via
  nets.eval(), move nets to CUDA, and set algo.device (Lightning normally does this).
- get_action: wrap forward in bf16 autocast (model is bf16; training used bf16-mixed)
  and pass a dummy zero action so process_batch_for_training can infer (B, horizon)
  at inference (sample_actions ignores its values).
- eval_robotwin_ricl.sbatch: venv-aware launcher on a known-good node;
  TORCH_COMPILE_DISABLE=1 (eval-only, avoids the multi-min max-autotune) + EVAL_TEST_NUM
  bound.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Eval ran all episodes in SAPIEN (exit 0); the 250-step checkpoint scored 1/10 on
beat_block_hammer (undertrained — pipeline is the deliverable).

- eval_robotwin_ricl.sbatch: python -u (live progress; output was block-buffered).
- robotwin_setup.md: mark both TODOs done; document the selective sim install, the
  external/RoboTwin fork patches that let policy eval run without curobo (stub planner,
  in-process planner, skip expert_check, task-name instruction, no eval video), and the
  bad-node/CUDA-driver + uv-pip gotchas.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
robotwin_new_cluster_setup.md — reproducible end-to-end install (repo+submodules, uv
venv, tokenizer, base ckpt, data, 8-bit-AdamW training, selective sim install +
RoboTwin-fork patches, closed-loop eval) plus a gotchas/troubleshooting section and a
verification ladder. Cross-linked from robotwin_setup.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- setup_multitask_robotwin.py: download aloha-agilex slices for disjoint train/eval
  task sets and symlink N demos/task into train_root/eval_root (<task>/{data,
  instructions}); hard-fails if the sets overlap (eval tasks must be unseen in train).
- robotwin_policy: no_incontext mode — skip the OnlineRetriever and pass no
  ricl_retrieved_* keys so PIRicl runs as base pi0.5 (plain-finetune eval baseline).
- train sbatch: ${NO_INCONTEXT:+--no-incontext} toggle.

Enables the canonical RICL test: train on N tasks, eval retrieval-vs-floor on held-out
NEW tasks; plain trained on the SAME data is the baseline.
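
The hard-fail on overlapping task sets can be sketched as below. The function name and the placeholder task names are hypothetical (only beat_block_hammer appears in this PR), and the real check in setup_multitask_robotwin.py may differ.

```python
def check_disjoint(train_tasks, eval_tasks):
    """Raise if any eval task also appears in the train set."""
    overlap = sorted(set(train_tasks) & set(eval_tasks))
    if overlap:
        raise ValueError(f"train/eval task sets overlap: {overlap}")


# Disjoint sets pass silently.
check_disjoint(["task_a", "task_b"], ["beat_block_hammer"])

# An overlapping set fails before any download or symlinking would start.
try:
    check_disjoint(["task_a", "beat_block_hammer"], ["beat_block_hammer"])
    overlap_detected = False
except ValueError:
    overlap_detected = True
```

Failing early keeps a leaked eval task from silently inflating the held-out numbers.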

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- robotwin_policy: parse no_incontext from a string override (eval_policy --overrides
  passes strings), not just a yaml bool.
- eval_multitask_compare.sbatch: for each held-out task, build its DINOv2 bank index,
  then closed-loop eval the RICL model (within-task retrieval ON) and the plain model
  (retrieval OFF). Pass per-run paths via eval_policy --overrides. Queue with
  --dependency=afterany on the two training jobs.
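
The string-override fix matters because in Python `bool("False")` is `True`: any non-empty string is truthy. A minimal sketch of a tolerant parser follows. The function name and the accepted spellings are assumptions, not the code in robotwin_policy.

```python
def parse_flag(value) -> bool:
    """Accept a real bool (yaml) or a string (command-line override)."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        text = value.strip().lower()
        if text in {"1", "true", "yes", "on"}:
            return True
        if text in {"0", "false", "no", "off", ""}:
            return False
    raise ValueError(f"cannot interpret {value!r} as a boolean")


naive = bool("False")         # truthy string: the pitfall being avoided
parsed = parse_flag("False")  # string override handled explicitly
from_yaml = parse_flag(True)  # yaml bool passes through unchanged
```

Raising on unrecognized values, instead of defaulting, surfaces a mistyped override at startup rather than as a silently wrong eval mode.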

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
RyanPCo force-pushed the ryanco/in-context-learning branch from b502f24 to bb62e15 on June 18, 2026 19:58
RyanPCo force-pushed the ryanco/robotwin-sim branch from 41a1e0a to 49687e5 on June 18, 2026 19:58
