add robotwin sim for train and eval #503
Draft
RyanPCo wants to merge 9 commits into
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite.
e0d067e to 950a5f8
98c08e7 to 41a1e0a
- train_robotwin_ricl: CKPT_DIR -> repo-relative pi05_base_pytorch (the old /storage/project/r-dxu345-0 default was from a different cluster and does not exist here).
- drop accidental external/robocasa + external/robosuite gitlinks (no .gitmodules entries, empty; RoboTwin is the only sim we need).
- CLAUDE.md: Environment section MacBook -> Georgia Tech SLURM cluster.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- build_robotwin_bank_index.py: consolidated DINOv2 kNN index over a RoboTwinCorpus (HDF5 bank), built with the same .embed as eval's OnlineRetriever; output format matches build_embedding_index.build_retrieval_index. + CPU unit test.
- train_robotwin_ricl.stage_full: save the train corpus's quantiles.json beside the checkpoints (eval normalizes identically via robotwin_policy).
- robotwin_policy: wire PIRicl tokenizer to the vendored pg_tokenizer (was None) so forward_eval can tokenize the spliced prompt; overridable via usr_args["tokenizer"].
- train_robotwin_ricl.sbatch: this-cluster launcher (hoffman-lab a40); the script uses a Lightning Trainer directly, not submitit.
- docs: refresh ricl/CLAUDE.md + robotwin_setup.md (cluster runbook, eval artifacts, selective sim install to avoid the torch==2.4.1 downgrade).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
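The point of saving quantiles.json beside the checkpoints is that training and eval then normalize with the same statistics. A minimal sketch of that round trip follows. The JSON layout (`q01`/`q99` keys), the helper names, and the [-1, 1] quantile formula are assumptions made for illustration; the PR does not show the actual file format or the code in robotwin_policy.

```python
import json
import tempfile
from pathlib import Path

EPS = 1e-6  # guards against division by zero when q99 == q01


def save_quantiles(ckpt_dir, q01, q99):
    """Write per-dimension quantiles next to the checkpoints (hypothetical layout)."""
    path = Path(ckpt_dir) / "quantiles.json"
    path.write_text(json.dumps({"q01": list(q01), "q99": list(q99)}))
    return path


def load_quantiles(ckpt_dir):
    """Read back the statistics that were saved at training time."""
    stats = json.loads((Path(ckpt_dir) / "quantiles.json").read_text())
    return stats["q01"], stats["q99"]


def normalize(values, q01, q99):
    """Map each dimension so that q01 -> -1 and q99 -> +1 (assumed formula)."""
    return [
        2.0 * (v - lo) / (hi - lo + EPS) - 1.0
        for v, lo, hi in zip(values, q01, q99)
    ]


with tempfile.TemporaryDirectory() as ckpt_dir:
    # "Training" side: persist the corpus statistics.
    save_quantiles(ckpt_dir, q01=[0.0, -1.0], q99=[10.0, 1.0])
    # "Eval" side: load the same file and normalize an observation.
    q01, q99 = load_quantiles(ckpt_dir)
    normalized = normalize([5.0, 1.0], q01, q99)

print(normalized)
```

Because eval reads the file instead of recomputing statistics from its own data, a shift between the train and eval corpora cannot silently change the input scaling.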
Full fp32-AdamW fine-tune of the 3.6B pi0.5 model OOMs on this cluster's largest GPU (a40 44GB / l40s 48GB; no A100/H100/H200): optimizer state alone is ~29GB.
- train_robotwin_ricl.py: add --adam8bit (bitsandbytes AdamW8bit), which cuts optimizer state from ~29GB to ~7GB so the run fits (~22GB + activations). Default off (H200 keeps AdamW).
- train_robotwin_ricl.sbatch: --adam8bit, batch 2, PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True, a40 (l40s is GRES-restricted for interactive jobs).
- CLAUDE.md: venv is uv-managed (no pip -> `uv pip install`); GPU mem ceiling + 8-bit-AdamW workaround + overcap partition for extra/idle GPUs.
Verified: job steps with bf16-mixed AMP, action_loss ~1.5 and falling, no OOM.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
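The memory figures in this commit can be reproduced with a back-of-the-envelope calculation. The sketch below assumes AdamW keeps two moment buffers per parameter and ignores the small quantization metadata that an 8-bit optimizer also stores. Reading "~22GB" as fp32 weights plus 8-bit optimizer state is an interpretation, not something the commit states.

```python
# Rough optimizer-state arithmetic for a 3.6B-parameter model.
# Assumption: AdamW stores two moment tensors (exp_avg, exp_avg_sq) per parameter.
N_PARAMS = 3.6e9  # parameter count quoted in the commit message


def adamw_state_gb(n_params: float, bytes_per_value: int) -> float:
    """Memory in GB (1e9 bytes) for AdamW's two moment buffers."""
    return 2 * n_params * bytes_per_value / 1e9


fp32_state_gb = adamw_state_gb(N_PARAMS, 4)  # 32-bit moments
int8_state_gb = adamw_state_gb(N_PARAMS, 1)  # 8-bit moments, metadata ignored
fp32_weights_gb = N_PARAMS * 4 / 1e9         # fp32 master weights
fit_estimate_gb = fp32_weights_gb + int8_state_gb

print(f"fp32 AdamW state:      {fp32_state_gb:.1f} GB")
print(f"8-bit AdamW state:     {int8_state_gb:.1f} GB")
print(f"fp32 weights:          {fp32_weights_gb:.1f} GB")
print(f"weights + 8-bit state: {fit_estimate_gb:.1f} GB")
```

Under these assumptions the fp32 state comes to about 28.8 GB and the 8-bit state to about 7.2 GB, matching the ~29GB and ~7GB in the message. Gradients and activations come on top of the 21.6 GB estimate.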
Surfaced by the first real RoboTwin closed-loop runs:
- robotwin_policy._load_algo: PIRicl is not an nn.Module, so set eval mode via nets.eval(), move nets to CUDA, and set algo.device (Lightning normally does this).
- get_action: wrap forward in bf16 autocast (model is bf16; training used bf16-mixed) and pass a dummy zero action so process_batch_for_training can infer (B, horizon) at inference (sample_actions ignores its values).
- eval_robotwin_ricl.sbatch: venv-aware launcher on a known-good node; TORCH_COMPILE_DISABLE=1 (eval-only, avoids the multi-min max-autotune) + EVAL_TEST_NUM bound.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Eval ran all episodes in SAPIEN (exit 0); the 250-step checkpoint scored 1/10 on beat_block_hammer (undertrained; the pipeline is the deliverable).
- eval_robotwin_ricl.sbatch: python -u (live progress; output was block-buffered).
- robotwin_setup.md: mark both TODOs done; document the selective sim install, the external/RoboTwin fork patches that let policy eval run without curobo (stub planner, in-process planner, skip expert_check, task-name instruction, no eval video), and the bad-node/CUDA-driver + uv-pip gotchas.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
robotwin_new_cluster_setup.md: reproducible end-to-end install (repo+submodules, uv venv, tokenizer, base ckpt, data, 8-bit-AdamW training, selective sim install + RoboTwin-fork patches, closed-loop eval) plus a gotchas/troubleshooting section and a verification ladder. Cross-linked from robotwin_setup.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- setup_multitask_robotwin.py: download aloha-agilex slices for disjoint train/eval
task sets and symlink N demos/task into train_root/eval_root (<task>/{data,
instructions}); hard-fails if the sets overlap (eval tasks must be unseen in train).
- robotwin_policy: no_incontext mode — skip the OnlineRetriever and pass no
ricl_retrieved_* keys so PIRicl runs as base pi0.5 (plain-finetune eval baseline).
- train sbatch: ${NO_INCONTEXT:+--no-incontext} toggle.
Enables the canonical RICL test: train on N tasks, eval retrieval-vs-floor on held-out
NEW tasks; plain trained on the SAME data is the baseline.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
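The hard-fail on overlapping task sets described above can be illustrated with a small guard. This is a sketch only: `check_disjoint` and the task names other than beat_block_hammer are hypothetical, and the real setup_multitask_robotwin.py may structure the check differently.

```python
def check_disjoint(train_tasks, eval_tasks):
    """Raise if any eval task also appears in the train set.

    Held-out evaluation is only meaningful when eval tasks are unseen in
    training, so an overlap is treated as a fatal configuration error.
    """
    train_set, eval_set = set(train_tasks), set(eval_tasks)
    overlap = sorted(train_set & eval_set)
    if overlap:
        raise ValueError(f"train/eval task sets overlap: {overlap}")
    return train_set, eval_set


# Disjoint sets pass through unchanged (task names are placeholders).
train_set, eval_set = check_disjoint(
    ["beat_block_hammer", "hypothetical_task_a"],
    ["hypothetical_task_b"],
)

# An overlapping configuration is rejected before any data is linked.
try:
    check_disjoint(["beat_block_hammer"], ["beat_block_hammer"])
    overlap_rejected = False
except ValueError as err:
    overlap_rejected = True
    print(err)
```

Failing before any download or symlink step keeps a misconfigured split from producing a directory tree that looks valid but leaks eval tasks into training.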
- robotwin_policy: parse no_incontext from a string override (eval_policy --overrides passes strings), not just a yaml bool.
- eval_multitask_compare.sbatch: for each held-out task, build its DINOv2 bank index, then closed-loop eval the RICL model (within-task retrieval ON) and the plain model (retrieval OFF). Pass per-run paths via eval_policy --overrides. Queue with --dependency=afterany on the two training jobs.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
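String overrides need explicit parsing because in Python any non-empty string is truthy, so `bool("False")` evaluates to `True`. The sketch below shows one way to accept both a YAML bool and a string; `parse_flag` and the accepted spellings are assumptions, not the code in robotwin_policy.

```python
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off", ""}


def parse_flag(value):
    """Coerce a YAML bool or a command-line string into a real bool."""
    if isinstance(value, bool):
        return value  # already parsed by the YAML loader
    if isinstance(value, str):
        text = value.strip().lower()
        if text in _TRUE:
            return True
        if text in _FALSE:
            return False
    raise ValueError(f"cannot interpret {value!r} as a boolean flag")


# A naive truthiness test gets the string case wrong.
naive = bool("False")
parsed = parse_flag("False")
print(naive, parsed)
```

Raising on unrecognized values, instead of defaulting to `False`, surfaces typos in an override rather than silently running the wrong eval mode.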
b502f24 to bb62e15
41a1e0a to 49687e5

No description provided.