feat: add Rust Cargo feedback environment #1485

Open
Alfianfc wants to merge 1 commit into PrimeIntellect-ai:main from Alfianfc:feat/rust-cargo-env

Alfianfc commented May 29, 2026

Summary

  • Add a packaged rust-cargo environment for the Algora Rust w/ Cargo feedback bounty.
  • Provide Rust coding prompts that ask for one fenced Rust solution with implementation and tests.
  • Score completions with static rewards for format/function/assertions plus executable rewards for cargo build, cargo clippy -- -D warnings, and cargo test.
  • Document quickstart and Cargo toolchain requirement.

Verification

  • uv run --no-dev ruff check environments/rust_cargo
  • uv pip install -e environments/rust_cargo
  • uv run --no-dev python environments/rust_cargo/rust_cargo.py
  • uv run --no-dev python - <<'PY' ... vf.load_environment('rust-cargo') ... PY
  • CHANGED_ENVS=rust_cargo uv run --no-dev pytest tests/test_envs.py -q --tb=short was attempted, but the Windows host cannot execute the test's hard-coded /bin/bash subprocess path.

Note: this Windows host does not have cargo installed, so local smoke verification covered environment loading and static reward helpers. The executable reward functions run Cargo when it is available on PATH.
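A guard of the following shape is one way the "run Cargo when it is available on PATH" behavior could look. It is a sketch under assumptions: `cargo_available` and `guarded_reward` are hypothetical helper names, and only `shutil.which` is a real standard-library call.

```python
import shutil
from typing import Callable


def cargo_available(executable: str = "cargo") -> bool:
    """Report whether the given executable can be found on PATH."""
    return shutil.which(executable) is not None


def guarded_reward(run_check: Callable[[], float], executable: str = "cargo") -> float:
    """Run the check only when the toolchain exists; otherwise score 0.0."""
    if not cargo_available(executable):
        return 0.0
    return float(run_check())
```

On a host without a Rust toolchain, every executable reward then evaluates to 0.0 instead of raising, which matches the behavior described in this note.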

Algora bounty: https://algora.io/PrimeIntellect-ai/bounties/FDaU5Y9iKeE8qfk3
Reference: https://github.com/Oxen-AI/GRPO-With-Cargo-Feedback/blob/main/train.py


Note

Medium Risk
Eval runners execute model-generated Rust via subprocess in ephemeral directories, with timeouts; a missing cargo yields zero reward. Hosts need a trusted toolchain and should treat this like other code-exec envs: it is not a security boundary.

Overview
Adds a new installable rust-cargo single-turn environment for Rust code generation, aligned with the Algora Cargo-feedback bounty pattern.

The model gets Rust prompts and must answer with one fenced rust block (implementation, #[cfg(test)] module, assertions, no main). A built-in dataset covers five small tasks. Scoring uses a vf.Rubric that mixes static checks (single block, required fn, assertion count) with executable feedback: extracted code is written to a temp crate and run through cargo build, cargo clippy -- -D warnings, and cargo test (tests weighted highest). load_environment() wires this into vf.SingleTurnEnv with a TDD-oriented system prompt. Package metadata, a quickstart, and the root environments/README.md catalog entry document the env and the requirement that cargo be on PATH.
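The temp-crate step described above might look roughly like this. It is a hedged sketch rather than the PR's `run_cargo_tool`: the helper names, the library-crate layout, the manifest contents, and the 60-second default timeout are all assumptions.

```python
import subprocess
import tempfile
from pathlib import Path

# Assumed minimal manifest for a library crate (the prompts forbid a main function).
CARGO_TOML = """[package]
name = "candidate"
version = "0.1.0"
edition = "2021"
"""


def write_crate(root: Path, code: str) -> Path:
    """Lay out Cargo.toml and src/lib.rs under root and return root."""
    (root / "src").mkdir(parents=True, exist_ok=True)
    (root / "Cargo.toml").write_text(CARGO_TOML)
    (root / "src" / "lib.rs").write_text(code)
    return root


def run_cargo(code: str, args: list[str], timeout: float = 60.0,
              executable: str = "cargo") -> float:
    """Return 1.0 if the cargo subcommand exits with 0, else 0.0."""
    with tempfile.TemporaryDirectory() as tmp:
        crate = write_crate(Path(tmp), code)
        try:
            result = subprocess.run(
                [executable, *args], cwd=crate,
                capture_output=True, text=True, timeout=timeout,
            )
        except (OSError, subprocess.TimeoutExpired):
            # Missing toolchain or a hung build both score zero.
            return 0.0
    return 1.0 if result.returncode == 0 else 0.0
```

The three checks would then be `run_cargo(code, ["build"])`, `run_cargo(code, ["clippy", "--", "-D", "warnings"])`, and `run_cargo(code, ["test"])`. As the risk note says, this offers no sandboxing; the temporary directory only keeps build artifacts from accumulating.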

Reviewed by Cursor Bugbot for commit c155d4c. Bugbot is set up for automated code reviews on this repo.

Note

Add Rust Cargo feedback environment for code-generation scoring

  • Adds a new rust-cargo single-turn environment in rust_cargo.py that scores model-generated Rust code using a weighted rubric of static checks and live Cargo tool execution.
  • Static rewards check for a single rust code block, presence of the required function, no main function, a #[cfg(test)] module, and assertion count (up to 4).
  • Cargo-based rewards run cargo build, cargo clippy (zero warnings), and cargo test in a temporary workspace via run_cargo_tool.
  • Includes five built-in Rust tasks and a dataset builder, with a README and pyproject.toml for packaging.
  • Risk: reward functions that invoke Cargo will fail gracefully if Cargo is not installed, returning 0.0 with an error message rather than raising.
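The static checks listed above can be illustrated with a few regular expressions. This is a sketch under assumptions: `count_assertions` is a name visible in the diff excerpt, but its body, the regexes, and the linear scaling up to the cap of 4 are guesses rather than the PR's actual weights.

```python
import re

# Matches assert!, assert_eq!, and assert_ne! macro invocations.
ASSERT_RE = re.compile(r"\bassert(?:_eq|_ne)?!\s*\(")
MAIN_RE = re.compile(r"\bfn\s+main\s*\(")


def count_assertions(code: str) -> int:
    """Count assertion macro calls in the Rust source."""
    return len(ASSERT_RE.findall(code))


def assertion_reward(code: str, cap: int = 4) -> float:
    """Scale linearly with the assertion count, saturating at the cap."""
    return min(count_assertions(code), cap) / cap


def structure_checks(code: str, required_fn: str) -> dict[str, bool]:
    """Evaluate the remaining static checks as independent booleans."""
    fn_pattern = rf"\bfn\s+{re.escape(required_fn)}\b"
    return {
        "has_required_fn": re.search(fn_pattern, code) is not None,
        "has_test_module": "#[cfg(test)]" in code,
        "no_main": MAIN_RE.search(code) is None,
    }
```

Text-level checks like these can be fooled by comments or string literals, which is one reason to weight the executable Cargo rewards more heavily.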

Macroscope summarized c155d4c.


cursor (bot) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

        score += 0.20
    if count_assertions(code) >= 2:
        score += 0.10
    return score
Completion is a message list, not a string

High Severity

All six reward functions treat completion as a str and pass it directly to extract_rust_code() / has_single_rust_block(), which call re.search() on it. However, the framework provides completion as a list[dict] (message objects). This causes a TypeError on every call, silently caught by the Rubric error handler, making every reward always return 0.0. The environment produces no useful training signal. The text needs to be extracted first, e.g. via parser.parse_answer(completion) or completion[-1]["content"].
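One possible shape for the suggested fix is a small normalizing helper that each reward function calls before any regex work. The helper name `completion_text` is hypothetical, and the message format (dicts with `role` and `content` keys) is assumed from the description above.

```python
from typing import Union

Messages = list[dict]


def completion_text(completion: Union[str, Messages]) -> str:
    """Return the last assistant message's text, or the string itself."""
    if isinstance(completion, str):
        return completion
    for message in reversed(completion):
        if message.get("role") == "assistant":
            content = message.get("content")
            return content if isinstance(content, str) else ""
    return ""
```

A reward function would then call something like `extract_rust_code(completion_text(completion))`, so `re.search()` always receives a string regardless of which form the framework passes.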

