feat: add SciCode environment #1487
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit f24882d.
        ast.parse(code)
        return 1.0
    except SyntaxError:
        return 0.0
Reward functions receive Messages list, not string
High Severity
All reward functions declare completion: str and operate on it directly as a string (calling extract_python_code(completion), re.search(..., completion), etc.), but the Verifiers framework passes state["completion"], which is a Messages list (e.g. [{"role": "assistant", "content": "..."}]). Other environments correctly call parser.parse_answer(completion) to extract the text first. Because extract_python_code checks "```" not in completion (always True for a list of dicts) and then calls completion.strip(), every call raises an AttributeError. The rubric's exception handler catches it, so every reward silently returns 0.0.
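One way to make the reward functions tolerate both input shapes is sketched below. This is an illustration only: `completion_to_text` is a hypothetical helper, the `extract_python_code` body is a stand-in (the PR's actual implementation is not shown here), and an environment that already has a parser would more likely call `parser.parse_answer(completion)` as noted above.

```python
import ast
import re

FENCE = "`" * 3  # built this way so the example contains no literal fence
# Stand-in pattern: first fenced block, optionally tagged "python".
FENCED_BLOCK = re.compile(FENCE + r"(?:python)?\s*\n(.*?)" + FENCE, re.DOTALL)


def completion_to_text(completion):
    """Hypothetical helper: return assistant text from a str or a Messages list."""
    if isinstance(completion, str):
        return completion
    # Messages list: use the content of the last assistant message, if any.
    for message in reversed(completion):
        if message.get("role") == "assistant":
            return str(message.get("content") or "")
    return ""


def extract_python_code(text):
    """Stand-in extractor: first fenced block, else the stripped text."""
    match = FENCED_BLOCK.search(text)
    return match.group(1).strip() if match else text.strip()


def syntax_reward(completion, **kwargs):
    """Return 1.0 if the extracted code parses as Python, else 0.0."""
    code = extract_python_code(completion_to_text(completion))
    try:
        ast.parse(code)
        return 1.0
    except SyntaxError:
        return 0.0
```

Normalizing at the top of each reward function keeps the string-based helpers unchanged, so the fix stays local to the call sites.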
    _ = answer, kwargs
    code = strip_imports(extract_python_code(completion))
    banned = [r"\bassert\b", r"if\s+__name__\s*==", r"print\s*\(", r"pytest", r"unittest"]
    return 0.0 if any(re.search(pattern, code) for pattern in banned) else 1.0
Missing word boundary causes false matches on "print"
Low Severity
The banned pattern r"print\s*\(" lacks a word boundary (\b), so it matches substrings within identifiers like fingerprint(, blueprint(, or sprint(. In scientific computing code (chemistry, bioinformatics), fingerprint is a plausible function name. This causes valid code to be incorrectly penalized with a 0.0 reward.
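The difference is easy to reproduce in isolation. The snippet below compares the pattern from the diff with a `\b`-anchored variant; the names `UNANCHORED`, `ANCHORED`, and the sample strings are made up for illustration.

```python
import re

UNANCHORED = re.compile(r"print\s*\(")    # pattern as written in the diff
ANCHORED = re.compile(r"\bprint\s*\(")    # same pattern with a leading word boundary

samples = [
    "fp = fingerprint(mol)",   # identifier that merely ends in "print"
    "plan = blueprint(spec)",
    "print(result)",           # a real print call
    "    print (result)",      # a real print call with a space before "("
]

for code in samples:
    # Show which variant would flag each line.
    print(
        f"{code!r}: unanchored={bool(UNANCHORED.search(code))}, "
        f"anchored={bool(ANCHORED.search(code))}"
    )
```

The anchored variant still flags method calls such as `obj.print(`, because `.` is not a word character; whether that is desirable depends on what the check is meant to ban.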


Summary
- Add a `scicode` environment for the Algora SciCode bounty.
- Load `SciCode1/SciCode` from Hugging Face and convert each scientific research substep into a SingleTurnEnv coding prompt.

Verification
- `uv run --no-dev ruff check environments/scicode`
- `uv pip install -e environments/scicode`
- `uv run --no-dev python environments/scicode/scicode.py`
- `uv run --no-dev python - <<'PY' ... vf.load_environment('scicode') + reward smoke checks ... PY`
- `CHANGED_ENVS=scicode uv run --no-dev pytest tests/test_envs.py -q --tb=short` was attempted, but the Windows host cannot execute the test's hard-coded `/bin/bash` subprocess path.

Algora bounty: https://algora.io/PrimeIntellect-ai/bounties/AG9a7bN3dkaFcVL3
Reference implementation: https://github.com/scicode-bench/SciCode
Dataset: https://huggingface.co/datasets/SciCode1/SciCode
Note
Low Risk
Additive example environment and docs; no changes to core auth, training, or shared runtime paths.
Overview
Adds a new `scicode` installable environment for SciCode-style scientific coding substeps, and lists it in the environments README under SingleTurnEnv examples.

The environment loads `SciCode1/SciCode` (with a small local fallback when HF is unavailable), expands each problem into per-step `question`/`answer` rows with SciCode-style prompts (prior steps, dependencies, exact header), and exposes `load_environment` as a `SingleTurnEnv` with a weighted rubric of static checks only: syntax, expected def/class name, fenced Python, no tests/prints/asserts, a `# Background:` comment, and a return. It does not perform HDF5 numeric verification.
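To make the idea of a weighted rubric of static checks concrete, here is a minimal sketch covering three of the listed checks (syntax, expected def/class name, presence of a return). The function name `static_score`, the `WEIGHTS` table, and the weight values are all assumptions for illustration and are not taken from the PR.

```python
import ast

# Hypothetical weights; the PR's actual weights are not shown in this thread.
WEIGHTS = {"syntax": 0.5, "name": 0.3, "return": 0.2}


def static_score(code, expected_name):
    """Return a weighted score in [0, 1] from purely static checks on code."""
    try:
        tree = ast.parse(code)
    except (SyntaxError, ValueError):
        # Unparseable code fails every check, including the syntax check.
        return 0.0

    nodes = list(ast.walk(tree))
    definitions = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    passed = {
        "syntax": True,
        # Some def/class anywhere in the code has the expected name.
        "name": any(
            isinstance(node, definitions) and node.name == expected_name
            for node in nodes
        ),
        # At least one return statement appears somewhere in the code.
        "return": any(isinstance(node, ast.Return) for node in nodes),
    }
    earned = sum(WEIGHTS[check] for check, ok in passed.items() if ok)
    return earned / sum(WEIGHTS.values())
```

Checks of this kind only inspect the shape of the code; they say nothing about numerical correctness, which matches the note that HDF5 verification is not performed.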
Note
Add SciCode environment for scientific research coding substep evaluation
Adds a `scicode` environment that loads the SciCode1/SciCode dataset and presents individual substeps as single-turn prompts.
Macroscope summarized f24882d.