feat: add LisanBench environment#1486
Conversation
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 700bf83. Configure here.
| return 0.0 | ||
| has_explanation_markers = bool(re.search(r"\b(reason|because|explanation|steps?)\b", completion, re.I)) | ||
| comma_like = "," in completion or "->" in completion or "→" in completion | ||
| return 1.0 if comma_like and not has_explanation_markers else 0.5 if comma_like else 0.0 |
There was a problem hiding this comment.
Reward functions receive message list, not string
High Severity
All five reward functions annotate completion as str and pass it directly to extract_word_chain(), which calls completion.lower(). However, the Verifiers framework passes completion as state["completion"] — a list[dict[str, str]] of message dicts, not a string. Every other environment in the repo (e.g. reverse_text, wordle, mmmu) correctly uses parser.parse_answer(completion) to extract text first. Calling .lower() on a list raises AttributeError, which the rubric silently catches, returning 0.0 for every reward function. The format_reward function also directly applies re.search() and "," in completion to the list, compounding the issue. The environment appears to run but all rewards are always 0.0, making it completely non-functional for evaluation and training.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 700bf83. Configure here.


Summary
lisanbenchenvironment for the Algora LisanBench bounty.dwyl/english-wordsdictionary on first use with a small offline fallback for smoke tests.Verification
uv run --no-dev ruff check environments/lisanbenchuv pip install -e environments/lisanbenchuv run --no-dev python environments/lisanbench/lisanbench.pyuv run --no-dev python - <<'PY' ... vf.load_environment('lisanbench') + reward smoke checks ... PYCHANGED_ENVS=lisanbench uv run --no-dev pytest tests/test_envs.py -q --tb=shortwas attempted, but the Windows host cannot execute the test's hard-coded/bin/bashsubprocess path.Algora bounty: https://algora.io/PrimeIntellect-ai/bounties/dDffD24XfkQUaR7a
Reference implementation: https://github.com/voice-from-the-outer-world/lisan-bench
Note
Low Risk
Self-contained new environment under
environments/lisanbenchwith no changes to core auth, data pipelines, or shared runtime beyond documentation.Overview
Adds a new installable
lisanbenchsingle-turn environment and documents it in the environments index.The model must extend a given starting word into a comma-separated English word chain where each step has Levenshtein distance 1, words are dictionary-valid, and repeats are forbidden. Scoring uses a weighted rubric (start word, transition validity, valid-prefix length capped at 25 words, no duplicates, list-only formatting). The English lexicon is loaded from
dwyl/english-wordsinto~/.cache/verifiers/lisanbench/on first use, with a small embedded fallback when download or read fails. The package includesload_environment, default starting-word tasks, and eval defaults inpyproject.toml.Reviewed by Cursor Bugbot for commit 700bf83. Bugbot is set up for automated code reviews on this repo. Configure here.
Note
Add LisanBench single-turn word-chain environment
lisanbenchenvironment in lisanbench.py implementing a word-chain task where a model must produce a comma-separated chain of English words each differing by edit distance 1 from the previous.words_alpha.txt(stored at~/.cache/verifiers/lisanbench/words_alpha.txt), falling back to a small in-module word set on download failure.load_environment()function returning avf.SingleTurnEnvand documents the environment in environments/README.md.Macroscope summarized 700bf83.