renderers <-> v1: serializable RenderedTokens by snimu · Pull Request #1471 · PrimeIntellect-ai/verifiers

snimu · 2026-05-26T19:27:07Z

Description

RenderedTokens is a plain @DataClass (defined at renderers/base.py:159), which json.dumps cannot serialize.
PR feat(renderer-client): give per-token prompt attribution to TrajectoryStep #1414 puts it in state. In verifiers/utils/response_utils.py:131-133, parse_response_tokens writes the raw RenderedTokens dataclass onto the step:

if prompt_attribution is not None:
    out["prompt_attribution"] = prompt_attribution
    tokens.prompt_attribution = None

out is the TrajectoryStepTokens dict that MultiTurnEnv.add_model_response then appends to state["trajectory"] (verifiers/envs/multiturn_env.py:151-162).
The v1 path then asserts JSON-serializability of the entire state:
- verifiers/v1/harness.py:184: state.assert_serializable() after rollout completion
- verifiers/v1/env.py:123: state.assert_serializable() after group run
- Both eventually call json.dumps(self) on the whole state (verifiers/types.py:898-902), traversing state["trajectory"][i]["tokens"]["prompt_attribution"].

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Test improvement

Testing

All existing tests pass when running uv run pytest locally.
New tests have been added to cover the changes

Checklist

My code follows the style guidelines of this project as outlined in AGENTS.md
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
Any dependent changes have been merged and published

Note

Low Risk
Localized serialization change in token parsing with tests; only consumers expecting a RenderedTokens object on trajectory steps need to rehydrate from dict.

Overview
parse_response_tokens now stores renderer prompt_attribution on trajectory step tokens as a JSON-safe dict (dataclasses.asdict), stripping nested multi_modal_data from that dict because tensors already live on out["multi_modal_data"]. The sidecar is still moved off ResponseTokens (not duplicated), matching multi_modal_data policy.

Callers that read tokens["prompt_attribution"] get a dict instead of a RenderedTokens instance; docs note rehydration via RenderedTokens(**d). Tests assert dict equality, dict-shaped truncation fields, and a v1 regression that State.assert_serializable() succeeds when attribution is on a trajectory step.

^{Reviewed by Cursor Bugbot for commit b2b4cbf. Bugbot is set up for automated code reviews on this repo. Configure here.}

Note

Serialize `prompt_attribution` as a plain dict in `parse_response_tokens`

parse_response_tokens in response_utils.py now converts a RenderedTokens dataclass to a plain dict via dataclasses.asdict, clearing multi_modal_data before conversion to avoid non-serializable values.
The dict is rehydratable via RenderedTokens(**d). The original tokens.prompt_attribution is still cleared (move semantics preserved).
A new regression test verifies that a trajectory step with prompt_attribution passes State.assert_serializable.
Behavioral Change: callers that previously received a RenderedTokens instance from prompt_attribution will now receive a plain dict.

^{Macroscope summarized b2b4cbf.}

PR #1414 surfaced the renderer's per-token prompt attribution onto every renderer-client rollout's TrajectoryStepTokens as a live RenderedTokens dataclass. v1 then ran State.assert_serializable over the full state, which json.dumps-walks state["trajectory"] and raises TypeError on the dataclass -- every v1 + renderer rollout failed at that gate. Convert via dataclasses.asdict at the parse_response_tokens sidecar handoff so the trajectory carries the dict form already. This also matches prime-rl's _step_sft_mask consumer contract (prompt_attribution: dict | None, accessed via ["message_indices"] / ["is_content"]) -- previously prime-rl would have raised on dict access against the dataclass too. Downstream consumers rehydrate via RenderedTokens(**d). multi_modal_data has the same latent JSON-gate problem but is intentionally left as a dataclass: save_utils._delta_intermediate_mm_data reads its fields via getattr, so converting it here would silently break the per-step delta optimization for multimodal trajectories. Separate fix. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two regression tests that catch the PR #1414 v1 serialization bug deterministically (the previous tests only checked parse-boundary output, missing the actual v1 failure path): 1. test_parsed_prompt_attribution_survives_v1_assert_serializable Reproduces v1/runtime.py::submit_model_request exactly -- parse_response_tokens -> serializable() -> state["trajectory"] -> State.assert_serializable(). Pre-fix this raised TypeError on the RenderedTokens dataclass; post-fix it passes and the state round-trips through json.dumps cleanly. Verified A/B: reverting the parse-boundary asdict call makes this test fail with the exact pre-fix TypeError. 2. test_parsed_prompt_attribution_survives_orchestrator_msgpack Locks the verifiers -> orchestrator -> trainer wire contract: msgpack-pack the parsed step with the env_worker encoder (msgpack_encoder), unpack, and assert prime-rl's _step_sft_mask access pattern (prompt_attribution["message_indices"] / ["is_content"]) works on the result. Also checks RenderedTokens(**d) rehydrates losslessly for typed consumers. prime-rl's own test_trajectories.py uses _StubAttribution stubs that bypass the parse pipe, so it produced no A/B signal on either branch. These verifiers-side tests do. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

macroscopeapp · 2026-05-26T19:29:40Z

Approvability

Verdict: Needs human review

Unable to check for correctness in b2b4cbf.

^{You can customize Macroscope's approvability policy. Learn more.}

Inline the asdict conversion (2 lines) instead of a helper with a 12-line docstring; tighten the TrajectoryStepTokens comment to one line. Drop the msgpack regression test -- msgpack's encoder already had a dataclass fallback, so it passed on both branches and provided no A/B signal. Slim the v1 regression test docstring and drop the redundant json.loads(json.dumps(...)) assertion (assert_serializable already exercises that path). Restore the "carries" test to its original shape with a single dataclasses.asdict equality check. Net: +37 LOC instead of +181, same coverage. The v1 regression test still catches the original TypeError when the asdict line is reverted. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 7eba25d. Configure here.}

dataclasses.asdict(RenderedTokens) recurses into the multi_modal_data field, whose mm_items[modality][i] holds raw HF processor tensors (pixel_values, image_grid_thw). Those aren't JSON-native, so the asdict conversion alone leaves State.assert_serializable still raising TypeError on multimodal rollouts — same failure mode as before the parse-boundary fix. Zero multi_modal_data via dataclasses.replace before asdict so the tensors never enter the dict (and aren't deep-copied along the way). The canonical mm payload is preserved at out["multi_modal_data"], which has its own non-JSON transport (msgpack with custom encoder on the wire; explicit trajectory allow-list past the JSONL save gate). No data loss. The existing carry test asserts equality against dataclasses.asdict(attribution) on a text-only attribution where multi_modal_data defaults to None — both sides hold a None for that key, so the test still passes unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

After the parse-boundary asdict conversion, TrajectoryStepTokens ["prompt_attribution"] is a plain dict mirroring the RenderedTokens field names — not the dataclass itself. One-word swap in the type comment: "sidecar" → "fields as a dict". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

snimu and others added 2 commits May 26, 2026 20:58

cursor Bot reviewed May 27, 2026

View reviewed changes

Comment thread verifiers/types.py

Comment thread verifiers/utils/response_utils.py Outdated

snimu and others added 2 commits May 27, 2026 11:59

snimu requested review from hallerite, mikasenghaas and willccbb May 27, 2026 10:39

hallerite approved these changes May 27, 2026

View reviewed changes

snimu merged commit e7e2985 into main May 27, 2026
15 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

renderers <-> v1: serializable RenderedTokens#1471

renderers <-> v1: serializable RenderedTokens#1471
snimu merged 5 commits into
mainfrom
sebastian/v1-and-ci-fixes-2026-05-26

snimu commented May 26, 2026 •

edited by macroscopeapp Bot

Loading

Uh oh!

macroscopeapp Bot commented May 26, 2026 •

edited

Loading

Uh oh!

cursor Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

snimu commented May 26, 2026 • edited by macroscopeapp Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of Change

Testing

Checklist

Serialize prompt_attribution as a plain dict in parse_response_tokens

Uh oh!

macroscopeapp Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Approvability

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

snimu commented May 26, 2026 •

edited by macroscopeapp Bot

Loading

Serialize `prompt_attribution` as a plain dict in `parse_response_tokens`

macroscopeapp Bot commented May 26, 2026 •

edited

Loading