Skip to content

Fix Python remote eval parameter serialization in dev mode#210

Open
ekeith (evanmkeith) wants to merge 1 commit into
mainfrom
05-27-fix-python-remote-eval-parameter-listing
Open

Fix Python remote eval parameter serialization in dev mode#210
ekeith (evanmkeith) wants to merge 1 commit into
mainfrom
05-27-fix-python-remote-eval-parameter-listing

Conversation

@evanmkeith
Copy link
Copy Markdown

Summary

Fixes Python bt eval --dev /list responses for evals that define parameters={...}.

The Python eval runner was emitting parameters as a raw JSON Schema object via parameters_to_json_schema(...). The Braintrust UI expects remote eval parameters to use the serialized parameter container shape, such as braintrust.staticParameters or braintrust.parameters. As a result, /list could return 200 OK while the UI still failed to parse the evaluator manifest and showed the generic connection/listing error.

This updates the Python runner to serialize parameters with the same remote eval parameter container shape used by the Python SDK devserver/push flow.

Test Plan

  • Add a Python list-mode fixture with a Pydantic parameter model.
  • Assert BT_EVAL_DEV_MODE=list emits parameters.type = "braintrust.staticParameters".
  • Run:
    • python3 -m py_compile scripts/eval-runner.py tests/evals/py/remote_list_params/eval_remote_list_params.py
    • cargo fmt --check
    • cargo test eval_python_runner_list_mode_serializes_remote_parameter_container --test eval_fixtures -- --nocapture

## Summary

  Fixes Python `bt eval --dev` `/list` responses for evals that define `parameters={...}`.

  The Python eval runner was emitting parameters as a raw JSON Schema object via `parameters_to_json_schema(...)`. The Braintrust UI expects remote eval parameters to use the
  serialized parameter container shape, such as `braintrust.staticParameters` or `braintrust.parameters`. As a result, `/list` could return `200 OK` while the UI still failed to
  parse the evaluator manifest and showed the generic connection/listing error.

  This updates the Python runner to serialize parameters with the same remote eval parameter container shape used by the Python SDK devserver/push flow.

  ## Test Plan

  - Add a Python list-mode fixture with a Pydantic parameter model.
  - Assert `BT_EVAL_DEV_MODE=list` emits `parameters.type = "braintrust.staticParameters"`.
  - Run:
    - `python3 -m py_compile scripts/eval-runner.py tests/evals/py/remote_list_params/eval_remote_list_params.py`
    - `cargo fmt --check`
    - `cargo test eval_python_runner_list_mode_serializes_remote_parameter_container --test eval_fixtures -- --nocapture`
@github-actions
Copy link
Copy Markdown

Latest downloadable build artifacts for this PR commit 1f4618877fd0:

Available artifact names
  • ``artifacts-build-global
  • ``artifacts-build-local-aarch64-pc-windows-msvc
  • ``artifacts-build-local-x86_64-apple-darwin
  • ``artifacts-build-local-x86_64-pc-windows-msvc
  • ``artifacts-build-local-x86_64-unknown-linux-musl
  • ``artifacts-build-local-aarch64-apple-darwin
  • ``artifacts-build-local-x86_64-unknown-linux-gnu
  • ``artifacts-build-local-aarch64-unknown-linux-gnu
  • ``artifacts-plan-dist-manifest
  • ``cargo-dist-cache

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant