benchmark with repl #2569
Eugene Yurtsev (eyurtsev) wants to merge 5 commits into langchain-ai:main
Pull request overview
This PR appears to wire the new REPL middleware into the evals suite for benchmarking, while also strengthening the REPL system prompt guidance and extending the evals pytest reporter output with total runtime.
Changes:
- Add stronger REPL language guidance + a full example program to the REPL system prompt (and update prompt tests/snapshots accordingly).
- Add total_duration_s reporting to the evals pytest reporter and cover it with a new unit test.
- Update evals dependencies/lockfiles to include langchain-repl/langchain-quickjs and pydantic-monty, and switch the relational tool-usage eval to use ReplMiddleware.
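The total_duration_s change can be pictured with a minimal sketch. It assumes the reporter measures wall-clock time between session start and session finish and writes the result into the summary payload. The class `DurationReporter`, its method names, and the other payload keys are hypothetical and are not the actual code in pytest_reporter.py.

```python
import json
import time
from typing import Optional


class DurationReporter:
    """Hypothetical reporter that tracks total session runtime."""

    def __init__(self) -> None:
        self._start: Optional[float] = None

    def session_start(self) -> None:
        # A monotonic clock is unaffected by system clock adjustments.
        self._start = time.monotonic()

    def session_finish(self, passed: int, failed: int) -> dict:
        if self._start is None:
            raise RuntimeError("session_start() was never called")
        elapsed = time.monotonic() - self._start
        # Assumed payload shape: only total_duration_s is named in the PR.
        return {
            "passed": passed,
            "failed": failed,
            "total_duration_s": round(elapsed, 3),
        }


reporter = DurationReporter()
reporter.session_start()
summary = reporter.session_finish(passed=3, failed=0)
print(json.dumps(summary))
```

In a real pytest plugin the two methods would most likely be driven by the `pytest_sessionstart` and `pytest_sessionfinish` hooks, which is why a unit test can cover the field without running a full eval session.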
Reviewed changes
Copilot reviewed 10 out of 12 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| libs/repl/langchain_repl/middleware.py | Expands REPL system prompt constraints and examples. |
| libs/repl/tests/unit_tests/test_system_prompt.py | Updates assertions for new prompt content. |
| libs/repl/tests/unit_tests/smoke_tests/snapshots/langchain_repl_system_prompt_no_tools.md | Snapshot update for new prompt wording/examples. |
| libs/repl/tests/unit_tests/smoke_tests/snapshots/langchain_repl_system_prompt_mixed_foreign_functions.md | Snapshot update for new prompt wording/examples. |
| libs/repl/pyproject.toml | Adds pydantic-monty dependency. |
| libs/repl/uv.lock | Lockfile update for repl package dependencies. |
| libs/evals/tests/evals/pytest_reporter.py | Adds total_duration_s to the session summary payload + terminal output. |
| libs/evals/tests/unit_tests/test_pytest_reporter.py | Adds coverage ensuring total duration is written to the report and terminal output. |
| libs/evals/tests/evals/test_tool_usage_relational.py | Switches relational eval agent creation to ReplMiddleware (currently conflicts with existing tool-call expectations). |
| libs/evals/pyproject.toml | Adds deepagents, langchain-repl, langchain-quickjs deps and uv sources. |
| libs/evals/uv.lock | Lockfile update for evals package deps, including local editables + quickjs. |
| libs/evals/EVAL_CATALOG.md | Updates catalog line links to match shifted test line numbers. |

    @@ -435,7 +437,8 @@ def _create_agent(model: BaseChatModel):
         """Create agent."""
         return create_deep_agent(
             model=model,

create_deep_agent is no longer passed tools=RELATIONAL_TOOLS, but the evals below still assert direct tool calls like list_user_ids, get_user_email, etc. With the current setup, only the repl tool is added by ReplMiddleware (foreign functions are not registered as agent tools), so these expectations will fail. Either keep passing tools=RELATIONAL_TOOLS (and optionally add the middleware) or update the scorer expectations to match repl tool calls + REPL code execution semantics.
Suggested change:

    -        model=model,
    +        model=model,
    +        tools=RELATIONAL_TOOLS,

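The mismatch described in this comment can be illustrated with a small sketch. It assumes a scorer that compares expected tool names against the tool names visible in the agent's trajectory. The function `missing_tool_calls` and the sample trajectories are hypothetical and are not taken from the eval suite.

```python
def missing_tool_calls(expected: list[str], observed: list[str]) -> list[str]:
    """Return expected tool names that never appear in the observed calls."""
    seen = set(observed)
    return [name for name in expected if name not in seen]


expected = ["list_user_ids", "get_user_email"]

# Tools registered directly on the agent: each call is visible by name.
direct_calls = ["list_user_ids", "get_user_email"]

# Only a `repl` tool registered: foreign functions run inside REPL code,
# so the trajectory shows a single opaque `repl` call.
repl_calls = ["repl"]

print(missing_tool_calls(expected, direct_calls))  # []
print(missing_tool_calls(expected, repl_calls))    # ['list_user_ids', 'get_user_email']
```

Under this assumption, a scorer written for direct tool calls would need to inspect the code passed to the repl tool instead of the tool-call names, or the tools would have to stay registered on the agent.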

    import pytest
    from deepagents import create_deep_agent
    from langchain_core.tools import ToolException, tool
    from langchain_quickjs import QuickJSMiddleware

QuickJSMiddleware is imported but only referenced in a commented-out line, so it will be flagged as an unused import (ruff F401) and fail lint. Remove the import or re-enable usage so the symbol is referenced at runtime.
Suggested change:

    -from langchain_quickjs import QuickJSMiddleware

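Why a commented-out usage does not count as a reference can be shown with a simplified check. The sketch below uses Python's `ast` module, which discards comments during parsing. It is a rough approximation written for illustration, not how ruff implements F401, and the `sample` source is invented.

```python
import ast


def unused_imports(source: str) -> list[str]:
    """List imported names that are never referenced in the module's code."""
    tree = ast.parse(source)
    imported: set[str] = set()
    used: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # `import a.b` binds `a`; `import x as y` binds `y`.
                imported.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return sorted(imported - used)


sample = '''
from deepagents import create_deep_agent
from langchain_quickjs import QuickJSMiddleware

agent = create_deep_agent(
    # middleware=[QuickJSMiddleware()],
)
'''

print(unused_imports(sample))  # ['QuickJSMiddleware']
```

The sketch ignores `__all__`, star imports, and re-exports, all of which a real linter has to handle.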

    requires-python = ">=3.11,<4.0"
    dependencies = [
        "deepagents",
        "pydantic-monty>=0.0.9",

Repo contribution guidelines prohibit adding dependencies to pyproject.toml without explicit maintainer permission. This PR adds pydantic-monty here; please confirm you have approval or remove the dependency.
Suggested change:

    -    "pydantic-monty>=0.0.9",


    requires-python = ">=3.11,<4.0"
    dependencies = [
        "deepagents",
        "pydantic-monty>=0.0.9",

pydantic-monty is added as a runtime dependency, but there are no references to it anywhere under libs/repl/ (no imports/usages found). If this dependency is required transitively or for future work, it should be justified; otherwise remove it to avoid carrying an unused dependency.
Suggested change:

    -    "pydantic-monty>=0.0.9",


    "deepagents",
    "langchain-repl",
    "langchain-quickjs",

Repo contribution guidelines prohibit adding dependencies to pyproject.toml without explicit maintainer permission. This PR adds deepagents, langchain-repl, and langchain-quickjs to the project dependencies; please confirm you have approval or remove/revert these dependency changes.
Suggested change:

    -    "deepagents",
    -    "langchain-repl",
    -    "langchain-quickjs",

quick benchmark with repl