benchmark with repl by eyurtsev · Pull Request #2569 · langchain-ai/deepagents

Eugene Yurtsev (eyurtsev) · 2026-04-08T20:36:32Z

quick benchmark with repl

Copilot

Pull request overview

This PR appears to wire the new REPL middleware into the evals suite for benchmarking, while also strengthening the REPL system prompt guidance and extending the evals pytest reporter output with total runtime.

Changes:

- Add stronger REPL language guidance + a full example program to the REPL system prompt (and update prompt tests/snapshots accordingly).
- Add `total_duration_s` reporting to the evals pytest reporter and cover it with a new unit test.
- Update evals dependencies/lockfiles to include `langchain-repl` / `langchain-quickjs` and `pydantic-monty`, and switch the relational tool-usage eval to use `ReplMiddleware`.

Reviewed changes

Copilot reviewed 10 out of 12 changed files in this pull request and generated 5 comments.

Show a summary per file

| File | Description |
| --- | --- |
| libs/repl/langchain_repl/middleware.py | Expands REPL system prompt constraints and examples. |
| libs/repl/tests/unit_tests/test_system_prompt.py | Updates assertions for new prompt content. |
| libs/repl/tests/unit_tests/smoke_tests/snapshots/langchain_repl_system_prompt_no_tools.md | Snapshot update for new prompt wording/examples. |
| libs/repl/tests/unit_tests/smoke_tests/snapshots/langchain_repl_system_prompt_mixed_foreign_functions.md | Snapshot update for new prompt wording/examples. |
| libs/repl/pyproject.toml | Adds `pydantic-monty` dependency. |
| libs/repl/uv.lock | Lockfile update for repl package dependencies. |
| libs/evals/tests/evals/pytest_reporter.py | Adds `total_duration_s` to the session summary payload + terminal output. |
| libs/evals/tests/unit_tests/test_pytest_reporter.py | Adds coverage ensuring total duration is written to the report and terminal output. |
| libs/evals/tests/evals/test_tool_usage_relational.py | Switches relational eval agent creation to `ReplMiddleware` (currently conflicts with existing tool-call expectations). |
| libs/evals/pyproject.toml | Adds `deepagents`, `langchain-repl`, `langchain-quickjs` deps and uv sources. |
| libs/evals/uv.lock | Lockfile update for evals package deps, including local editables + quickjs. |
| libs/evals/EVAL_CATALOG.md | Updates catalog line links to match shifted test line numbers. |
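
For readers unfamiliar with how a pytest reporter can capture total runtime, here is a minimal sketch. It is not the code in `pytest_reporter.py`, which is not shown on this page. The class name `DurationReporter`, the injectable `clock` parameter, and the rounding to two decimals are assumptions made for illustration; only the `total_duration_s` key and the standard `pytest_sessionstart` / `pytest_sessionfinish` hooks come from the PR description and pytest itself.

```python
import time


class DurationReporter:
    """Hypothetical pytest plugin object that records total session runtime."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so the timing can be tested deterministically.
        self._clock = clock
        self._start = None
        self.summary = {}

    def pytest_sessionstart(self, session):
        # pytest calls this hook once, before collection and test execution.
        self._start = self._clock()

    def pytest_sessionfinish(self, session, exitstatus):
        # pytest calls this hook once, after the whole run has finished.
        elapsed = self._clock() - self._start
        # Assumed payload shape: a flat dict holding the total duration.
        self.summary["total_duration_s"] = round(elapsed, 2)
```

An instance of such a class could be registered with `pytest.main([...], plugins=[DurationReporter()])`. A monotonic clock is used here so the measurement is not affected by system clock adjustments.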


Copilot · 2026-04-08T20:43:20Z

@@ -435,7 +437,8 @@ def _create_agent(model: BaseChatModel):
    """Create agent."""
    return create_deep_agent(
        model=model,


create_deep_agent is no longer passed tools=RELATIONAL_TOOLS, but the evals below still assert direct tool calls like list_user_ids, get_user_email, etc. With the current setup, only the repl tool is added by ReplMiddleware (foreign functions are not registered as agent tools), so these expectations will fail. Either keep passing tools=RELATIONAL_TOOLS (and optionally add the middleware) or update the scorer expectations to match repl tool calls + REPL code execution semantics.

Suggested change

-        model=model,
+        model=model,
+        tools=RELATIONAL_TOOLS,
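
To make the mismatch described in this comment concrete, the sketch below compares expected tool names against the tool calls recorded in a run. It does not reproduce the actual scorer in the evals suite. The helper names, the dict shape of a tool call, and the `user_id` and `code` argument names are hypothetical; only `list_user_ids`, `get_user_email`, and the `repl` tool name come from the comment above.

```python
def called_tool_names(tool_calls):
    """Return the tool names from a list of tool-call dicts, in order."""
    return [call["name"] for call in tool_calls]


def missing_tools(expected, tool_calls):
    """Return expected tool names that never appear as direct tool calls."""
    seen = set(called_tool_names(tool_calls))
    return [name for name in expected if name not in seen]


# Direct tool calls, as the existing expectations assume.
direct_run = [
    {"name": "list_user_ids", "args": {}},
    {"name": "get_user_email", "args": {"user_id": 1}},
]

# Hypothetical REPL-only run: the same functions are invoked inside code
# passed to a single `repl` tool call, so their names never appear as tools.
repl_run = [
    {
        "name": "repl",
        "args": {"code": "ids = list_user_ids()\nget_user_email(ids[0])"},
    },
]

expected = ["list_user_ids", "get_user_email"]

print(missing_tools(expected, direct_run))
print(missing_tools(expected, repl_run))
```

Under these assumptions, the first run satisfies the expectations while the second reports both names as missing, which is the failure mode the comment warns about.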

Copilot · 2026-04-08T20:43:21Z

 import pytest
 from deepagents import create_deep_agent
 from langchain_core.tools import ToolException, tool
+from langchain_quickjs import QuickJSMiddleware


QuickJSMiddleware is imported but only referenced in a commented-out line, so it will be flagged as an unused import (ruff F401) and fail lint. Remove the import or re-enable usage so the symbol is referenced at runtime.

Suggested change

-from langchain_quickjs import QuickJSMiddleware
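
As a rough illustration of why a commented-out reference does not count as usage, the sketch below approximates an unused-import check with Python's `ast` module. It is a simplification and not how ruff implements F401 (it ignores `__all__`, re-exports, and scoping). The `SAMPLE` source, including the `middleware=[...]` keyword inside the comment, is hypothetical.

```python
import ast


def unused_imports(source):
    """Return imported names that are never referenced in `source`."""
    tree = ast.parse(source)  # parses only; nothing is imported or executed
    imported = set()
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # `import a.b` binds `a`; `import x as y` binds `y`.
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.Name):
            used.add(node.id)
    return sorted(imported - used)


# Hypothetical module: the only mention of QuickJSMiddleware outside the
# import is inside a comment, which the parser discards.
SAMPLE = '''
from deepagents import create_deep_agent
from langchain_quickjs import QuickJSMiddleware

agent = create_deep_agent(
    # middleware=[QuickJSMiddleware()],
)
'''

print(unused_imports(SAMPLE))
```

Comments never reach the syntax tree, so the import is the only occurrence of the name and the check reports it.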

Copilot · 2026-04-08T20:43:21Z

 requires-python = ">=3.11,<4.0"
 dependencies = [
    "deepagents",
+    "pydantic-monty>=0.0.9",


Repo contribution guidelines prohibit adding dependencies to pyproject.toml without explicit maintainer permission. This PR adds pydantic-monty here; please confirm you have approval or remove the dependency.

Suggested change

-    "pydantic-monty>=0.0.9",

Copilot · 2026-04-08T20:43:21Z

 requires-python = ">=3.11,<4.0"
 dependencies = [
    "deepagents",
+    "pydantic-monty>=0.0.9",


pydantic-monty is added as a runtime dependency, but there are no references to it anywhere under libs/repl/ (no imports/usages found). If this dependency is required transitively or for future work, it should be justified; otherwise remove it to avoid carrying an unused dependency.
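
One way to double-check a "no imports found" claim like this is to scan the package for import statements. The sketch below is one possible approach, not the method Copilot used. The function name `files_importing` is hypothetical, and the import name `pydantic_monty` used in the usage note is an assumption (distribution names and import names can differ).

```python
import ast
from pathlib import Path


def files_importing(root, module):
    """Return paths of .py files under `root` that import `module` or a submodule."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom):
                # Skip relative imports (level > 0); they refer to local modules.
                names = [node.module or ""] if node.level == 0 else []
            else:
                continue
            if any(n == module or n.startswith(module + ".") for n in names):
                hits.append(path)
                break  # one hit per file is enough
    return hits
```

Calling `files_importing("libs/repl", "pydantic_monty")` from the repository root would return an empty list if the dependency is indeed unreferenced, assuming that import name is correct. Dynamic imports (for example via `importlib`) would not be detected by this static scan.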

Suggested change

-    "pydantic-monty>=0.0.9",

Copilot · 2026-04-08T20:43:22Z

+    "deepagents",
+    "langchain-repl",
+    "langchain-quickjs",


Repo contribution guidelines prohibit adding dependencies to pyproject.toml without explicit maintainer permission. This PR adds deepagents, langchain-repl, and langchain-quickjs to the project dependencies; please confirm you have approval or remove/revert these dependency changes.

Suggested change

-    "deepagents",
-    "langchain-repl",
-    "langchain-quickjs",

Eugene Yurtsev (eyurtsev) added 5 commits April 7, 2026 22:12

x

0b426ad

Merge branch 'main' into eugene/update_evals_repl

b22d7cc

x

3620d10

Merge branch 'main' into eugene/update_evals_repl

4f6d57d

qxqx

17be538

github-actions Bot added the labels dependencies (Pull requests that update a dependency file), evals (Evaluation suite and Harbor integration), internal (User is a member of the `langchain-ai` GitHub organization), repl (REPL sandbox package), and size: S (50-199 LOC) on Apr 8, 2026

Eugene Yurtsev (eyurtsev) marked this pull request as ready for review April 8, 2026 20:38

Eugene Yurtsev (eyurtsev) requested review from Mason Daugherty (mdrxy) and vivek (vtrivedy) as code owners April 8, 2026 20:38

Copilot AI review requested due to automatic review settings April 8, 2026 20:38

Eugene Yurtsev (eyurtsev) requested a review from Maahir Sachdev (maahir30) as a code owner April 8, 2026 20:38

Copilot started reviewing on behalf of Eugene Yurtsev (eyurtsev) April 8, 2026 20:38

Copilot AI reviewed Apr 8, 2026

Eugene Yurtsev (eyurtsev) wants to merge 5 commits into langchain-ai:main from eyurtsev:eugene/update_evals_repl

2 participants
