feat(agent-core): compress oversized images before sending to the model #1243
RealKai42 wants to merge 3 commits into
Downsample images to a 2000px longest edge and a per-image byte budget at the single prompt-ingestion chokepoint (the prompt/steer RPC) and on tool results (ReadMediaFile, MCP), so every client transport (CLI, web, desktop, ACP, SDK) is covered uniformly inside the core. PNG screenshots stay lossless and degrade to JPEG only when the byte budget cannot otherwise be met. Best-effort: if compression fails, the original image is sent unchanged.
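The resize-then-encode policy described above can be sketched as follows. This is a minimal illustration, not the PR's code: the Encode callback is a hypothetical stand-in for the real jimp-based encoder (it is assumed to encode the already-resized bitmap), and the JPEG quality steps and the "nothing fits" behavior are assumptions.

```typescript
// Hypothetical encoder callback standing in for the PR's jimp-based encoder.
// Assumed to encode the already-resized bitmap in the requested format.
type Format = "png" | "jpeg";
type Encode = (format: Format, quality?: number) => Uint8Array;

const MAX_EDGE = 2000; // longest-edge cap stated in the PR description

/** Scale (w, h) so the longest edge is at most maxEdge, preserving aspect ratio. */
function fitLongestEdge(w: number, h: number, maxEdge = MAX_EDGE): { width: number; height: number } {
  const longest = Math.max(w, h);
  if (longest <= maxEdge) return { width: w, height: h }; // already small: no resize
  const scale = maxEdge / longest;
  return {
    width: Math.max(1, Math.round(w * scale)),
    height: Math.max(1, Math.round(h * scale)),
  };
}

/**
 * PNG sources try lossless PNG first; only if that misses the byte budget do we
 * fall back to JPEG at decreasing quality (the quality steps are assumptions).
 * Any encoder error returns the original bytes unchanged (best-effort).
 */
function encodeWithinBudget(
  original: Uint8Array,
  sourceFormat: Format,
  encode: Encode,
  budgetBytes: number,
  jpegQualities: number[] = [85, 70, 55, 40],
): { bytes: Uint8Array; format: Format; changed: boolean } {
  try {
    if (sourceFormat === "png") {
      const png = encode("png");
      if (png.length <= budgetBytes) return { bytes: png, format: "png", changed: true };
    }
    let smallest: Uint8Array | undefined;
    for (const q of jpegQualities) {
      const jpeg = encode("jpeg", q);
      if (jpeg.length <= budgetBytes) return { bytes: jpeg, format: "jpeg", changed: true };
      if (!smallest || jpeg.length < smallest.length) smallest = jpeg;
    }
    // Assumption: if no step fits, send the smallest attempt rather than the original.
    return smallest
      ? { bytes: smallest, format: "jpeg", changed: true }
      : { bytes: original, format: sourceFormat, changed: false };
  } catch {
    return { bytes: original, format: sourceFormat, changed: false };
  }
}
```

For example, a 4000x3000 screenshot maps to 2000x1500, while an 800x600 image takes the no-resize path.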
🦋 Changeset detected. Latest commit: bc27ce5. The changes in this PR will be included in the next version bump. This PR includes changesets to release 3 packages.
❌ Nix build failed: hash mismatch in
Please update
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fe827a5978
Pull request overview
This PR adds centralized, best-effort image downsampling/re-encoding in packages/agent-core so oversized images are compressed before they’re recorded into history and before they reach the model (prompt ingestion RPC, ReadMediaFile, and MCP tool results). The goal is to reduce vision-token cost and avoid provider image-size limits while keeping already-small images on a fast path.
Changes:
- Introduces a new shared image compressor (compressImageForModel / compressImageContentParts) based on lazy-loaded jimp, enforcing a max-edge and byte budget with PNG→JPEG fallback.
- Hooks compression into the Agent prompt ingestion RPC (prompt/steer), ReadMediaFile image outputs, and MCP result processing (before output limits).
- Adds focused unit/integration tests covering the fast path, dimension/byte budgets, alpha handling, fallback behavior, and ingestion points.
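The PNG→JPEG fallback has to deal with transparency, since JPEG has no alpha channel; the tests listed above mention alpha handling. A minimal sketch of what that step has to do, assuming straight (non-premultiplied) RGBA input and a white background (the background colour is an assumption, not confirmed by the PR, and jimp may handle this internally):

```typescript
/**
 * Flatten straight RGBA pixels onto an opaque background before JPEG encoding.
 * Uses standard "over" compositing: out = a * src + (1 - a) * bg.
 * The white default background is an assumption for illustration.
 */
function flattenAlpha(rgba: Uint8Array, bg: [number, number, number] = [255, 255, 255]): Uint8Array {
  if (rgba.length % 4 !== 0) throw new RangeError("RGBA buffer length must be a multiple of 4");
  const out = new Uint8Array((rgba.length / 4) * 3);
  for (let i = 0, j = 0; i < rgba.length; i += 4, j += 3) {
    const a = rgba[i + 3] / 255; // alpha as a 0..1 fraction
    for (let c = 0; c < 3; c++) {
      out[j + c] = Math.round(a * rgba[i + c] + (1 - a) * bg[c]);
    }
  }
  return out;
}
```

Without this step, transparent regions of a screenshot typically come out black, because the colour channels under a zero alpha are often zero.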
Reviewed changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pnpm-lock.yaml | Locks new dependency tree changes including jimp and transitive codecs. |
| packages/agent-core/package.json | Adds jimp dependency for pure-JS image processing. |
| packages/agent-core/src/tools/support/image-compress.ts | New core implementation for downsampling/re-encoding images and rewriting inline data URLs. |
| packages/agent-core/src/tools/builtin/file/read-media.ts | Compresses image bytes before emitting image_url data URLs while keeping original dimensions in the summary. |
| packages/agent-core/src/mcp/output.ts | Compresses inline image parts from MCP results before applying per-part byte caps. |
| packages/agent-core/src/agent/index.ts | Applies compression at the prompt ingestion chokepoint (rpcMethods.prompt / steer). |
| packages/agent-core/test/tools/read-media.test.ts | Adds coverage for ReadMediaFile downsampling behavior + original-dimension reporting. |
| packages/agent-core/test/tools/image-compress.test.ts | New unit tests for compressor behavior (fast path, ladder, alpha, robustness, performance). |
| packages/agent-core/test/mcp/output.test.ts | Updates MCP output pipeline tests to async and adds downsampling assertion for real images. |
| packages/agent-core/test/agent/prompt-image-compression.test.ts | New integration tests validating prompt RPC compression vs passthrough for small images. |
| .changeset/image-compression.md | Declares user-facing release notes and bumps for CLI/SDK packages. |
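The ordering in the output.ts row above (compress first, then apply per-part byte caps) is what lets a large-but-compressible screenshot survive. A minimal sketch of that ordering; the part shapes, the Compress signature, and the placeholder text for dropped images are all hypothetical, not the PR's actual types:

```typescript
// Hypothetical MCP result part shapes, for illustration only.
type ImagePart = { kind: "image"; bytes: Uint8Array };
type TextPart = { kind: "text"; text: string };
type Part = ImagePart | TextPart;
type Compress = (bytes: Uint8Array) => Promise<Uint8Array>;

async function compressThenCap(parts: Part[], compress: Compress, perPartCapBytes: number): Promise<Part[]> {
  // Step 1: compress every image part first (best-effort: errors keep the original).
  const compressed = await Promise.all(
    parts.map(async (p): Promise<Part> => {
      if (p.kind !== "image") return p;
      try {
        return { kind: "image", bytes: await compress(p.bytes) };
      } catch {
        return p;
      }
    }),
  );
  // Step 2: only then apply the per-part cap; a placeholder replaces anything still too big.
  return compressed.map((p): Part =>
    p.kind === "image" && p.bytes.length > perPartCapBytes
      ? { kind: "text", text: `[image omitted: ${p.bytes.length} bytes > ${perPartCapBytes}]` }
      : p,
  );
}
```

Reversing the two steps would drop the 5000-byte image in the test below even though it compresses to 100 bytes.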
Files not reviewed (1)
- pnpm-lock.yaml: Generated file
```typescript
const passthrough = (): CompressImageResult => ({
  data: bytes,
  mimeType,
  width: dims?.width ?? 0,
  height: dims?.height ?? 0,
  changed: false,
  originalByteLength: bytes.length,
  finalByteLength: bytes.length,
});
```
```typescript
let bytes: Buffer;
try {
  bytes = Buffer.from(base64, 'base64');
} catch {
  return {
    base64,
    mimeType,
    changed: false,
    originalByteLength: 0,
    finalByteLength: 0,
  };
}
```
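In Node, Buffer.from(str, 'base64') is generally lenient: it decodes what it can from malformed text rather than throwing, so a catch around it mostly guards against non-string input. If stricter validation of inline payloads is wanted, a minimal sketch follows; it assumes standard, padded base64 (RFC 4648, section 4) and deliberately rejects the URL-safe alphabet and embedded whitespace. The helper names are hypothetical.

```typescript
// Standard padded base64: groups of 4, with an optional final "xx==" or "xxx=" group.
const BASE64_RE = /^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$/;

/** Hypothetical helper: true only for well-formed, padded standard base64. */
function isStrictBase64(s: string): boolean {
  return BASE64_RE.test(s);
}

/** Hypothetical helper: decode only after validation; undefined means "pass through". */
function decodeBase64Strict(s: string): Buffer | undefined {
  return isStrictBase64(s) ? Buffer.from(s, "base64") : undefined;
}
```

A caller could then treat undefined as the passthrough case instead of relying on the catch branch.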
The prompt/steer RPC handlers await image compression before turn.launch() synchronously claims the active turn, so two overlapping calls could both start compressing, letting whichever finishes compressing first win the turn and strand the other on agent_busy. Run these two RPCs through a per-agent serialization chain so they claim the turn in submit order; cancel and the other RPCs stay immediate.
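A per-agent serialization chain of the kind described in this commit message can be built from a single promise tail. This is a hypothetical sketch in the spirit of the PR's enqueueTurnRpc; the class name and API are illustrative only.

```typescript
// Hypothetical per-agent serialization chain; not the PR's actual implementation.
class TurnRpcQueue {
  // Tail of the chain: settles once every previously enqueued task has settled.
  private tail: Promise<void> = Promise.resolve();

  /** Start `task` only after all earlier tasks settle; resolve/reject with its result. */
  enqueue<T>(task: () => Promise<T>): Promise<T> {
    const run = this.tail.then(task);
    // Absorb the outcome on the chain so one failed RPC does not block later ones;
    // the caller still observes the rejection through `run`.
    this.tail = run.then(
      () => undefined,
      () => undefined,
    );
    return run;
  }
}
```

Under this design, prompt and steer would wrap their "compress, then claim the turn" body in enqueue, while cancel calls into the turn directly so it is never stuck behind a slow compression.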
💡 Codex Review
Reviewed commit: 2d8a145305
```typescript
// failure. Serialized via enqueueTurnRpc so the compression `await`
// cannot race the turn-claim.
this.enqueueTurnRpc(async () => {
  this.turn.prompt(await compressImageContentParts(payload.input));
```
Reserve prompt state before compression
When prompt is called while an existing turn is active and the new payload has an oversized inline image, this awaits compression before calling TurnFlow.prompt, so the activeTurn/busy check in launch() runs only after compression. If the current turn finishes during that compression window, the prompt that would previously have been rejected as turn.agent_busy at submission time starts a fresh turn; a cancel sent during the same prelaunch window also has no active turn to abort. Please reserve/reject the turn synchronously and compress after that decision.
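The fix suggested here (decide busy/reserve synchronously, compress afterwards) can be sketched with a small gate. Everything below is hypothetical (TurnGate, the slot object, the callbacks); it only illustrates that the agent_busy decision happens before any await and that cancel can act on a turn that is still in its prelaunch compression window.

```typescript
class AgentBusyError extends Error {
  constructor() {
    super("turn.agent_busy");
  }
}

type Slot = { aborted: boolean };

// Hypothetical turn gate; not the PR's TurnFlow API.
class TurnGate {
  private current: Slot | undefined;

  submit<I, P>(input: I, compress: (i: I) => Promise<P>, launch: (p: P) => void): Promise<"launched" | "cancelled"> {
    // Busy check and reservation happen synchronously, before any await.
    if (this.current) throw new AgentBusyError();
    const slot: Slot = { aborted: false };
    this.current = slot;
    return (async (): Promise<"launched" | "cancelled"> => {
      try {
        const prepared = await compress(input); // best-effort compression happens after the decision
        if (slot.aborted) return "cancelled"; // cancel arrived during the prelaunch window
        launch(prepared);
        return "launched";
      } catch (err) {
        this.release(slot);
        throw err;
      }
    })();
  }

  /** Works during prelaunch too, because the slot already exists. */
  cancel(): void {
    if (!this.current) return;
    this.current.aborted = true;
    this.current = undefined;
  }

  /** Called when a launched turn ends. */
  finishTurn(): void {
    this.current = undefined;
  }

  private release(slot: Slot): void {
    // Only release our own reservation, never one taken after a cancel.
    if (this.current === slot) this.current = undefined;
  }
}
```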
```typescript
try {
  const { Jimp } = await import('jimp');
  const image = await Jimp.fromBuffer(Buffer.from(bytes));
```
Cap decoded pixels before Jimp loads the image
For a PNG/JPEG with very large dimensions but a small compressed byte size, the fast path is skipped because the longest edge exceeds the cap, and this call fully decodes the image into a bitmap before any pixel-count guard. A solid or otherwise highly-compressible 30000x30000 image can therefore allocate gigabytes inside the agent when submitted via prompt, ReadMediaFile, or MCP output; add a maximum decoded-pixel guard before Jimp.fromBuffer or skip compression for images above that bound.
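One way to add the decoded-pixel guard suggested here is to read dimensions from the file header before calling Jimp.fromBuffer. A minimal PNG-only sketch: the PNG IHDR chunk is always first, with width and height stored as big-endian 32-bit integers at byte offsets 16 and 20. The MAX_DECODED_PIXELS value and helper names are hypothetical; JPEG would need a separate scan for its SOF marker, which is not shown.

```typescript
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Hypothetical limit: 40 megapixels, roughly 160 MB once decoded to RGBA.
const MAX_DECODED_PIXELS = 40_000_000;

/** Read width/height from a PNG's IHDR chunk without decoding any pixel data. */
function readPngDimensions(bytes: Uint8Array): { width: number; height: number } | undefined {
  if (bytes.length < 24) return undefined;
  for (let i = 0; i < PNG_SIGNATURE.length; i++) {
    if (bytes[i] !== PNG_SIGNATURE[i]) return undefined;
  }
  // Bytes 12..15 hold the first chunk's type, which must be "IHDR".
  const type = String.fromCharCode(bytes[12], bytes[13], bytes[14], bytes[15]);
  if (type !== "IHDR") return undefined;
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return { width: view.getUint32(16), height: view.getUint32(20) }; // DataView defaults to big-endian
}

/** True if a full decode of width x height stays within the pixel budget. */
function withinPixelBudget(width: number, height: number, maxPixels = MAX_DECODED_PIXELS): boolean {
  return width * height <= maxPixels;
}
```

A caller would check `dims && !withinPixelBudget(dims.width, dims.height)` before Jimp.fromBuffer and return the passthrough result in that case, so a 30000x30000 image is sent unchanged instead of being decoded.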
Adding jimp to the workspace changed pnpm-lock.yaml, so the pnpmDeps fixed-output hash was stale and the nix build failed. Update it to the value the CI nix build reported.
What
Oversized images are now automatically downsampled and re-encoded before they reach the model, cutting vision-token cost and avoiding provider image-size errors.
Where it hooks (a single convergence point, inside the core)
Rather than scattering compression across every client, handling is centralized in agent-core:
- rpcMethods.prompt / steer. Every client transport (CLI, web, desktop, ACP, SDK) submits prompts through this RPC, so one hook covers them all. Compression runs once per prompt, before the turn records or sends it, so the recorded history and the model-facing payload agree.
- ReadMediaFile and MCP tool output (the two producers of tool-side images). MCP compresses before the per-part byte cap, so a large-but-compressible screenshot is kept instead of dropped.

A shared compressImageContentParts / compressImageForModel lives in tools/support/image-compress.ts (pure JS via jimp, lazily loaded; already-small images take a codec-free fast path).

Testing
- compressImageContentParts (data-URL parts, remote-URL passthrough, id preservation).
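The behaviors covered by that test (inline data-URL parts rewritten, remote URLs passed through, ids preserved) can be sketched as follows. The ContentPart shape and the CompressBase64 signature are hypothetical stand-ins for the PR's real types and for compressImageForModel.

```typescript
// Hypothetical content-part shape; the PR's real types are not shown here.
type ContentPart =
  | { type: "text"; text: string; id?: string }
  | { type: "image_url"; url: string; id?: string };

// Hypothetical compressor signature, standing in for compressImageForModel.
type CompressBase64 = (base64: string, mimeType: string) => Promise<{ base64: string; mimeType: string }>;

const DATA_URL_RE = /^data:([^;,]+);base64,(.*)$/;

async function compressParts(parts: ContentPart[], compress: CompressBase64): Promise<ContentPart[]> {
  return Promise.all(
    parts.map(async (part): Promise<ContentPart> => {
      if (part.type !== "image_url") return part;
      const m = DATA_URL_RE.exec(part.url);
      if (!m) return part; // remote URL (e.g. https://...): pass through untouched
      try {
        const out = await compress(m[2], m[1]);
        // Spreading keeps id and any other fields; only the url is rewritten.
        return { ...part, url: `data:${out.mimeType};base64,${out.base64}` };
      } catch {
        return part; // best-effort: keep the original on failure
      }
    }),
  );
}
```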