feat(agent-core): compress oversized images before sending to the model#1243

Open
RealKai42 wants to merge 3 commits into main from kaiyi/cebu-v2

@RealKai42

Collaborator

What

Oversized images are now automatically downsampled and re-encoded before they reach the model, cutting vision-token cost and avoiding provider image-size errors.

  • Images are limited to a longest edge of ≤ 2000px and a per-image byte budget (~3.75MB raw, which keeps the base64 encoding within a 5MB ceiling).
  • PNG screenshots stay lossless — they only degrade to JPEG when the byte budget cannot otherwise be met.
  • Best-effort: if compression fails for any reason, the original image is sent unchanged (never blocks a prompt).
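
The two limits above can be checked with a little arithmetic. The following is a minimal sketch, not the PR's implementation: the constant and function names are hypothetical, and it assumes "MB" means MiB and that base64 output is padded.

```typescript
// Hypothetical constants mirroring the limits described above (MiB is an assumption).
const MAX_EDGE_PX = 2000;
const BASE64_CEILING_BYTES = 5 * 1024 * 1024;
// Base64 turns every 3 raw bytes into 4 characters, so the raw budget is 3/4 of the ceiling.
const RAW_BUDGET_BYTES = Math.floor((BASE64_CEILING_BYTES * 3) / 4);

interface Size {
  width: number;
  height: number;
}

/** Scale so the longest edge fits within maxEdge, preserving aspect ratio; never upscales. */
function fitWithinMaxEdge(size: Size, maxEdge: number = MAX_EDGE_PX): Size {
  const longest = Math.max(size.width, size.height);
  if (longest <= maxEdge) return { ...size };
  const scale = maxEdge / longest;
  return {
    width: Math.max(1, Math.round(size.width * scale)),
    height: Math.max(1, Math.round(size.height * scale)),
  };
}

/** Length of the padded base64 text produced for rawBytes bytes. */
function base64Length(rawBytes: number): number {
  return Math.ceil(rawBytes / 3) * 4;
}
```

Under these assumptions a 3.75 MiB payload encodes to exactly 5 MiB of base64, so a budget meant to stay strictly under the ceiling would need to sit slightly below 3.75 MiB.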

Where it hooks (single convergence, inside the core)

Rather than scattering compression across every client, handling is centralized in agent-core:

  • Prompt ingestion chokepoint: rpcMethods.prompt / steer. Every client transport (CLI, web, desktop, ACP, SDK) submits prompts through this RPC, so one hook covers them all. Compression runs once per prompt, before the turn records or sends it, so the recorded history and the model-facing payload agree.
  • Tool results: ReadMediaFile and MCP tool output (the two producers of tool-side images). MCP compresses before the per-part byte cap, so a large-but-compressible screenshot is kept instead of dropped.

A shared compressImageContentParts / compressImageForModel lives in tools/support/image-compress.ts (pure-JS via jimp, lazily loaded; already-small images take a codec-free fast path).
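
To illustrate the behavior described for the shared helper (inline data URLs rewritten, remote URLs passed through, ids preserved, original kept on failure), here is a rough sketch. It is not the code in image-compress.ts: the `ContentPart` shape, the `Compressor` signature, and the name `compressParts` are assumptions made for the example, and the actual codec is injected rather than implemented.

```typescript
// Assumed content-part shapes; the real agent-core types may differ.
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; id?: string; imageUrl: { url: string } };

// Hypothetical compressor signature standing in for the real codec step.
type Compressor = (
  bytes: Uint8Array,
  mimeType: string,
) => Promise<{ data: Uint8Array; mimeType: string }>;

const DATA_URL = /^data:([^;,]+);base64,([A-Za-z0-9+/=]*)$/;

async function compressParts(
  parts: ContentPart[],
  compress: Compressor,
): Promise<ContentPart[]> {
  return Promise.all(
    parts.map(async (part): Promise<ContentPart> => {
      if (part.type !== 'image_url') return part;
      const match = DATA_URL.exec(part.imageUrl.url);
      if (!match) return part; // remote URL (or unrecognized form): pass through untouched
      try {
        const [, mimeType, base64] = match;
        const out = await compress(Buffer.from(base64, 'base64'), mimeType);
        const encoded = Buffer.from(out.data).toString('base64');
        // Spread keeps `id` and any other fields on the part.
        return { ...part, imageUrl: { url: `data:${out.mimeType};base64,${encoded}` } };
      } catch {
        return part; // best effort: keep the original image if compression fails
      }
    }),
  );
}
```

Injecting the compressor keeps the part-rewriting logic testable without loading an image codec, which is in the same spirit as the lazily loaded jimp dependency.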

Testing

  • Unit tests for the compressor (dimension cap, byte-budget JPEG ladder, alpha handling, fallback on corrupt/empty/unsupported input, performance fast-path).
  • Unit tests for compressImageContentParts (data-URL parts, remote-URL passthrough, id preservation).
  • Integration test: a 2600px image submitted via the prompt RPC lands in history downsampled to ≤2000px; a small image is untouched.
  • MCP and ReadMediaFile tool-result compression tests.
  • Full suites green: agent-core, server, acp-adapter, node-sdk.

Downsample images to a 2000px longest-edge and per-image byte budget at the
single prompt-ingestion chokepoint (the prompt/steer RPC) and on tool results
(ReadMediaFile, MCP), so every client transport — CLI, web, desktop, ACP, SDK —
is covered uniformly inside the core. PNG screenshots stay lossless and only
degrade to JPEG when the byte budget cannot otherwise be met. Best-effort: the
original image is sent unchanged if compression fails.
@changeset-bot

changeset-bot Bot commented Jun 30, 2026

🦋 Changeset detected

Latest commit: bc27ce5

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages

| Name | Type |
| --- | --- |
| @moonshot-ai/kimi-code-sdk | Minor |
| @moonshot-ai/kimi-code | Minor |
| @moonshot-ai/acp-adapter | Patch |

@github-actions

Contributor

❌ Nix build failed

Hash mismatch in pnpmDeps:

specified: sha256-oratz8x67ZEJGTiNy+s4XaKe0TtpRKh63aIqkV79vvM=
got:       sha256-mqyi0VuPZwESZcdU5E8F3XUG99OH636knBfb8y6TQpw=

Please update flake.nix with the got hash.

@pkg-pr-new

pkg-pr-new Bot commented Jun 30, 2026

pnpm dlx https://pkg.pr.new/@moonshot-ai/kimi-code@bc27ce5
npx https://pkg.pr.new/@moonshot-ai/kimi-code@bc27ce5

commit: bc27ce5

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fe827a5978

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/agent-core/src/agent/index.ts Outdated

Copilot AI left a comment

Pull request overview

This PR adds centralized, best-effort image downsampling/re-encoding in packages/agent-core so oversized images are compressed before they’re recorded into history and before they reach the model (prompt ingestion RPC, ReadMediaFile, and MCP tool results). The goal is to reduce vision-token cost and avoid provider image-size limits while keeping already-small images on a fast path.

Changes:

  • Introduces a new shared image compressor (compressImageForModel / compressImageContentParts) based on lazy-loaded jimp, enforcing a max-edge and byte budget with PNG→JPEG fallback.
  • Hooks compression into the Agent prompt ingestion RPC (prompt/steer), ReadMediaFile image outputs, and MCP result processing (pre output-limits).
  • Adds focused unit/integration tests covering fast-path, dimension/byte budgets, alpha handling, fallback behavior, and ingestion points.

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| pnpm-lock.yaml | Locks new dependency tree changes including jimp and transitive codecs. |
| packages/agent-core/package.json | Adds jimp dependency for pure-JS image processing. |
| packages/agent-core/src/tools/support/image-compress.ts | New core implementation for downsampling/re-encoding images and rewriting inline data URLs. |
| packages/agent-core/src/tools/builtin/file/read-media.ts | Compresses image bytes before emitting image_url data URLs while keeping original dimensions in the summary. |
| packages/agent-core/src/mcp/output.ts | Compresses inline image parts from MCP results before applying per-part byte caps. |
| packages/agent-core/src/agent/index.ts | Applies compression at the prompt ingestion chokepoint (rpcMethods.prompt / steer). |
| packages/agent-core/test/tools/read-media.test.ts | Adds coverage for ReadMediaFile downsampling behavior + original-dimension reporting. |
| packages/agent-core/test/tools/image-compress.test.ts | New unit tests for compressor behavior (fast path, ladder, alpha, robustness, performance). |
| packages/agent-core/test/mcp/output.test.ts | Updates MCP output pipeline tests to async and adds downsampling assertion for real images. |
| packages/agent-core/test/agent/prompt-image-compression.test.ts | New integration tests validating prompt RPC compression vs passthrough for small images. |
| .changeset/image-compression.md | Declares user-facing release notes and bumps for CLI/SDK packages. |

Files not reviewed (1)
  • pnpm-lock.yaml: Generated file

Comment on lines +83 to +91
const passthrough = (): CompressImageResult => ({
  data: bytes,
  mimeType,
  width: dims?.width ?? 0,
  height: dims?.height ?? 0,
  changed: false,
  originalByteLength: bytes.length,
  finalByteLength: bytes.length,
});

Comment on lines +160 to +171
let bytes: Buffer;
try {
  bytes = Buffer.from(base64, 'base64');
} catch {
  return {
    base64,
    mimeType,
    changed: false,
    originalByteLength: 0,
    finalByteLength: 0,
  };
}

The prompt/steer RPC handlers await image compression before turn.launch()
synchronously claims the active turn, so two overlapping calls could both
compress first — letting the faster-to-compress one win the turn and strand the
other on agent_busy. Run these two RPCs through a per-agent serialization chain
so they claim in submit order; cancel and the other RPCs stay immediate.
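
A per-agent serialization chain of the kind this commit message describes can be sketched as a promise tail that each call appends to. This is an illustration only: the class name `TurnRpcQueue` is hypothetical, and the PR's own `enqueueTurnRpc` may be structured differently.

```typescript
// Hypothetical helper: runs async tasks strictly in submission order.
class TurnRpcQueue {
  private tail: Promise<void> = Promise.resolve();

  enqueue<T>(task: () => Promise<T>): Promise<T> {
    // The task starts only after every previously enqueued task has settled.
    const run = this.tail.then(task);
    // Swallow the outcome on the chain itself so one failed task does not block later ones;
    // the caller still observes the rejection through the returned promise.
    this.tail = run.then(
      () => undefined,
      () => undefined,
    );
    return run;
  }
}
```

With this shape, a prompt whose compression is slow still claims the turn before a later prompt whose compression is fast, because the later task does not begin until the earlier one has finished. Calls that bypass the queue, such as cancel, are unaffected.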

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2d8a145305

// failure. Serialized via enqueueTurnRpc so the compression `await`
// cannot race the turn-claim.
this.enqueueTurnRpc(async () => {
  this.turn.prompt(await compressImageContentParts(payload.input));

P2 Badge Reserve prompt state before compression

When prompt is called while an existing turn is active and the new payload has an oversized inline image, this awaits compression before calling TurnFlow.prompt, so the activeTurn/busy check in launch() runs only after compression. If the current turn finishes during that compression window, the prompt that would previously have been rejected as turn.agent_busy at submission time starts a fresh turn; a cancel sent during the same prelaunch window also has no active turn to abort. Please reserve/reject the turn synchronously and compress after that decision.
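
The "reserve first, compress afterwards" ordering requested here could look roughly like the sketch below. It is not the PR's code: `TurnGate` and `promptWithReservation` are hypothetical names, and releasing the gate when a turn completes normally is left to the caller.

```typescript
// Hypothetical gate standing in for the active-turn check performed at launch time.
class TurnGate {
  private active = false;

  /** Synchronously claim the turn; returns false when another turn already holds it. */
  tryReserve(): boolean {
    if (this.active) return false;
    this.active = true;
    return true;
  }

  release(): void {
    this.active = false;
  }
}

async function promptWithReservation<T>(
  gate: TurnGate,
  prepare: () => Promise<T>, // e.g. image compression
  launch: (input: T) => void,
): Promise<'started' | 'agent_busy'> {
  // Runs synchronously at call time, before the first await, so the busy decision
  // reflects the state at submission rather than the state after compression.
  if (!gate.tryReserve()) return 'agent_busy';
  try {
    launch(await prepare());
    return 'started';
  } catch (err) {
    gate.release(); // preparation failed: give the reservation back
    throw err;
  }
}
```

Because the reservation exists during the preparation window, a cancel arriving in that window would have a claimed turn to act on, which is the gap the comment points out.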

try {
  const { Jimp } = await import('jimp');
  const image = await Jimp.fromBuffer(Buffer.from(bytes));

P2 Badge Cap decoded pixels before Jimp loads the image

For a PNG/JPEG with very large dimensions but a small compressed byte size, the fast path is skipped because the longest edge exceeds the cap, and this call fully decodes the image into a bitmap before any pixel-count guard. A solid or otherwise highly-compressible 30000x30000 image can therefore allocate gigabytes inside the agent when submitted via prompt, ReadMediaFile, or MCP output; add a maximum decoded-pixel guard before Jimp.fromBuffer or skip compression for images above that bound.
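
One way to apply such a guard without decoding is to read the dimensions straight from the file header. The sketch below covers PNG only, where the IHDR chunk stores width and height as big-endian 32-bit integers at byte offsets 16 and 20. The function names and the pixel limit are assumptions for illustration, and a JPEG would need its own header parser.

```typescript
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
// Assumed limit for the example; the right bound depends on available memory.
const MAX_DECODED_PIXELS = 64 * 1000 * 1000;

/** Read width/height from a PNG header without decoding pixels; null if not a PNG. */
function readPngDimensions(bytes: Uint8Array): { width: number; height: number } | null {
  if (bytes.length < 24) return null;
  for (let i = 0; i < PNG_SIGNATURE.length; i++) {
    if (bytes[i] !== PNG_SIGNATURE[i]) return null;
  }
  // Bytes 12..15 hold the first chunk type, which must be "IHDR".
  if (bytes[12] !== 0x49 || bytes[13] !== 0x48 || bytes[14] !== 0x44 || bytes[15] !== 0x52) {
    return null;
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  // DataView reads big-endian by default, matching the PNG byte order.
  return { width: view.getUint32(16), height: view.getUint32(20) };
}

/** True when the header declares more pixels than the budget allows. */
function exceedsPixelBudget(bytes: Uint8Array, maxPixels: number = MAX_DECODED_PIXELS): boolean {
  const dims = readPngDimensions(bytes);
  return dims !== null && dims.width * dims.height > maxPixels;
}
```

For the 30000x30000 case in the comment, the header alone reveals 900 million pixels, so a caller could skip compression and pass the original bytes through before any bitmap is allocated.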

Adding jimp to the workspace changed pnpm-lock.yaml, so the pnpmDeps
fixed-output hash was stale and the nix build failed. Update it to the value
the CI nix build reported.