7 changes: 7 additions & 0 deletions .changeset/fix-openai-max-output-size.md
@@ -0,0 +1,7 @@
---
"@moonshot-ai/kosong": patch
"@moonshot-ai/agent-core": patch
"@moonshot-ai/kimi-code": patch
---

Fix `max_tokens` exceeding the provider limit on OpenAI-compatible endpoints. When `max_output_size` is configured, it is now used as the hard ceiling for `max_tokens` instead of being overridden by the generic 128k (131072-token) OpenAI ceiling. This prevents 400 errors from third-party providers (HuggingFace, Ollama, etc.) whose actual output limits are below 131072.
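
The ceiling selection described in this changeset can be sketched as follows. This is a minimal illustration, not the code in the PR: `resolveMaxTokens` and its parameter names are hypothetical, and only the 131072 constant and the min/max ordering are taken from the diff.

```typescript
// Generic ceiling named in the diff (131072 = 128 * 1024).
const CHAT_COMPLETIONS_MAX_OUTPUT_TOKENS_CEILING = 131072;

// Hypothetical helper (not part of the PR): picks the value sent as max_tokens.
function resolveMaxTokens(
  requested: number,
  remainingContext: number | undefined,
  configuredMaxOutputSize: number | undefined,
): number {
  let cap = requested;
  // Never ask for more tokens than the context window has left.
  if (remainingContext !== undefined) {
    cap = Math.min(cap, remainingContext);
  }
  // An explicit max_output_size replaces the generic ceiling; the two do not stack.
  const ceiling = configuredMaxOutputSize ?? CHAT_COMPLETIONS_MAX_OUTPUT_TOKENS_CEILING;
  // Keep the result at 1 or above, as the diff does with Math.max(1, cap).
  return Math.max(1, Math.min(cap, ceiling));
}

// A provider limited to 65536 output tokens is no longer sent 131072.
console.log(resolveMaxTokens(1_048_576, 1_048_576, 65536));
// With nothing configured, the generic ceiling still applies.
console.log(resolveMaxTokens(1_000_000, 970_000, undefined));
```

Under this reading, a configured value above 131072 (such as the 384000 used in `config-state.test.ts`) passes through unclamped, because the generic ceiling is skipped entirely once `max_output_size` is set.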
1 change: 1 addition & 0 deletions packages/agent-core/src/session/provider-manager.ts
@@ -283,6 +283,7 @@ function toKosongProviderConfig(
baseUrl: providerValue(provider.baseUrl, provider.env, 'OPENAI_BASE_URL'),
apiKey: providerApiKey(provider),
reasoningKey,
...(maxOutputSize !== undefined ? { maxTokens: maxOutputSize } : {}),

P2: Honor the max-token env opt-out

When `KIMI_MODEL_MAX_COMPLETION_TOKENS=0` or `KIMI_MODEL_MAX_TOKENS=0`, `resolveCompletionBudget` returns `undefined` to disable completion-token clamping, but this line now bakes `maxOutputSize` into every OpenAI-compatible provider config. `OpenAILegacyChatProvider.generate()` serializes the constructor's `maxTokens` as `max_tokens` even when `applyCompletionBudget` is skipped, so any OpenAI-compatible model alias with `maxOutputSize` will still send a cap despite the documented env opt-out. Keep `maxOutputSize` in the budget path or avoid wiring it when the opt-out is active.
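
One way to read the second suggestion (do not wire `maxOutputSize` while the opt-out is active) is sketched below. Everything here is an assumption made for illustration: `completionCapOptedOut` and `maxTokensField` are hypothetical helpers, and the real parsing and precedence of the two environment variables inside `resolveCompletionBudget` is not shown in this diff.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical check: assumes a literal "0" in either variable means "no cap".
function completionCapOptedOut(env: Env): boolean {
  return ['KIMI_MODEL_MAX_COMPLETION_TOKENS', 'KIMI_MODEL_MAX_TOKENS'].some(
    (name) => env[name]?.trim() === '0',
  );
}

// Hypothetical replacement for the spread added in toKosongProviderConfig:
// returns an empty object when no cap should be baked into the provider config.
function maxTokensField(
  maxOutputSize: number | undefined,
  env: Env,
): { maxTokens?: number } {
  if (maxOutputSize === undefined || completionCapOptedOut(env)) {
    return {};
  }
  return { maxTokens: maxOutputSize };
}

console.log(maxTokensField(65536, {}));
console.log(maxTokensField(65536, { KIMI_MODEL_MAX_TOKENS: '0' }));
```

The spread in the provider config would then become `...maxTokensField(maxOutputSize, process.env)`, so a request built with the opt-out active carries no `max_tokens` at all.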


...defaultHeadersField({
...envCustomHeaders,
...kimiUserAgentHeader(kimiRequestHeaders),
6 changes: 3 additions & 3 deletions packages/agent-core/test/agent/config-state.test.ts
@@ -121,9 +121,9 @@ describe('ConfigState model capabilities', () => {
signal: new AbortController().signal,
});

- // maxOutputSize (384000) is clamped to the 128k ceiling applied to
- // non-Kimi chat-completions providers.
- expect(requestMaxTokens).toBe(131072);
+ // maxOutputSize (384000) is honoured as the hard ceiling for OpenAI-compatible
+ // providers. The generic 128k ceiling only applies when max_output_size is unset.
+ expect(requestMaxTokens).toBe(384000);
});

it('uses session id as a provider prompt cache hint without storing it on Agent', () => {
16 changes: 15 additions & 1 deletion packages/kosong/src/providers/openai-legacy.ts
@@ -453,6 +453,7 @@ export class OpenAILegacyChatProvider implements ChatProvider {
private _reasoningKey: string | undefined;
private _reasoningEffort: string | undefined;
private _generationKwargs: OpenAILegacyGenerationKwargs;
private _explicitMaxTokens: boolean;
private _toolMessageConversion: ToolMessageConversion;
private _client: OpenAI | undefined;
private _httpClient: unknown;
@@ -475,6 +476,7 @@ export class OpenAILegacyChatProvider implements ChatProvider {
? normalizedReasoningKey
: undefined;
this._reasoningEffort = undefined;
this._explicitMaxTokens = options.maxTokens !== undefined;
this._generationKwargs =
options.maxTokens !== undefined ? completionTokenKwargs(this._model, options.maxTokens) : {};
this._toolMessageConversion = options.toolMessageConversion ?? null;
@@ -606,7 +608,19 @@ export class OpenAILegacyChatProvider implements ChatProvider {
) {
cap = Math.min(cap, options.maxContextTokens - options.usedContextTokens);
}
- cap = Math.min(cap, CHAT_COMPLETIONS_MAX_OUTPUT_TOKENS_CEILING);
+ if (this._explicitMaxTokens) {
+ // When max_output_size is explicitly configured, honour it as a hard upper
+ // bound. Third-party OpenAI-compatible providers (HuggingFace, Ollama, etc.)
+ // can have output limits below CHAT_COMPLETIONS_MAX_OUTPUT_TOKENS_CEILING;
+ // applying the generic ceiling would override the user's intent and cause a 400.
+ const configuredCap =
+ this._generationKwargs.max_tokens ?? this._generationKwargs.max_completion_tokens;
+ if (configuredCap !== undefined) {
+ cap = Math.min(cap, configuredCap);
+ }
+ } else {
+ cap = Math.min(cap, CHAT_COMPLETIONS_MAX_OUTPUT_TOKENS_CEILING);
+ }
return this.withGenerationKwargs(completionTokenKwargs(this._model, Math.max(1, cap)));
}

26 changes: 26 additions & 0 deletions packages/kosong/test/openai-legacy.test.ts
@@ -655,6 +655,32 @@ describe('OpenAILegacyChatProvider', () => {
// 1000000 - 30000 = 970000, clamped to 131072
expect(body['max_tokens']).toBe(131072);
});

it('withMaxCompletionTokens respects explicit maxTokens as a ceiling for third-party providers', async () => {
// Reproduces issue #1148: a third-party provider (e.g. HuggingFace, Ollama) may
// have an output limit (e.g. 65536) lower than CHAT_COMPLETIONS_MAX_OUTPUT_TOKENS_CEILING
// (131072). When max_output_size is configured, withMaxCompletionTokens must not
// override it with the generic ceiling.
const provider = new OpenAILegacyChatProvider({
model: 'deepseek-v4-pro',
apiKey: 'test-key',
stream: false,
maxTokens: 65536,
});
const capped = provider.withMaxCompletionTokens(1_048_576, {
usedContextTokens: 0,
maxContextTokens: 1_048_576,
});
const history: Message[] = [
{ role: 'user', content: [{ type: 'text', text: 'Hi' }], toolCalls: [] },
];
const body = await captureRequestBody(capped, '', [], history);

// Expected: 65536 (the explicit maxTokens cap).
// Before this fix the request sent 131072 (CHAT_COMPLETIONS_MAX_OUTPUT_TOKENS_CEILING),
// which exceeds the provider's actual output limit and causes a 400.
expect(body['max_tokens']).toBe(65536);
});
});

describe('maxTokens option', () => {