feat(api): add Anthropic /v1/messages/count_tokens endpoint #83
Conversation
Forward Anthropic-compatible token counting requests to upstream providers. The endpoint accepts the same loose schema we use for /v1/messages so newer content block shapes pass through unchanged, and reuses provider/model resolution and failover so a count_tokens call benefits from the same retry/multi-provider behavior as the chat path.

Implementation notes:
- Dedicated COUNT_TOKENS_FAILOVER_CONFIG with a tighter 30s timeout and 404/405 added to retriable status codes — many providers either don't implement the endpoint or expose it under a slightly different path, so failing over to another mapped provider is the right default.
- normalizeAnthropicBaseUrl strips a trailing /v1 from baseUrl before appending /v1/messages/count_tokens, since some providers' baseUrl already includes the /v1 segment (see the sketch below).
- Upstream JSON / error parsing goes through parseJsonResponse to keep the same defensive parsing the rest of the messages route uses.

Verification: scripts/verify-anthropic-count-tokens.sh probes Kimi Coding / DashScope-Anthropic / Volcengine-Ark gateways for count_tokens support using env-supplied API keys (never written to disk). It auto-discovers a working model via /v1/models, then sends a real /v1/messages and /v1/messages/count_tokens probe and classifies the result as SUPPORTED / NOT_SUPPORTED / INCONCLUSIVE.
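For concreteness, here is a minimal sketch of the two pieces described above. `normalizeAnthropicBaseUrl`, `COUNT_TOKENS_FAILOVER_CONFIG`, and its `timeoutMs`/`retriableStatusCodes` fields appear in the PR; the exact regex and the full status-code list are assumptions.

```ts
// Sketch only: strip a trailing "/v1" so appending the count_tokens path
// never yields ".../v1/v1/messages/count_tokens".
function normalizeAnthropicBaseUrl(baseUrl: string): string {
  return baseUrl.replace(/\/v1\/?$/, "");
}

// Assumed shape; only the 30s timeout and the 404/405 additions are from
// the PR description. The other retriable codes here are illustrative.
const COUNT_TOKENS_FAILOVER_CONFIG = {
  timeoutMs: 30_000,
  retriableStatusCodes: [404, 405, 429, 500, 502, 503, 504],
};

// Example: a baseUrl configured with "/v1" still produces a single segment.
const url =
  normalizeAnthropicBaseUrl("https://api.example.com/v1") +
  "/v1/messages/count_tokens";
// -> "https://api.example.com/v1/messages/count_tokens"
```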
📝 Walkthrough
This PR adds a new token counting endpoint and its verification script.
Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant API as /messages/count_tokens
    participant ModelResolver as Model resolver
    participant Failover as Failover layer
    participant Upstream as Upstream provider

    Client->>API: POST /messages/count_tokens<br/>(model, content)
    API->>ModelResolver: Validate model & resolve providers
    ModelResolver-->>API: Candidate provider list
    API->>Failover: Initialize failover logic
    loop For each candidate
        Failover->>Upstream: POST /messages/count_tokens<br/>(normalized request)
        alt Success (HTTP 200)
            Upstream-->>Failover: JSON response
            Failover->>Failover: Parse response body
            Failover-->>API: Return parsed JSON
        else Retriable error
            Upstream-->>Failover: HTTP 5xx/429
            Failover->>Failover: Try next candidate
        else Non-retriable error
            Upstream-->>Failover: HTTP 4xx (excluding specific statuses)
            Failover-->>API: Forward error response
        end
    end
    API-->>Client: token_count result or<br/>structured error response
```
Code Review
This pull request introduces a new /messages/count_tokens endpoint to the messages API, providing token counting capabilities for Anthropic-compatible models. The implementation includes request schema validation, provider selection with failover support, and a set of utility functions for request construction and response parsing. A verification script is also added to test the functionality across different providers. Feedback suggests refactoring the manual failover loop to use the existing executeWithFailover service for better consistency and masking raw upstream error messages to avoid exposing internal implementation details.
```ts
type: "error",
error: {
  type: "api_error",
  message: text,
```
In API error handlers, log the full error details on the server for debugging, but return only a generic, structured error message to the client to avoid leaking internal implementation details. Returning raw upstream error bodies (even truncated) can expose internal infrastructure details or sensitive information from proxies/gateways.
```diff
-            message: text,
+            message: "An unexpected error occurred while processing the upstream response.",
```
References
- In API error handlers, log the full error details on the server for debugging, but return only a generic, structured error message to the client to avoid leaking internal implementation details.
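A minimal sketch of the pattern this comment describes — log the raw upstream body server-side, return a generic envelope to the client. The `logger` name and the Anthropic-style error shape follow the code quoted above; the helper itself is hypothetical.

```ts
declare const logger: { error: (msg: string, meta?: object) => void };

// Hypothetical helper: full detail goes to server logs; the client only
// ever sees a generic, structured error.
async function maskedUpstreamError(response: Response) {
  const text = await response.text();
  logger.error("upstream error body", { status: response.status, body: text });
  return {
    type: "error",
    error: {
      type: "api_error",
      message:
        "An unexpected error occurred while processing the upstream response.",
    },
  };
}
```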
```ts
let lastResponse: Response | undefined;
let lastError: Error | undefined;

for (const candidate of candidates) {
  const { url, init, proxy } = buildAnthropicCountTokensRequest(
    upstreamBody,
    candidate,
    extraHeaders,
  );

  try {
    const response = await fetchWithTimeout(
      url,
      init,
      COUNT_TOKENS_FAILOVER_CONFIG.timeoutMs,
      proxy,
    );

    if (response.ok) {
      return await parseUpstreamJsonBody(
        response,
        "Anthropic count_tokens",
      );
    }

    lastResponse = response;
    const shouldTryNext =
      COUNT_TOKENS_FAILOVER_CONFIG.retriableStatusCodes.includes(
        response.status,
      ) && candidate !== candidates[candidates.length - 1];

    logger.warn("count_tokens upstream request failed", {
      provider: candidate.provider.name,
      providerType: candidate.provider.type,
      status: response.status,
      shouldTryNext,
    });

    if (!shouldTryNext) {
      set.status = response.status;
      return await parseUpstreamErrorBody(response);
    }
  } catch (error) {
    const err =
      error instanceof Error ? error : new Error(String(error));
    lastError = err;
    const shouldTryNext =
      isRetriableNetworkError(err, COUNT_TOKENS_FAILOVER_CONFIG) &&
      candidate !== candidates[candidates.length - 1];

    logger.warn("count_tokens upstream network error", {
      provider: candidate.provider.name,
      providerType: candidate.provider.type,
      error: err.message,
      shouldTryNext,
    });

    if (!shouldTryNext) {
      set.status = 502;
      return {
        type: "error",
        error: {
          type: "api_error",
          message: `Count tokens request failed: ${err.message}`,
        },
      };
    }
  }
}

if (lastResponse) {
  set.status = lastResponse.status;
  return await parseUpstreamErrorBody(lastResponse);
}

set.status = 502;
return {
  type: "error",
  error: {
    type: "api_error",
    message:
      lastError?.message ||
      "All upstream providers failed for token counting",
  },
};
```
The manual failover loop duplicates logic that is already encapsulated in the executeWithFailover service. Using the centralized service ensures that the request benefits from standard features like exponential backoff and consistent logging. This refactoring also correctly differentiates between non-retriable upstream errors (which are forwarded) and true failover exhaustion (which returns a 502), adhering to repository standards.
```ts
const result = await executeWithFailover(
  candidates,
  (candidate) =>
    buildAnthropicCountTokensRequest(upstreamBody, candidate, extraHeaders),
  COUNT_TOKENS_FAILOVER_CONFIG,
);
if (result.success && result.response) {
  return await parseUpstreamJsonBody(
    result.response,
    "Anthropic count_tokens",
  );
}
if (result.response) {
  set.status = result.response.status;
  return await parseUpstreamErrorBody(result.response);
}
set.status = 502;
return {
  type: "error",
  error: {
    type: "api_error",
    message:
      result.finalError ||
      "All upstream providers failed for token counting",
  },
};
```

References
- When handling failover results, differentiate between non-retriable upstream errors (which should be forwarded to the client) and true failover exhaustion due to retriable errors (which may warrant a 502 Bad Gateway response).
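For reference, the result shape the suggested refactor relies on can be inferred from how it is consumed above; this interface is an assumption, not the actual `executeWithFailover` signature in the repo.

```ts
// Inferred from the result.success / result.response / result.finalError
// usage in the suggestion above; the real type may differ.
interface FailoverResult {
  success: boolean;
  response?: Response; // last upstream response, when one was received
  finalError?: string; // terminal error message when no response exists
}
```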
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/src/api/v1/messages.ts`:
- Around line 685-701: The response body isn't being released when a retriable
upstream error will cause failover; before skipping to the next candidate (where
you compute shouldTryNext using COUNT_TOKENS_FAILOVER_CONFIG and candidates),
call await response.body?.cancel() to free the undici/Node fetch
socket/resources; do this just after deciding shouldTryNext is true and before
returning or continuing, keeping existing logic around lastResponse,
logger.warn, set.status and parseUpstreamErrorBody unchanged.
📒 Files selected for processing (2)
backend/src/api/v1/messages.ts
scripts/verify-anthropic-count-tokens.sh
```ts
lastResponse = response;
const shouldTryNext =
  COUNT_TOKENS_FAILOVER_CONFIG.retriableStatusCodes.includes(
    response.status,
  ) && candidate !== candidates[candidates.length - 1];

logger.warn("count_tokens upstream request failed", {
  provider: candidate.provider.name,
  providerType: candidate.provider.type,
  status: response.status,
  shouldTryNext,
});

if (!shouldTryNext) {
  set.status = response.status;
  return await parseUpstreamErrorBody(response);
}
```
🧩 Analysis chain
🌐 Web query:
In Node.js fetch/undici, should Response.body be consumed or canceled before retrying another upstream request to avoid resource/socket retention?
💡 Result:
Yes. In Node.js' undici-based fetch, you should fully consume or cancel the previous Response.body before retrying (i.e., before issuing the next upstream request), because undici does not rely on garbage collection to promptly release connection/socket resources.

What to do
- If you don't need the response body: either consume it (e.g., await response.body.dump(), response.text(), or response.bytes()), or explicitly cancel it (e.g., await response.body.cancel()).
- If you only need headers: consider using HEAD so there's no body to drain.

Why
- undici's own docs state: "All response bodies must always be fully consumed or destroyed" and explain that connection resource release should not be left to GC.
- Maintainer guidance in undici discussions echoes that you should "always either consume or cancel the response body," because otherwise the body/connection resources can remain buffered until GC.

Notes for retries
- For undici's retry logic, this pattern matters especially when the earlier response is left unread; leaving the body unconsumed can reduce connection reuse and can lead to stalls/deadlocks when connections are exhausted (undici explicitly warns about this risk).
- If you use response.body.cancel(), ensure you release any reader lock if you used getReader(); in some undici versions this affected whether the server could close cleanly.

Example pattern (headers only, then retry):

```ts
const res = await fetch(url);
if (!res.ok) {
  // Drain/cancel so the socket/connection can be released before retrying.
  await res.body?.cancel();
  return retry();
}
await res.text();
```

Recommendation: always either consume or cancel the response body before starting the next request, especially when using upstream retry logic / connection pooling. If you don't need the body at all, prefer HEAD when possible.
Citations:
- 1: https://github.com/nodejs/undici/blob/main/docs/docs/api/Dispatcher.md
- 2: https://github.com/Kiruthika-S456/undici
- 3: How to "cancel" a body using `request`? nodejs/undici#3914
- 4: What has changed in `response.body.cancel()` between node 20 and 21? nodejs/undici#3208
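Applied to the failover loop in this PR, the guidance above amounts to canceling each failed response's body before moving to the next candidate. A compact, self-contained sketch (the URLs and the helper name are illustrative, not the PR's actual code):

```ts
// Try each upstream in order; release every unread body so undici can
// return the pooled connection before the next attempt.
async function fetchFirstOk(urls: string[]): Promise<Response> {
  let lastStatus: number | undefined;
  for (const url of urls) {
    const res = await fetch(url);
    if (res.ok) return res; // caller consumes this body
    lastStatus = res.status;
    await res.body?.cancel(); // free the socket before retrying
  }
  throw new Error(`all upstreams failed (last status: ${lastStatus})`);
}
```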
🏁 Script executed:
# First, check the file exists and get the relevant section
wc -l backend/src/api/v1/messages.ts
Repository: EM-GeekLab/NexusGate
Length of output: 100
🏁 Script executed:
# Read the code around lines 685-701 with more context
sed -n '670,720p' backend/src/api/v1/messages.ts
Repository: EM-GeekLab/NexusGate
Length of output: 1792
🏁 Script executed:
# Get more context - see the loop structure
sed -n '650,730p' backend/src/api/v1/messages.ts
Repository: EM-GeekLab/NexusGate
Length of output: 2634
🏁 Script executed:
# Check what parseUpstreamJsonBody and parseUpstreamErrorBody do
rg -A 10 "export.*parseUpstreamJsonBody|function parseUpstreamJsonBody" backend/
Repository: EM-GeekLab/NexusGate
Length of output: 729
🏁 Script executed:
# Also check if these are imported from elsewhere
rg "parseUpstreamJsonBody|parseUpstreamErrorBody" backend/src/api/v1/messages.ts | head -20Repository: EM-GeekLab/NexusGate
Length of output: 342
🏁 Script executed:
# Check the full implementation of parseUpstreamErrorBody
sed -n '/^async function parseUpstreamErrorBody/,/^}/p' backend/src/api/v1/messages.ts
Repository: EM-GeekLab/NexusGate
Length of output: 606
Release the failed response body before continuing failover
Lines 685-701 move straight to the next candidate when shouldTryNext === true without consuming or canceling the current response.body. By design of undici (the library underlying Node.js fetch), Response.body must be explicitly consumed or canceled before retrying; otherwise connection/socket resources are not released promptly, and under dense retries this can exhaust the connection pool and delay resource reclamation, hurting system stability. Call await response.body?.cancel() to release the resources before moving on to the next candidate.
Suggested change
```diff
   logger.warn("count_tokens upstream request failed", {
     provider: candidate.provider.name,
     providerType: candidate.provider.type,
     status: response.status,
     shouldTryNext,
   });

+  if (shouldTryNext) {
+    await response.body?.cancel();
+    continue;
+  }
+
   if (!shouldTryNext) {
     set.status = response.status;
     return await parseUpstreamErrorBody(response);
   }
```

📝 Committable suggestion
```ts
lastResponse = response;
const shouldTryNext =
  COUNT_TOKENS_FAILOVER_CONFIG.retriableStatusCodes.includes(
    response.status,
  ) && candidate !== candidates[candidates.length - 1];

logger.warn("count_tokens upstream request failed", {
  provider: candidate.provider.name,
  providerType: candidate.provider.type,
  status: response.status,
  shouldTryNext,
});

if (shouldTryNext) {
  await response.body?.cancel();
  continue;
}

if (!shouldTryNext) {
  set.status = response.status;
  return await parseUpstreamErrorBody(response);
}
```
Summary
- Adds `POST /v1/messages/count_tokens` to the Anthropic-compatible API surface, forwarding to upstream providers using the existing model resolution and failover infrastructure.
- Accepts the same loose request schema (`tLooseObject`) so newer Anthropic content block shapes pass through transparently — same approach used by `/v1/messages`.
- Dedicated `COUNT_TOKENS_FAILOVER_CONFIG` with a tighter timeout than `/v1/messages`, and `404`/`405` added to retriable codes since not every Anthropic-compatible gateway implements the endpoint, and we want failover to a mapped sibling provider to kick in instead of returning the first 404.
- `normalizeAnthropicBaseUrl` trims a trailing `/v1` from `baseUrl` before appending `/v1/messages/count_tokens`, since some providers configure `baseUrl` already with `/v1`.
- Upstream JSON and error parsing go through `parseJsonResponse` for the defensive parsing the rest of the messages route uses.
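To make the endpoint's contract concrete, here is an illustrative client call; the base URL, port, API key variable, and model name are placeholders, and the `{ input_tokens }` response shape follows Anthropic's documented count_tokens format.

```ts
const res = await fetch("http://localhost:3000/v1/messages/count_tokens", {
  method: "POST",
  headers: {
    "content-type": "application/json",
    "x-api-key": process.env.GATEWAY_API_KEY ?? "", // placeholder name
    "anthropic-version": "2023-06-01",
  },
  body: JSON.stringify({
    model: "claude-sonnet-4", // any model mapped to an upstream provider
    messages: [{ role: "user", content: "Hello, world" }],
  }),
});
const { input_tokens } = await res.json(); // e.g. { "input_tokens": 10 }
```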
Verification

`scripts/verify-anthropic-count-tokens.sh` is a black-box probe across three Anthropic-compatible coding gateways (Kimi Coding, DashScope-Anthropic, Volcengine-Ark). It:
- auto-discovers a working model via `/v1/models`
- sends a real `/v1/messages` and `/v1/messages/count_tokens` probe
- classifies each gateway as `SUPPORTED` / `NOT_SUPPORTED` / `INCONCLUSIVE` / `LIKELY_SUPPORTED_BUT_REQUEST_SCHEMA_DIFF` / `AUTH_OR_PERMISSION_ISSUE_ON_COUNT_TOKENS`

Run with:
The script's verdicts give us a real read on which upstreams need `404` failover (the reason 404 is in the retriable set) versus which we can route to confidently.

Test plan
- `bun run lint` clean
- `bun run check` clean
- `api-helpers.test.ts` (15 cases, from fix(api): restore SSE Content-Type for streaming responses #82) still passes
- `verify-anthropic-count-tokens.sh` run against at least one real coding gateway

Summary by CodeRabbit
Release Notes

New Features
- Anthropic-compatible token counting endpoint (`POST /v1/messages/count_tokens`) with model resolution and multi-provider failover.

Tests
- Verification script (`scripts/verify-anthropic-count-tokens.sh`) probing token-counting support across Anthropic-compatible gateways.