
feat(api): add Anthropic /v1/messages/count_tokens endpoint #83

Open
pescn wants to merge 1 commit into main from feat/anthropic-count-tokens

Conversation


@pescn pescn commented May 4, 2026

Summary

  • Add POST /v1/messages/count_tokens to the Anthropic-compatible API surface, forwarding to upstream providers using the existing model resolution and failover infrastructure.
  • Schema is intentionally loose (tLooseObject) so newer Anthropic content block shapes pass through transparently — same approach used by /v1/messages.
  • Dedicated failover config: a 30s timeout (vs. 2 min for /v1/messages), with 404/405 added to the retriable status codes. Not every Anthropic-compatible gateway implements the endpoint, so failover to a mapped sibling provider kicks in instead of returning the first 404 (a sketch of this config follows the list).
  • normalizeAnthropicBaseUrl trims a trailing /v1 from baseUrl before appending /v1/messages/count_tokens, since some providers configure baseUrl with /v1 already included.
  • Upstream JSON and error parsing reuse parseJsonResponse, keeping the same defensive parsing the rest of the messages route uses.
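
A minimal sketch of what the dedicated config could look like. Only the 30s timeout and the 404/405 additions are stated by this PR; the field names and the remaining status codes are assumptions for illustration:

const COUNT_TOKENS_FAILOVER_CONFIG = {
  // Tighter budget than the 2-minute /v1/messages timeout.
  timeoutMs: 30_000,
  // 404/405 are retriable here: a gateway that lacks the endpoint should
  // fail over to a mapped sibling provider instead of surfacing the 404.
  // The codes after 405 are assumed, not confirmed by this PR.
  retriableStatusCodes: [404, 405, 429, 500, 502, 503, 504],
};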

Verification

scripts/verify-anthropic-count-tokens.sh is a black-box probe across three Anthropic-compatible coding gateways (Kimi Coding, DashScope-Anthropic, Volcengine-Ark). It:

  1. Reads API keys from env (never written to disk)
  2. Discovers a working model per provider via /v1/models
  3. Sends a real /v1/messages and /v1/messages/count_tokens probe
  4. Classifies as SUPPORTED / NOT_SUPPORTED / INCONCLUSIVE / LIKELY_SUPPORTED_BUT_REQUEST_SCHEMA_DIFF / AUTH_OR_PERMISSION_ISSUE_ON_COUNT_TOKENS

Run with:

KIMI_API_KEY=... DASHSCOPE_API_KEY=... VOLCENGINE_API_KEY=... \
  ./scripts/verify-anthropic-count-tokens.sh

The script's verdicts give us a real read on which upstreams need 404 failover (the reason 404 is in the retriable set) versus which we can route to confidently.
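
For reference, a hypothetical TypeScript equivalent of the probe the script sends. The endpoint path and headers follow Anthropic's public API; the baseUrl, model, and env variable names here are placeholders, not values from the script:

const baseUrl = process.env.PROVIDER_BASE_URL ?? ""; // placeholder gateway URL
const model = process.env.PROVIDER_MODEL ?? "";      // discovered via /v1/models

const res = await fetch(`${baseUrl}/v1/messages/count_tokens`, {
  method: "POST",
  headers: {
    "content-type": "application/json",
    "x-api-key": process.env.PROVIDER_API_KEY ?? "",
    "anthropic-version": "2023-06-01",
  },
  body: JSON.stringify({
    model,
    messages: [{ role: "user", content: "ping" }],
  }),
});
// A 200 response shaped like {"input_tokens": 42} reads as SUPPORTED;
// 404/405 reads as NOT_SUPPORTED; other outcomes are INCONCLUSIVE.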

Test plan

  • bun run lint clean
  • bun run check clean
  • Existing api-helpers.test.ts (15 cases, added in fix(api): restore SSE Content-Type for streaming responses #82) still passes
  • Manually exercise the new endpoint against a configured Anthropic provider through the gateway
  • Run verify-anthropic-count-tokens.sh against at least one real coding gateway

Summary by CodeRabbit

Release notes

  • New features

    • Added a message token counting endpoint with request forwarding and failover across multiple upstream providers.
  • Tests

    • Added a verification script to check token counting support across multiple providers.

Forward Anthropic-compatible token counting requests to upstream
providers. The endpoint accepts the same loose schema we use for
/v1/messages so newer content block shapes pass through unchanged,
and reuses provider/model resolution and failover so a count_tokens
call benefits from the same retry/multi-provider behavior as the
chat path.

Implementation notes:

- Dedicated COUNT_TOKENS_FAILOVER_CONFIG with a tighter 30s timeout
  and 404/405 added to retriable status codes — many providers either
  don't implement the endpoint or expose it under a slightly different
  path, so failing over to another mapped provider is the right
  default.
- normalizeAnthropicBaseUrl strips a trailing /v1 from baseUrl before
  appending /v1/messages/count_tokens, since some providers' baseUrl
  already includes the /v1 segment (sketched below).
- Upstream JSON / error parsing goes through parseJsonResponse to keep
  the same defensive parsing the rest of the messages route uses.
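
A minimal sketch of the normalization described above, assuming a simple suffix strip (the helper name comes from this PR; the exact implementation may differ):

function normalizeAnthropicBaseUrl(baseUrl: string): string {
  // Strip one trailing "/v1" (with or without a final slash) so that
  // appending "/v1/messages/count_tokens" never yields ".../v1/v1/...".
  return baseUrl.replace(/\/v1\/?$/, "");
}

// Both configurations resolve to the same endpoint URL:
// "https://gw.example.com/v1" -> "https://gw.example.com/v1/messages/count_tokens"
// "https://gw.example.com"    -> "https://gw.example.com/v1/messages/count_tokens"
const url = `${normalizeAnthropicBaseUrl("https://gw.example.com/v1")}/v1/messages/count_tokens`;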

Verification: scripts/verify-anthropic-count-tokens.sh probes Kimi
Coding / DashScope-Anthropic / Volcengine-Ark gateways for
count_tokens support using env-supplied API keys (never written to
disk). It auto-discovers a working model via /v1/models then sends a
real /v1/messages and /v1/messages/count_tokens probe and classifies
the result as SUPPORTED / NOT_SUPPORTED / INCONCLUSIVE.
@coderabbitai

coderabbitai Bot commented May 4, 2026

📝 Walkthrough

This PR adds a new /messages/count_tokens endpoint for token counting. The endpoint accepts an Anthropic-compatible request body, resolves upstream provider candidates, runs failover retry logic, and forwards requests to multiple upstream providers. A verification script is also added to test the feature.

Changes

Token counting endpoint and verification

Area / File(s): Summary

Data & configuration
backend/src/api/v1/messages.ts (lines 148–163)
Adds the tAnthropicMessageCountTokens TypeBox schema defining the request shape, and COUNT_TOKENS_FAILOVER_CONFIG covering the retry policy and timeout settings (a schema sketch follows this table).

Core implementation
backend/src/api/v1/messages.ts (lines 512–601)
Adds helper functions for Anthropic base URL normalization, upstream request construction (headers and model mapping), and safe parsing of JSON and error bodies.

Endpoint & failover logic
backend/src/api/v1/messages.ts (lines 610–765)
Adds the .post("/messages/count_tokens") route with model validation, candidate filtering, iteration over multiple candidates, failover decisions based on retriable HTTP statuses, and either a success response or a structured error.

Imports & dependencies
backend/src/api/v1/messages.ts (lines 20–41)
Extends the failover/network utility imports (e.g., fetch with timeout, retriable network error detection) and JSON parsing utilities.

Verification script & tests
scripts/verify-anthropic-count-tokens.sh
New Bash script with initialization, HTTP request helpers, model discovery, candidate model selection, and provider verification logic that checks token counting support for the KIMI, DashScope, and VolcEngine providers.
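
For illustration, a loose schema in this spirit could look like the sketch below. The PR references a tLooseObject helper whose definition is not shown here, so the plain t.Object form with additionalProperties is an assumption:

import { t } from "elysia";

// Loose on purpose: unknown content block shapes pass through to upstream.
const tAnthropicMessageCountTokens = t.Object(
  {
    model: t.String(),
    messages: t.Array(t.Record(t.String(), t.Unknown())),
    system: t.Optional(t.Unknown()),
    tools: t.Optional(t.Array(t.Record(t.String(), t.Unknown()))),
  },
  { additionalProperties: true },
);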

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API as /messages/count_tokens
    participant ModelResolver as Model resolver
    participant Failover as Failover layer
    participant Upstream as Upstream provider

    Client->>API: POST /messages/count_tokens<br/>(model, content)
    API->>ModelResolver: Validate model & resolve providers
    ModelResolver-->>API: Candidate provider list

    API->>Failover: Initialize failover logic
    loop Over candidates
        Failover->>Upstream: POST /messages/count_tokens<br/>(normalized request)
        alt Success (HTTP 200)
            Upstream-->>Failover: JSON response
            Failover->>Failover: Parse response body
            Failover-->>API: Return parsed JSON
        else Retriable error
            Upstream-->>Failover: HTTP 5xx/429/404/405
            Failover->>Failover: Try next candidate
        else Non-retriable error
            Upstream-->>Failover: Other HTTP 4xx
            Failover->>Failover: Forward error to client
        end
    end

    API-->>Client: token_count result or<br/>structured error response

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 A little rabbit leaps the failover wire,
Counting tokens, never tired,
Three providers pull as one,
The verify script scouts the run,
Failover guards till the job is done! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
Title Check: ✅ Passed. The pull request title directly summarizes the main change: adding a new Anthropic /v1/messages/count_tokens endpoint. It is concise, specific, and clearly identifies the primary feature addition.
Linked Issues Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
Out of Scope Changes Check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a new /messages/count_tokens endpoint to the messages API, providing token counting capabilities for Anthropic-compatible models. The implementation includes request schema validation, provider selection with failover support, and a set of utility functions for request construction and response parsing. A verification script is also added to test the functionality across different providers. Feedback suggests refactoring the manual failover loop to use the existing executeWithFailover service for better consistency and masking raw upstream error messages to avoid exposing internal implementation details.

type: "error",
error: {
type: "api_error",
message: text,

medium

In API error handlers, log the full error details on the server for debugging, but return only a generic, structured error message to the client to avoid leaking internal implementation details. Returning raw upstream error bodies (even truncated) can expose internal infrastructure details or sensitive information from proxies/gateways.

Suggested change

-            message: text,
+            message: "An unexpected error occurred while processing the upstream response.",
References
  1. In API error handlers, log the full error details on the server for debugging, but return only a generic, structured error message to the client to avoid leaking internal implementation details.
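
A hedged sketch of the log-then-mask pattern this comment recommends, reusing the logger and variable names from the surrounding snippet (the exact call sites are assumptions, not code from this PR):

// Keep the raw upstream text in server logs for debugging...
logger.error("count_tokens upstream returned an error body", {
  status: response.status,
  body: text.slice(0, 2000),
});
// ...but hand the client only a generic, structured error.
return {
  type: "error",
  error: {
    type: "api_error",
    message: "An unexpected error occurred while processing the upstream response.",
  },
};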

Comment on lines +660 to +744
let lastResponse: Response | undefined;
let lastError: Error | undefined;

for (const candidate of candidates) {
  const { url, init, proxy } = buildAnthropicCountTokensRequest(
    upstreamBody,
    candidate,
    extraHeaders,
  );

  try {
    const response = await fetchWithTimeout(
      url,
      init,
      COUNT_TOKENS_FAILOVER_CONFIG.timeoutMs,
      proxy,
    );

    if (response.ok) {
      return await parseUpstreamJsonBody(
        response,
        "Anthropic count_tokens",
      );
    }

    lastResponse = response;
    const shouldTryNext =
      COUNT_TOKENS_FAILOVER_CONFIG.retriableStatusCodes.includes(
        response.status,
      ) && candidate !== candidates[candidates.length - 1];

    logger.warn("count_tokens upstream request failed", {
      provider: candidate.provider.name,
      providerType: candidate.provider.type,
      status: response.status,
      shouldTryNext,
    });

    if (!shouldTryNext) {
      set.status = response.status;
      return await parseUpstreamErrorBody(response);
    }
  } catch (error) {
    const err =
      error instanceof Error ? error : new Error(String(error));
    lastError = err;
    const shouldTryNext =
      isRetriableNetworkError(err, COUNT_TOKENS_FAILOVER_CONFIG) &&
      candidate !== candidates[candidates.length - 1];

    logger.warn("count_tokens upstream network error", {
      provider: candidate.provider.name,
      providerType: candidate.provider.type,
      error: err.message,
      shouldTryNext,
    });

    if (!shouldTryNext) {
      set.status = 502;
      return {
        type: "error",
        error: {
          type: "api_error",
          message: `Count tokens request failed: ${err.message}`,
        },
      };
    }
  }
}

if (lastResponse) {
  set.status = lastResponse.status;
  return await parseUpstreamErrorBody(lastResponse);
}

set.status = 502;
return {
  type: "error",
  error: {
    type: "api_error",
    message:
      lastError?.message ||
      "All upstream providers failed for token counting",
  },
};

medium

The manual failover loop duplicates logic that is already encapsulated in the executeWithFailover service. Using the centralized service ensures that the request benefits from standard features like exponential backoff and consistent logging. This refactoring also correctly differentiates between non-retriable upstream errors (which are forwarded) and true failover exhaustion (which returns a 502), adhering to repository standards.

        const result = await executeWithFailover(
          candidates,
          (candidate) =>
            buildAnthropicCountTokensRequest(upstreamBody, candidate, extraHeaders),
          COUNT_TOKENS_FAILOVER_CONFIG,
        );

        if (result.success && result.response) {
          return await parseUpstreamJsonBody(
            result.response,
            "Anthropic count_tokens",
          );
        }

        if (result.response) {
          set.status = result.response.status;
          return await parseUpstreamErrorBody(result.response);
        }

        set.status = 502;
        return {
          type: "error",
          error: {
            type: "api_error",
            message:
              result.finalError ||
              "All upstream providers failed for token counting",
          },
        };
References
  1. When handling failover results, differentiate between non-retriable upstream errors (which should be forwarded to the client) and true failover exhaustion due to retriable errors (which may warrant a 502 Bad Gateway response).
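
For context, the result shape the suggestion relies on can be inferred from how it consumes the return value. This is only an inference from the snippet above, not verified against the repository's actual failover service:

interface FailoverResult {
  success: boolean;
  response?: Response; // last upstream response, if any was received
  finalError?: string; // terminal error message when no response exists
}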


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@backend/src/api/v1/messages.ts`:
- Around line 685-701: The response body isn't being released when a retriable
upstream error will cause failover; before skipping to the next candidate (where
you compute shouldTryNext using COUNT_TOKENS_FAILOVER_CONFIG and candidates),
call await response.body?.cancel() to free the undici/Node fetch
socket/resources; do this just after deciding shouldTryNext is true and before
returning or continuing, keeping existing logic around lastResponse,
logger.warn, set.status and parseUpstreamErrorBody unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: c36135db-cace-4e7d-8bd4-708263e2f1c2

📥 Commits

Reviewing files that changed from the base of the PR and between 15300f1 and 31c6900.

📒 Files selected for processing (2)
  • backend/src/api/v1/messages.ts
  • scripts/verify-anthropic-count-tokens.sh

Comment on lines +685 to +701
lastResponse = response;
const shouldTryNext =
  COUNT_TOKENS_FAILOVER_CONFIG.retriableStatusCodes.includes(
    response.status,
  ) && candidate !== candidates[candidates.length - 1];

logger.warn("count_tokens upstream request failed", {
  provider: candidate.provider.name,
  providerType: candidate.provider.type,
  status: response.status,
  shouldTryNext,
});

if (!shouldTryNext) {
  set.status = response.status;
  return await parseUpstreamErrorBody(response);
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In Node.js fetch/undici, should Response.body be consumed or canceled before retrying another upstream request to avoid resource/socket retention?

💡 Result:

Yes. In Node.js' undici-based fetch, you should fully consume or cancel the previous Response.body before retrying (i.e., before issuing the next upstream request), because undici does not rely on garbage collection to promptly release connection/socket resources.

What to do
  • If you don't need the response body: either consume it (e.g., await response.body.dump() in undici, or await response.text() / response.bytes()) or explicitly cancel it (e.g., await response.body.cancel()).
  • If you only need headers: consider using HEAD so there's no body to drain.

Why
  • undici's own docs state: "All response bodies must always be fully consumed or destroyed" and explain that connection resource release should not be left to GC.
  • Maintainer guidance in undici discussions echoes that you should "always either consume or cancel the response body," because otherwise the body/connection resources can remain buffered until GC.

Notes for retries
  • For undici's retry logic, this pattern matters especially when the earlier response is left unread; leaving the body unconsumed can reduce connection reuse and can lead to stalls/deadlocks when connections are exhausted (undici explicitly warns about this risk).
  • If you use response.body.cancel(), ensure you release any reader lock if you used getReader(); in some undici versions this affected whether the server could close cleanly.

Example pattern (headers only, then retry):

const res = await fetch(url);
if (!res.ok) {
  // Drain/cancel so the socket/connection can be released before retrying.
  await res.body?.cancel();
  return retry();
}
await res.text();

Recommendation: always either consume or cancel the response body before starting the next request, especially when using upstream retry logic / connection pooling. If you don't need the body at all, prefer HEAD when possible.



🏁 Script executed:

# First, check the file exists and get the relevant section
wc -l backend/src/api/v1/messages.ts

Repository: EM-GeekLab/NexusGate

Length of output: 100


🏁 Script executed:

# Read the code around lines 685-701 with more context
sed -n '670,720p' backend/src/api/v1/messages.ts

Repository: EM-GeekLab/NexusGate

Length of output: 1792


🏁 Script executed:

# Get more context - see the loop structure
sed -n '650,730p' backend/src/api/v1/messages.ts

Repository: EM-GeekLab/NexusGate

Length of output: 2634


🏁 Script executed:

# Check what parseUpstreamJsonBody and parseUpstreamErrorBody do
rg -A 10 "export.*parseUpstreamJsonBody|function parseUpstreamJsonBody" backend/

Repository: EM-GeekLab/NexusGate

Length of output: 729


🏁 Script executed:

# Also check if these are imported from elsewhere
rg "parseUpstreamJsonBody|parseUpstreamErrorBody" backend/src/api/v1/messages.ts | head -20

Repository: EM-GeekLab/NexusGate

Length of output: 342


🏁 Script executed:

# Check the full implementation of parseUpstreamErrorBody
sed -n '/^async function parseUpstreamErrorBody/,/^}/p' backend/src/api/v1/messages.ts

Repository: EM-GeekLab/NexusGate

Length of output: 606


Release the failed response body before failing over

Lines 685-701 move straight to the next candidate when shouldTryNext === true without consuming or canceling the current response.body. By design of undici (the library behind Node.js fetch), Response.body must be explicitly consumed or canceled before retrying; otherwise connection/socket resources are not released promptly, and under heavy retries this can exhaust the connection pool and delay resource reclamation, hurting stability. Call await response.body?.cancel() before continuing to the next candidate.

Suggested change
             logger.warn("count_tokens upstream request failed", {
               provider: candidate.provider.name,
               providerType: candidate.provider.type,
               status: response.status,
               shouldTryNext,
             });

+            if (shouldTryNext) {
+              await response.body?.cancel();
+              continue;
+            }
+
             if (!shouldTryNext) {
               set.status = response.status;
               return await parseUpstreamErrorBody(response);
             }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:

            lastResponse = response;
            const shouldTryNext =
              COUNT_TOKENS_FAILOVER_CONFIG.retriableStatusCodes.includes(
                response.status,
              ) && candidate !== candidates[candidates.length - 1];

            logger.warn("count_tokens upstream request failed", {
              provider: candidate.provider.name,
              providerType: candidate.provider.type,
              status: response.status,
              shouldTryNext,
            });

            if (!shouldTryNext) {
              set.status = response.status;
              return await parseUpstreamErrorBody(response);
            }

After:

            lastResponse = response;
            const shouldTryNext =
              COUNT_TOKENS_FAILOVER_CONFIG.retriableStatusCodes.includes(
                response.status,
              ) && candidate !== candidates[candidates.length - 1];

            logger.warn("count_tokens upstream request failed", {
              provider: candidate.provider.name,
              providerType: candidate.provider.type,
              status: response.status,
              shouldTryNext,
            });

            if (shouldTryNext) {
              await response.body?.cancel();
              continue;
            }

            if (!shouldTryNext) {
              set.status = response.status;
              return await parseUpstreamErrorBody(response);
            }
