feat(api): add Anthropic /v1/messages/count_tokens endpoint #83
Conversation
Forward Anthropic-compatible token counting requests to upstream providers. The endpoint accepts the same loose schema we use for /v1/messages so newer content block shapes pass through unchanged, and reuses provider/model resolution and failover so a count_tokens call benefits from the same retry/multi-provider behavior as the chat path.

Implementation notes:
- Dedicated COUNT_TOKENS_FAILOVER_CONFIG with a tighter 30s timeout and 404/405 added to retriable status codes — many providers either don't implement the endpoint or expose it under a slightly different path, so failing over to another mapped provider is the right default.
- normalizeAnthropicBaseUrl strips a trailing /v1 from baseUrl before appending /v1/messages/count_tokens, since some providers' baseUrl already includes the /v1 segment (see the sketch below).
- Upstream JSON / error parsing goes through parseJsonResponse to keep the same defensive parsing the rest of the messages route uses.

Verification: scripts/verify-anthropic-count-tokens.sh probes Kimi Coding / DashScope-Anthropic / Volcengine-Ark gateways for count_tokens support using env-supplied API keys (never written to disk). It auto-discovers a working model via /v1/models, then sends a real /v1/messages and /v1/messages/count_tokens probe and classifies the result as SUPPORTED / NOT_SUPPORTED / INCONCLUSIVE.
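For concreteness, here is a minimal sketch of the two pieces described above. `normalizeAnthropicBaseUrl`, `COUNT_TOKENS_FAILOVER_CONFIG`, and its `timeoutMs`/`retriableStatusCodes` fields appear in the PR; the exact regex and the full status-code list are assumptions.

```ts
// Sketch only: strip a trailing "/v1" so appending the count_tokens path
// never yields ".../v1/v1/messages/count_tokens".
function normalizeAnthropicBaseUrl(baseUrl: string): string {
  return baseUrl.replace(/\/v1\/?$/, "");
}

// Assumed shape; only the 30s timeout and the 404/405 additions are from
// the PR description. The other retriable codes here are illustrative.
const COUNT_TOKENS_FAILOVER_CONFIG = {
  timeoutMs: 30_000,
  retriableStatusCodes: [404, 405, 429, 500, 502, 503, 504],
};

// Example: a baseUrl configured with "/v1" still produces a single segment.
const url =
  normalizeAnthropicBaseUrl("https://api.example.com/v1") +
  "/v1/messages/count_tokens";
// -> "https://api.example.com/v1/messages/count_tokens"
```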
📝 Walkthrough
This PR adds a new token counting endpoint and its verification script.
Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant API as /messages/count_tokens
    participant ModelResolver as Model resolver
    participant Failover as Failover layer
    participant Upstream as Upstream provider

    Client->>API: POST /messages/count_tokens<br/>(model, content)
    API->>ModelResolver: Validate model & resolve providers
    ModelResolver-->>API: Candidate provider list
    API->>Failover: Initialize failover logic
    loop For each candidate
        Failover->>Upstream: POST /messages/count_tokens<br/>(normalized request)
        alt Success (HTTP 200)
            Upstream-->>Failover: JSON response
            Failover->>Failover: Parse response body
            Failover-->>API: Return parsed JSON
        else Retriable error
            Upstream-->>Failover: HTTP 5xx/429
            Failover->>Failover: Try next candidate
        else Non-retriable error
            Upstream-->>Failover: HTTP 4xx (excluding specific statuses)
            Failover-->>API: Forward error response
        end
    end
    API-->>Client: token_count result or<br/>structured error response
```
Code Review
This pull request introduces a new /messages/count_tokens endpoint to the messages API, providing token counting capabilities for Anthropic-compatible models. The implementation includes request schema validation, provider selection with failover support, and a set of utility functions for request construction and response parsing. A verification script is also added to test the functionality across different providers. Feedback suggests refactoring the manual failover loop to use the existing executeWithFailover service for better consistency and masking raw upstream error messages to avoid exposing internal implementation details.
```ts
type: "error",
error: {
  type: "api_error",
  message: text,
```
In API error handlers, log the full error details on the server for debugging, but return only a generic, structured error message to the client to avoid leaking internal implementation details. Returning raw upstream error bodies (even truncated) can expose internal infrastructure details or sensitive information from proxies/gateways.
```diff
-            message: text,
+            message: "An unexpected error occurred while processing the upstream response.",
```
References
- In API error handlers, log the full error details on the server for debugging, but return only a generic, structured error message to the client to avoid leaking internal implementation details.
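A minimal sketch of the pattern this comment describes — log the raw upstream body server-side, return a generic envelope to the client. The `logger` name and the Anthropic-style error shape follow the code quoted above; the helper itself is hypothetical.

```ts
declare const logger: { error: (msg: string, meta?: object) => void };

// Hypothetical helper: full detail goes to server logs; the client only
// ever sees a generic, structured error.
async function maskedUpstreamError(response: Response) {
  const text = await response.text();
  logger.error("upstream error body", { status: response.status, body: text });
  return {
    type: "error",
    error: {
      type: "api_error",
      message:
        "An unexpected error occurred while processing the upstream response.",
    },
  };
}
```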
```ts
let lastResponse: Response | undefined;
let lastError: Error | undefined;

for (const candidate of candidates) {
  const { url, init, proxy } = buildAnthropicCountTokensRequest(
    upstreamBody,
    candidate,
    extraHeaders,
  );

  try {
    const response = await fetchWithTimeout(
      url,
      init,
      COUNT_TOKENS_FAILOVER_CONFIG.timeoutMs,
      proxy,
    );

    if (response.ok) {
      return await parseUpstreamJsonBody(
        response,
        "Anthropic count_tokens",
      );
    }

    lastResponse = response;
    const shouldTryNext =
      COUNT_TOKENS_FAILOVER_CONFIG.retriableStatusCodes.includes(
        response.status,
      ) && candidate !== candidates[candidates.length - 1];

    logger.warn("count_tokens upstream request failed", {
      provider: candidate.provider.name,
      providerType: candidate.provider.type,
      status: response.status,
      shouldTryNext,
    });

    if (!shouldTryNext) {
      set.status = response.status;
      return await parseUpstreamErrorBody(response);
    }
  } catch (error) {
    const err =
      error instanceof Error ? error : new Error(String(error));
    lastError = err;
    const shouldTryNext =
      isRetriableNetworkError(err, COUNT_TOKENS_FAILOVER_CONFIG) &&
      candidate !== candidates[candidates.length - 1];

    logger.warn("count_tokens upstream network error", {
      provider: candidate.provider.name,
      providerType: candidate.provider.type,
      error: err.message,
      shouldTryNext,
    });

    if (!shouldTryNext) {
      set.status = 502;
      return {
        type: "error",
        error: {
          type: "api_error",
          message: `Count tokens request failed: ${err.message}`,
        },
      };
    }
  }
}

if (lastResponse) {
  set.status = lastResponse.status;
  return await parseUpstreamErrorBody(lastResponse);
}

set.status = 502;
return {
  type: "error",
  error: {
    type: "api_error",
    message:
      lastError?.message ||
      "All upstream providers failed for token counting",
  },
};
```
The manual failover loop duplicates logic that is already encapsulated in the executeWithFailover service. Using the centralized service ensures that the request benefits from standard features like exponential backoff and consistent logging. This refactoring also correctly differentiates between non-retriable upstream errors (which are forwarded) and true failover exhaustion (which returns a 502), adhering to repository standards.
```ts
const result = await executeWithFailover(
  candidates,
  (candidate) =>
    buildAnthropicCountTokensRequest(upstreamBody, candidate, extraHeaders),
  COUNT_TOKENS_FAILOVER_CONFIG,
);
if (result.success && result.response) {
  return await parseUpstreamJsonBody(
    result.response,
    "Anthropic count_tokens",
  );
}
if (result.response) {
  set.status = result.response.status;
  return await parseUpstreamErrorBody(result.response);
}
set.status = 502;
return {
  type: "error",
  error: {
    type: "api_error",
    message:
      result.finalError ||
      "All upstream providers failed for token counting",
  },
};
```

References
- When handling failover results, differentiate between non-retriable upstream errors (which should be forwarded to the client) and true failover exhaustion due to retriable errors (which may warrant a 502 Bad Gateway response).
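For reference, the result shape the suggested refactor relies on can be inferred from how it is consumed above; this interface is an assumption, not the actual `executeWithFailover` signature in the repo.

```ts
// Inferred from the result.success / result.response / result.finalError
// usage in the suggestion above; the real type may differ.
interface FailoverResult {
  success: boolean;
  response?: Response; // last upstream response, when one was received
  finalError?: string; // terminal error message when no response exists
}
```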
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/src/api/v1/messages.ts`:
- Around line 685-701: The response body isn't being released when a retriable
upstream error will cause failover; before skipping to the next candidate (where
you compute shouldTryNext using COUNT_TOKENS_FAILOVER_CONFIG and candidates),
call await response.body?.cancel() to free the undici/Node fetch
socket/resources; do this just after deciding shouldTryNext is true and before
returning or continuing, keeping existing logic around lastResponse,
logger.warn, set.status and parseUpstreamErrorBody unchanged.
📒 Files selected for processing (2)
backend/src/api/v1/messages.ts
scripts/verify-anthropic-count-tokens.sh
```ts
lastResponse = response;
const shouldTryNext =
  COUNT_TOKENS_FAILOVER_CONFIG.retriableStatusCodes.includes(
    response.status,
  ) && candidate !== candidates[candidates.length - 1];

logger.warn("count_tokens upstream request failed", {
  provider: candidate.provider.name,
  providerType: candidate.provider.type,
  status: response.status,
  shouldTryNext,
});

if (!shouldTryNext) {
  set.status = response.status;
  return await parseUpstreamErrorBody(response);
}
```
🧩 Analysis chain
🌐 Web query:
In Node.js fetch/undici, should Response.body be consumed or canceled before retrying another upstream request to avoid resource/socket retention?
💡 Result:
Yes. In Node.js' undici-based fetch, you should fully consume or cancel the previous Response.body before retrying (i.e., before issuing the next upstream request), because undici does not rely on garbage collection to promptly release connection/socket resources.

What to do
- If you don't need the response body: either consume it (e.g., await response.body.dump(), response.text(), or response.bytes()), or explicitly cancel it (e.g., await response.body.cancel()).
- If you only need headers: consider using HEAD so there's no body to drain.

Why
- undici's own docs state: "All response bodies must always be fully consumed or destroyed" and explain that connection resource release should not be left to GC.
- Maintainer guidance in undici discussions echoes that you should "always either consume or cancel the response body," because otherwise the body/connection resources can remain buffered until GC.

Notes for retries
- For undici's retry logic, this pattern matters especially when the earlier response is left unread; leaving the body unconsumed can reduce connection reuse and can lead to stalls/deadlocks when connections are exhausted (undici explicitly warns about this risk).
- If you use response.body.cancel(), ensure you release any reader lock if you used getReader(); in some undici versions this affected whether the server could close cleanly.

Example pattern (headers only, then retry):

```ts
const res = await fetch(url);
if (!res.ok) {
  // Drain/cancel so the socket/connection can be released before retrying.
  await res.body?.cancel();
  return retry();
}
await res.text();
```

Recommendation: always either consume or cancel the response body before starting the next request, especially when using upstream retry logic / connection pooling. If you don't need the body at all, prefer HEAD when possible.
Citations:
- 1: https://github.com/nodejs/undici/blob/main/docs/docs/api/Dispatcher.md
- 2: https://github.com/Kiruthika-S456/undici
- 3: How to "cancel" a body using `request`? nodejs/undici#3914
- 4: What has changed in `response.body.cancel()` between node 20 and 21? nodejs/undici#3208
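Applied to the failover loop in this PR, the guidance above amounts to canceling each failed response's body before moving to the next candidate. A compact, self-contained sketch (the URLs and the helper name are illustrative, not the PR's actual code):

```ts
// Try each upstream in order; release every unread body so undici can
// return the pooled connection before the next attempt.
async function fetchFirstOk(urls: string[]): Promise<Response> {
  let lastStatus: number | undefined;
  for (const url of urls) {
    const res = await fetch(url);
    if (res.ok) return res; // caller consumes this body
    lastStatus = res.status;
    await res.body?.cancel(); // free the socket before retrying
  }
  throw new Error(`all upstreams failed (last status: ${lastStatus})`);
}
```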
🏁 Script executed:
# First, check the file exists and get the relevant section
wc -l backend/src/api/v1/messages.ts
Repository: EM-GeekLab/NexusGate
Length of output: 100
🏁 Script executed:
# Read the code around lines 685-701 with more context
sed -n '670,720p' backend/src/api/v1/messages.ts
Repository: EM-GeekLab/NexusGate
Length of output: 1792
🏁 Script executed:
# Get more context - see the loop structure
sed -n '650,730p' backend/src/api/v1/messages.ts
Repository: EM-GeekLab/NexusGate
Length of output: 2634
🏁 Script executed:
# Check what parseUpstreamJsonBody and parseUpstreamErrorBody do
rg -A 10 "export.*parseUpstreamJsonBody|function parseUpstreamJsonBody" backend/
Repository: EM-GeekLab/NexusGate
Length of output: 729
🏁 Script executed:
# Also check if these are imported from elsewhere
rg "parseUpstreamJsonBody|parseUpstreamErrorBody" backend/src/api/v1/messages.ts | head -20Repository: EM-GeekLab/NexusGate
Length of output: 342
🏁 Script executed:
# Check the full implementation of parseUpstreamErrorBody
sed -n '/^async function parseUpstreamErrorBody/,/^}/p' backend/src/api/v1/messages.ts
Repository: EM-GeekLab/NexusGate
Length of output: 606
Release the failed response body before continuing failover
Lines 685-701 move straight to the next candidate when shouldTryNext === true without consuming or canceling the current response.body. By design of undici (the library underlying Node.js fetch), Response.body must be explicitly consumed or canceled before retrying; otherwise connection/socket resources are not released promptly, and under dense retries this can exhaust the connection pool and delay resource reclamation, hurting system stability. Call await response.body?.cancel() to release the resources before moving on to the next candidate.
Suggested change
```diff
   logger.warn("count_tokens upstream request failed", {
     provider: candidate.provider.name,
     providerType: candidate.provider.type,
     status: response.status,
     shouldTryNext,
   });

+  if (shouldTryNext) {
+    await response.body?.cancel();
+    continue;
+  }
+
   if (!shouldTryNext) {
     set.status = response.status;
     return await parseUpstreamErrorBody(response);
   }
```

📝 Committable suggestion
```ts
lastResponse = response;
const shouldTryNext =
  COUNT_TOKENS_FAILOVER_CONFIG.retriableStatusCodes.includes(
    response.status,
  ) && candidate !== candidates[candidates.length - 1];

logger.warn("count_tokens upstream request failed", {
  provider: candidate.provider.name,
  providerType: candidate.provider.type,
  status: response.status,
  shouldTryNext,
});

if (shouldTryNext) {
  await response.body?.cancel();
  continue;
}

if (!shouldTryNext) {
  set.status = response.status;
  return await parseUpstreamErrorBody(response);
}
```
Summary
- Adds `POST /v1/messages/count_tokens` to the Anthropic-compatible API surface, forwarding to upstream providers using the existing model resolution and failover infrastructure.
- Accepts the same loose request schema (`tLooseObject`) so newer Anthropic content block shapes pass through transparently — same approach used by `/v1/messages`.
- Dedicated `COUNT_TOKENS_FAILOVER_CONFIG` with a tighter timeout than `/v1/messages`, and `404`/`405` added to retriable codes since not every Anthropic-compatible gateway implements the endpoint, and we want failover to a mapped sibling provider to kick in instead of returning the first 404.
- `normalizeAnthropicBaseUrl` trims a trailing `/v1` from `baseUrl` before appending `/v1/messages/count_tokens`, since some providers configure `baseUrl` already with `/v1`.
- Upstream JSON and error parsing go through `parseJsonResponse` for the defensive parsing the rest of the messages route uses.
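To make the endpoint's contract concrete, here is an illustrative client call; the base URL, port, API key variable, and model name are placeholders, and the `{ input_tokens }` response shape follows Anthropic's documented count_tokens format.

```ts
const res = await fetch("http://localhost:3000/v1/messages/count_tokens", {
  method: "POST",
  headers: {
    "content-type": "application/json",
    "x-api-key": process.env.GATEWAY_API_KEY ?? "", // placeholder name
    "anthropic-version": "2023-06-01",
  },
  body: JSON.stringify({
    model: "claude-sonnet-4", // any model mapped to an upstream provider
    messages: [{ role: "user", content: "Hello, world" }],
  }),
});
const { input_tokens } = await res.json(); // e.g. { "input_tokens": 10 }
```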
Verification

`scripts/verify-anthropic-count-tokens.sh` is a black-box probe across three Anthropic-compatible coding gateways (Kimi Coding, DashScope-Anthropic, Volcengine-Ark). It:
- auto-discovers a working model via `/v1/models`
- sends a real `/v1/messages` and `/v1/messages/count_tokens` probe
- classifies each gateway as `SUPPORTED` / `NOT_SUPPORTED` / `INCONCLUSIVE` / `LIKELY_SUPPORTED_BUT_REQUEST_SCHEMA_DIFF` / `AUTH_OR_PERMISSION_ISSUE_ON_COUNT_TOKENS`

Run with:
The script's verdicts give us a real read on which upstreams need `404` failover (the reason 404 is in the retriable set) versus which we can route to confidently.

Test plan
- `bun run lint` clean
- `bun run check` clean
- `api-helpers.test.ts` (15 cases, from fix(api): restore SSE Content-Type for streaming responses #82) still passes
- `verify-anthropic-count-tokens.sh` run against at least one real coding gateway

Summary by CodeRabbit
Release Notes

New Features
- Anthropic-compatible token counting endpoint (`POST /v1/messages/count_tokens`) with model resolution and multi-provider failover.

Tests
- Verification script (`scripts/verify-anthropic-count-tokens.sh`) probing token-counting support across Anthropic-compatible gateways.