Add Anthropic AI Gateway stream regression tests by jhaynie · Pull Request #21 · agentuity/llmproxy

jhaynie · 2026-05-26T13:29:35Z

Summary

- add opt-in live coverage for Anthropic Messages streaming through llmproxy
- add opt-in live coverage for deployed Agentuity AI Gateway Anthropic Messages streaming
- assert message_stop, text deltas, sentinel text, and no thinking_delta when thinking is disabled
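
The checks in the last bullet can be sketched as a plain function. This is a minimal sketch, not the PR's `assertAnthropicLiveStream`: the `streamSummary` type, the `checkStream` name, and the error messages are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// streamSummary is a hypothetical stand-in for the PR's summary struct.
type streamSummary struct {
	EventTypes     map[string]int
	TextDeltas     int
	ThinkingDeltas int
	Text           string
}

// checkStream returns an error describing the first failed expectation,
// or nil when the stream completed as the smoke test expects.
func checkStream(s streamSummary, sentinel string) error {
	switch {
	case s.EventTypes["message_stop"] == 0:
		return errors.New("stream ended without message_stop")
	case s.TextDeltas == 0:
		return errors.New("stream contained no text_delta events")
	case !strings.Contains(s.Text, sentinel):
		return fmt.Errorf("sentinel %q missing from text %q", sentinel, s.Text)
	case s.ThinkingDeltas > 0:
		return fmt.Errorf("got %d thinking_delta events with thinking disabled", s.ThinkingDeltas)
	}
	return nil
}

func main() {
	ok := streamSummary{
		EventTypes: map[string]int{"message_stop": 1},
		TextDeltas: 2,
		Text:       "GENESIS_DRIVER_SMOKE_OK",
	}
	fmt.Println(checkStream(ok, "GENESIS_DRIVER_SMOKE_OK"))
}
```

Returning an error instead of calling `t.Fatalf` directly keeps the checks testable without a live stream; a test would wrap the call and fail on a non-nil result.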

Verification

LLMPROXY_LIVE_AIGATEWAY_REGRESSION=1 gluon run -s ion -- go test -tags=integration . -run TestLiveAnthropicMessagesStreamCompletes -count=1 -v
go test -tags=integration . -run TestLiveAnthropicMessagesStreamCompletes -count=1 -v
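
Live tests like these are usually gated so they only run when explicitly requested. The sketch below shows one way to make that decision from the `LLMPROXY_LIVE_AIGATEWAY_REGRESSION` variable used in the first command. Treating only the value `1` as enabled and the `liveEnabled` name are assumptions; the PR's actual gating logic is not shown on this page.

```go
package main

import (
	"fmt"
	"os"
)

// liveOptInEnv is the variable name taken from the verification command.
const liveOptInEnv = "LLMPROXY_LIVE_AIGATEWAY_REGRESSION"

// liveEnabled reports whether live tests were requested. The lookup function
// is injected so the decision can be exercised without touching the real
// process environment. Accepting only "1" is an assumption of this sketch.
func liveEnabled(getenv func(string) string) bool {
	return getenv(liveOptInEnv) == "1"
}

func main() {
	if !liveEnabled(os.Getenv) {
		fmt.Println("skipping live regression test; set " + liveOptInEnv + "=1 to run")
		return
	}
	fmt.Println("live regression test enabled")
}
```

Inside a real test function the skip branch would call `t.Skip` rather than print.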

Summary by CodeRabbit

Tests
- Enhanced integration test coverage for Anthropic message streaming functionality, including validation of stream completion and sentinel signals across multiple routing configurations.

coderabbitai · 2026-05-26T13:29:50Z

📝 Walkthrough

This PR adds live integration tests for Anthropic message streaming to aigateway_regression_live_test.go. The change includes two test functions (one using the internal router, one using a direct gateway URL), a stream parser that extracts SSE events and delta fields, a summary type with formatted output, and validation helpers that check event counts and stream content.
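
The direct-gateway variant can be sketched as a request builder plus a local stub server. This is only an illustration under stated assumptions: the `newStreamRequest` name, the `Bearer` authorization scheme, and appending `/v1/messages` to the gateway base URL are not confirmed by the PR, and the stub replaces the real gateway so the example runs offline.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// newStreamRequest builds a POST to {baseURL}/v1/messages that asks for an
// SSE response. The Bearer scheme is an assumption of this sketch.
func newStreamRequest(baseURL, apiKey, body string) (*http.Request, error) {
	url := strings.TrimRight(baseURL, "/") + "/v1/messages"
	req, err := http.NewRequest(http.MethodPost, url, strings.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Accept", "text/event-stream")
	req.Header.Set("Authorization", "Bearer "+apiKey)
	return req, nil
}

func main() {
	// A local stub stands in for the gateway and emits a single SSE event.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/event-stream")
		fmt.Fprint(w, "event: message_stop\ndata: {\"type\":\"message_stop\"}\n\n")
	}))
	defer srv.Close()

	req, err := newStreamRequest(srv.URL, "test-key", `{"stream":true}`)
	if err != nil {
		panic(err)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// The whole body is read before parsing, as the walkthrough describes.
	raw, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(raw))
}
```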

Changes

Anthropic Streaming Live Regression Tests

Layer / File(s)	Summary
Stream parsing infrastructure `aigateway_regression_live_test.go`	Imports (`sort`, `strconv`, `strings`, `providers/anthropic`) enable streaming support. The `anthropicLiveStreamSummary` struct holds aggregated event type counts, text/thinking delta counts, and concatenated text. `summarizeAnthropicLiveStream` parses raw SSE payloads by splitting blocks, extracting `event:` and `data:` fields, decoding JSON, and aggregating counts and text content.
Stream validation and utilities `aigateway_regression_live_test.go`	`assertAnthropicLiveStream` validates parsed streams by checking for `message_stop` events, text deltas, a required sentinel string, and absence of thinking deltas. `fmtInt` and `firstNonEmptyLiveEnv` helpers format integers and select the first non-empty environment variable from a list of candidate names.
Integration tests `aigateway_regression_live_test.go`	`TestLiveAnthropicMessagesStreamCompletes` routes an Anthropic `/v1/messages` streaming request through the internal auto-router with `Accept: text/event-stream` header, reads the response, and validates stream content. `TestLiveAgentuityAIGatewayAnthropicMessagesStreamCompletes` sends a direct HTTP request to an AIGateway URL (base and API key from environment variables), includes authorization headers, and validates the stream behavior using the same assertion logic.
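
Two of the helpers described in the table are small enough to sketch. The versions below are guesses at their shape, not the PR's code: `firstNonEmpty` and `formatCounts` are hypothetical names, trimming whitespace before the emptiness check is an assumption, and the environment variable names in `main` are placeholders.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// firstNonEmpty looks up each name in order and returns the first value
// that is non-empty after trimming whitespace, or "" if none is set.
func firstNonEmpty(lookup func(string) string, names ...string) string {
	for _, name := range names {
		if v := strings.TrimSpace(lookup(name)); v != "" {
			return v
		}
	}
	return ""
}

// formatCounts renders counts as "key=value" pairs sorted by key, so the
// output is stable even though Go map iteration order is not.
func formatCounts(counts map[string]int) string {
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+strconv.Itoa(counts[k]))
	}
	return strings.Join(parts, ",")
}

func main() {
	// Placeholder variable names; the PR's real names are not shown here.
	env := map[string]string{"GATEWAY_API_KEY_B": "secret"}
	lookup := func(k string) string { return env[k] }

	fmt.Println(firstNonEmpty(lookup, "GATEWAY_API_KEY_A", "GATEWAY_API_KEY_B"))
	fmt.Println(formatCounts(map[string]int{"message_stop": 1, "content_block_delta": 3}))
}
```

Sorting the keys matters for failure messages: a summary printed in a stable order is easier to compare across runs.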

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

aigateway_regression_live_test.go (2)

128-128: ⚡ Quick win

Consider reducing max_tokens for this smoke test.

The request allows up to 64,000 tokens but the prompt explicitly asks for a minimal reply ("Reply with GENESIS_DRIVER_SMOKE_OK and nothing else"). A much smaller value like 256 or 512 would suffice and reduce API costs.

💰 Proposed fix to reduce max_tokens

-	body := `{"model":"` + model + `","stream":true,"max_tokens":64000,"thinking":{"type":"disabled"},"messages":[{"role":"user","content":[{"type":"text","text":"Reply with GENESIS_DRIVER_SMOKE_OK and nothing else."}]}]}`
+	body := `{"model":"` + model + `","stream":true,"max_tokens":512,"thinking":{"type":"disabled"},"messages":[{"role":"user","content":[{"type":"text","text":"Reply with GENESIS_DRIVER_SMOKE_OK and nothing else."}]}]}`
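
As an alternative to editing the string literal, the request body could be built with `encoding/json` so the token budget lives in one named constant. This is a sketch of that idea, not part of the proposed fix: `smokeMaxTokens` and `buildSmokeBody` are hypothetical names, and the model name in `main` is a placeholder.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// smokeMaxTokens is the smaller budget suggested in the review comment.
const smokeMaxTokens = 512

// buildSmokeBody marshals the streaming smoke-test request for a model.
// Marshaling also escapes the model name, which string concatenation does not.
func buildSmokeBody(model string) ([]byte, error) {
	req := map[string]any{
		"model":      model,
		"stream":     true,
		"max_tokens": smokeMaxTokens,
		"thinking":   map[string]any{"type": "disabled"},
		"messages": []map[string]any{{
			"role": "user",
			"content": []map[string]any{{
				"type": "text",
				"text": "Reply with GENESIS_DRIVER_SMOKE_OK and nothing else.",
			}},
		}},
	}
	return json.Marshal(req)
}

func main() {
	body, err := buildSmokeBody("example-model") // placeholder model name
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```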

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@aigateway_regression_live_test.go` at line 128, Reduce the unnecessary token
budget in the test payload: in aigateway_regression_live_test.go update the JSON
assigned to the body variable (the line that builds the request body string
containing "max_tokens":64000) to use a much smaller value such as 256 or 512
(e.g., "max_tokens":512) since the prompt expects only a short sentinel reply;
this lowers API cost while preserving test intent.

178-178: ⚡ Quick win

Consider reducing max_tokens for this smoke test.

Similar to the internal router test, this request allows up to 64,000 tokens for a minimal reply. A smaller value like 512 would be sufficient and more cost-effective.

💰 Proposed fix to reduce max_tokens

-		"max_tokens": 64_000,
+		"max_tokens": 512,

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@aigateway_regression_live_test.go` at line 178, Reduce the excessive token
allowance in the smoke test by changing the JSON request field "max_tokens" from
64_000 to a smaller value (e.g., 512) in the test that builds the request body
(the line containing "max_tokens": 64_000); update any related test expectations
or comments as needed to reflect the lower limit so the test remains valid and
cost-efficient.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@aigateway_regression_live_test.go`:
- Around line 256-301: In summarizeAnthropicLiveStream, avoid double-counting
EventTypes: stop unconditionally incrementing summary.EventTypes[eventType] from
the SSE "event:" line and instead increment summary.EventTypes only from the
parsed JSON payload.Type (payload.Type) when present; if payload.Type is empty,
you may fall back to using the SSE eventType once. Update logic around eventType
and payload.Type (references: summarizeAnthropicLiveStream,
anthropicLiveStreamSummary, EventTypes, payload.Type) so counts come from the
JSON payload first and the SSE event line only as a fallback.

---

Nitpick comments:
In `@aigateway_regression_live_test.go`:
- Line 128: Reduce the unnecessary token budget in the test payload: in
aigateway_regression_live_test.go update the JSON assigned to the body variable
(the line that builds the request body string containing "max_tokens":64000) to
use a much smaller value such as 256 or 512 (e.g., "max_tokens":512) since the
prompt expects only a short sentinel reply; this lowers API cost while preserving
test intent.
- Line 178: Reduce the excessive token allowance in the smoke test by changing
the JSON request field "max_tokens" from 64_000 to a smaller value (e.g., 512)
in the test that builds the request body (the line containing "max_tokens":
64_000); update any related test expectations or comments as needed to reflect
the lower limit so the test remains valid and cost-efficient.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 92d057cc-e600-4e96-8ecf-b02b81f80197

📥 Commits

Reviewing files that changed from the base of the PR and between 610f141 and e1930d9.

📒 Files selected for processing (1)

aigateway_regression_live_test.go

📜 Review details

🔇 Additional comments (4)

aigateway_regression_live_test.go (4)

8-8: LGTM!

Also applies to: 13-14, 20-20

223-238: LGTM!

240-254: LGTM!

303-305: LGTM!

Also applies to: 315-322

coderabbitai · 2026-05-26T13:33:41Z

+func summarizeAnthropicLiveStream(raw []byte) anthropicLiveStreamSummary {
+	summary := anthropicLiveStreamSummary{
+		EventTypes: make(map[string]int),
+	}
+	for _, block := range strings.Split(string(raw), "\n\n") {
+		var eventType string
+		var data string
+		for _, line := range strings.Split(block, "\n") {
+			line = strings.TrimSpace(line)
+			if strings.HasPrefix(line, "event:") {
+				eventType = strings.TrimSpace(strings.TrimPrefix(line, "event:"))
+			}
+			if strings.HasPrefix(line, "data:") {
+				data = strings.TrimSpace(strings.TrimPrefix(line, "data:"))
+			}
+		}
+		if eventType != "" {
+			summary.EventTypes[eventType]++
+		}
+		if data == "" {
+			continue
+		}
+		var payload struct {
+			Type  string `json:"type"`
+			Delta struct {
+				Type     string `json:"type"`
+				Text     string `json:"text"`
+				Thinking string `json:"thinking"`
+			} `json:"delta"`
+		}
+		if err := json.Unmarshal([]byte(data), &payload); err != nil {
+			continue
+		}
+		if payload.Type != "" {
+			summary.EventTypes[payload.Type]++
+		}
+		switch payload.Delta.Type {
+		case "text_delta":
+			summary.TextDeltas++
+			summary.Text += payload.Delta.Text
+		case "thinking_delta":
+			summary.ThinkingDeltas++
+		}
+	}
+	return summary
+}


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Event types may be double-counted in the summary.

The function counts event types from both the SSE event: line (line 273) and the JSON type field (line 290). In Anthropic's SSE format, these typically have the same value:

event: content_block_delta
data: {"type":"content_block_delta",...}

This results in each event being counted twice in EventTypes. While current assertions only check for presence (> 0), the counts themselves are misleading.

🔧 Proposed fix to avoid double-counting

Count only from the JSON payload to match the actual event semantics:

 	for _, block := range strings.Split(string(raw), "\n\n") {
-		var eventType string
 		var data string
 		for _, line := range strings.Split(block, "\n") {
 			line = strings.TrimSpace(line)
-			if strings.HasPrefix(line, "event:") {
-				eventType = strings.TrimSpace(strings.TrimPrefix(line, "event:"))
-			}
 			if strings.HasPrefix(line, "data:") {
 				data = strings.TrimSpace(strings.TrimPrefix(line, "data:"))
 			}
 		}
-		if eventType != "" {
-			summary.EventTypes[eventType]++
-		}
 		if data == "" {
 			continue
 		}

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

func summarizeAnthropicLiveStream(raw []byte) anthropicLiveStreamSummary {
	summary := anthropicLiveStreamSummary{
		EventTypes: make(map[string]int),
	}
	for _, block := range strings.Split(string(raw), "\n\n") {
		var data string
		for _, line := range strings.Split(block, "\n") {
			line = strings.TrimSpace(line)
			if strings.HasPrefix(line, "data:") {
				data = strings.TrimSpace(strings.TrimPrefix(line, "data:"))
			}
		}
		if data == "" {
			continue
		}
		var payload struct {
			Type  string `json:"type"`
			Delta struct {
				Type     string `json:"type"`
				Text     string `json:"text"`
				Thinking string `json:"thinking"`
			} `json:"delta"`
		}
		if err := json.Unmarshal([]byte(data), &payload); err != nil {
			continue
		}
		if payload.Type != "" {
			summary.EventTypes[payload.Type]++
		}
		switch payload.Delta.Type {
		case "text_delta":
			summary.TextDeltas++
			summary.Text += payload.Delta.Text
		case "thinking_delta":
			summary.ThinkingDeltas++
		}
	}
	return summary
}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@aigateway_regression_live_test.go` around lines 256 - 301, In summarizeAnthropicLiveStream, avoid double-counting EventTypes: stop unconditionally incrementing summary.EventTypes[eventType] from the SSE "event:" line and instead increment summary.EventTypes only from the parsed JSON payload.Type (payload.Type) when present; if payload.Type is empty, you may fall back to using the SSE eventType once. Update logic around eventType and payload.Type (references: summarizeAnthropicLiveStream, anthropicLiveStreamSummary, EventTypes, payload.Type) so counts come from the JSON payload first and the SSE event line only as a fallback.
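
The fallback described in the prompt above (count from the JSON `type` first, and use the SSE `event:` line only when the payload has no type) can be sketched as follows. This is an illustration rather than the committed fix: `countEventTypes` is a hypothetical name, and the function tracks only event counts, not text or thinking deltas.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// countEventTypes counts each SSE block exactly once: by the JSON "type"
// field when it is present, otherwise by the SSE "event:" line.
func countEventTypes(raw string) map[string]int {
	counts := make(map[string]int)
	for _, block := range strings.Split(raw, "\n\n") {
		var eventType, data string
		for _, line := range strings.Split(block, "\n") {
			line = strings.TrimSpace(line)
			if strings.HasPrefix(line, "event:") {
				eventType = strings.TrimSpace(strings.TrimPrefix(line, "event:"))
			}
			if strings.HasPrefix(line, "data:") {
				data = strings.TrimSpace(strings.TrimPrefix(line, "data:"))
			}
		}
		var payload struct {
			Type string `json:"type"`
		}
		if data != "" {
			// A decode error leaves payload.Type empty, which triggers the fallback.
			_ = json.Unmarshal([]byte(data), &payload)
		}
		switch {
		case payload.Type != "":
			counts[payload.Type]++
		case eventType != "":
			counts[eventType]++
		}
	}
	return counts
}

func main() {
	raw := "event: content_block_delta\ndata: {\"type\":\"content_block_delta\"}\n\n" +
		"event: ping\ndata: not-json\n\n"
	fmt.Println(countEventTypes(raw)) // map[content_block_delta:1 ping:1]
}
```

With the original logic the first block would have been counted twice; here each block contributes one increment regardless of which field supplies the name.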

jhaynie wants to merge 1 commit into main from test/anthropic-aigateway-stream-regression