Add Anthropic AI Gateway stream regression tests #21
📝 Walkthrough
This PR adds live integration tests for Anthropic message streaming to
Changes
Anthropic Streaming Live Regression Tests
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 1
🧹 Nitpick comments (2)
aigateway_regression_live_test.go (2)
128-128: ⚡ Quick win
Consider reducing `max_tokens` for this smoke test.
The request allows up to 64,000 tokens, but the prompt explicitly asks for a minimal reply ("Reply with GENESIS_DRIVER_SMOKE_OK and nothing else"). A much smaller value such as 256 or 512 would suffice and reduce API costs.
💰 Proposed fix to reduce max_tokens
- body := `{"model":"` + model + `","stream":true,"max_tokens":64000,"thinking":{"type":"disabled"},"messages":[{"role":"user","content":[{"type":"text","text":"Reply with GENESIS_DRIVER_SMOKE_OK and nothing else."}]}]}`
+ body := `{"model":"` + model + `","stream":true,"max_tokens":512,"thinking":{"type":"disabled"},"messages":[{"role":"user","content":[{"type":"text","text":"Reply with GENESIS_DRIVER_SMOKE_OK and nothing else."}]}]}`
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@aigateway_regression_live_test.go` at line 128, Reduce the unnecessary token budget in the test payload: in aigateway_regression_live_test.go update the JSON assigned to the body variable (the line that builds the request body string containing "max_tokens":64000) to use a much smaller value such as 256 or 512 (e.g., "max_tokens":512) since the prompt expects a single short token, which lowers API cost while preserving test intent.
178-178: ⚡ Quick win
Consider reducing `max_tokens` for this smoke test.
As in the internal router test, this request allows up to 64,000 tokens for a minimal reply. A smaller value such as 512 would be sufficient and more cost-effective.
💰 Proposed fix to reduce max_tokens
- "max_tokens": 64_000,
+ "max_tokens": 512,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@aigateway_regression_live_test.go` at line 178, Reduce the excessive token allowance in the smoke test by changing the JSON request field "max_tokens" from 64_000 to a smaller value (e.g., 512) in the test that builds the request body (the line containing "max_tokens": 64_000); update any related test expectations or comments as needed to reflect the lower limit so the test remains valid and cost-efficient.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@aigateway_regression_live_test.go`:
- Around line 256-301: In summarizeAnthropicLiveStream, avoid double-counting
EventTypes: stop unconditionally incrementing summary.EventTypes[eventType] from
the SSE "event:" line and instead increment summary.EventTypes only from the
parsed JSON payload.Type (payload.Type) when present; if payload.Type is empty,
you may fall back to using the SSE eventType once. Update logic around eventType
and payload.Type (references: summarizeAnthropicLiveStream,
anthropicLiveStreamSummary, EventTypes, payload.Type) so counts come from the
JSON payload first and the SSE event line only as a fallback.
---
Nitpick comments:
In `@aigateway_regression_live_test.go`:
- Line 128: Reduce the unnecessary token budget in the test payload: in
aigateway_regression_live_test.go update the JSON assigned to the body variable
(the line that builds the request body string containing "max_tokens":64000) to
use a much smaller value such as 256 or 512 (e.g., "max_tokens":512) since the
prompt expects a single short token, which lowers API cost while preserving test
intent.
- Line 178: Reduce the excessive token allowance in the smoke test by changing
the JSON request field "max_tokens" from 64_000 to a smaller value (e.g., 512)
in the test that builds the request body (the line containing "max_tokens":
64_000); update any related test expectations or comments as needed to reflect
the lower limit so the test remains valid and cost-efficient.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 92d057cc-e600-4e96-8ecf-b02b81f80197
📒 Files selected for processing (1)
aigateway_regression_live_test.go
📜 Review details
🔇 Additional comments (4)
aigateway_regression_live_test.go (4)
8-8: LGTM! Also applies to: 13-14, 20-20
223-238: LGTM!
240-254: LGTM!
303-305: LGTM! Also applies to: 315-322
func summarizeAnthropicLiveStream(raw []byte) anthropicLiveStreamSummary {
	summary := anthropicLiveStreamSummary{
		EventTypes: make(map[string]int),
	}
	for _, block := range strings.Split(string(raw), "\n\n") {
		var eventType string
		var data string
		for _, line := range strings.Split(block, "\n") {
			line = strings.TrimSpace(line)
			if strings.HasPrefix(line, "event:") {
				eventType = strings.TrimSpace(strings.TrimPrefix(line, "event:"))
			}
			if strings.HasPrefix(line, "data:") {
				data = strings.TrimSpace(strings.TrimPrefix(line, "data:"))
			}
		}
		if eventType != "" {
			summary.EventTypes[eventType]++
		}
		if data == "" {
			continue
		}
		var payload struct {
			Type  string `json:"type"`
			Delta struct {
				Type     string `json:"type"`
				Text     string `json:"text"`
				Thinking string `json:"thinking"`
			} `json:"delta"`
		}
		if err := json.Unmarshal([]byte(data), &payload); err != nil {
			continue
		}
		if payload.Type != "" {
			summary.EventTypes[payload.Type]++
		}
		switch payload.Delta.Type {
		case "text_delta":
			summary.TextDeltas++
			summary.Text += payload.Delta.Text
		case "thinking_delta":
			summary.ThinkingDeltas++
		}
	}
	return summary
}
Event types may be double-counted in the summary.
The function counts event types from both the SSE event: line (line 273) and the JSON type field (line 290). In Anthropic's SSE format, these typically have the same value:
event: content_block_delta
data: {"type":"content_block_delta",...}
This results in each event being counted twice in EventTypes. While current assertions only check for presence (> 0), the counts themselves are misleading.
🔧 Proposed fix to avoid double-counting
Count only from the JSON payload to match the actual event semantics:
 	for _, block := range strings.Split(string(raw), "\n\n") {
-		var eventType string
 		var data string
 		for _, line := range strings.Split(block, "\n") {
 			line = strings.TrimSpace(line)
-			if strings.HasPrefix(line, "event:") {
-				eventType = strings.TrimSpace(strings.TrimPrefix(line, "event:"))
-			}
 			if strings.HasPrefix(line, "data:") {
 				data = strings.TrimSpace(strings.TrimPrefix(line, "data:"))
 			}
 		}
-		if eventType != "" {
-			summary.EventTypes[eventType]++
-		}
 		if data == "" {
 			continue
 		}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
func summarizeAnthropicLiveStream(raw []byte) anthropicLiveStreamSummary {
	summary := anthropicLiveStreamSummary{
		EventTypes: make(map[string]int),
	}
	for _, block := range strings.Split(string(raw), "\n\n") {
		var data string
		for _, line := range strings.Split(block, "\n") {
			line = strings.TrimSpace(line)
			if strings.HasPrefix(line, "data:") {
				data = strings.TrimSpace(strings.TrimPrefix(line, "data:"))
			}
		}
		if data == "" {
			continue
		}
		var payload struct {
			Type  string `json:"type"`
			Delta struct {
				Type     string `json:"type"`
				Text     string `json:"text"`
				Thinking string `json:"thinking"`
			} `json:"delta"`
		}
		if err := json.Unmarshal([]byte(data), &payload); err != nil {
			continue
		}
		if payload.Type != "" {
			summary.EventTypes[payload.Type]++
		}
		switch payload.Delta.Type {
		case "text_delta":
			summary.TextDeltas++
			summary.Text += payload.Delta.Text
		case "thinking_delta":
			summary.ThinkingDeltas++
		}
	}
	return summary
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@aigateway_regression_live_test.go` around lines 256 - 301, In
summarizeAnthropicLiveStream, avoid double-counting EventTypes: stop
unconditionally incrementing summary.EventTypes[eventType] from the SSE "event:"
line and instead increment summary.EventTypes only from the parsed JSON
payload.Type (payload.Type) when present; if payload.Type is empty, you may fall
back to using the SSE eventType once. Update logic around eventType and
payload.Type (references: summarizeAnthropicLiveStream,
anthropicLiveStreamSummary, EventTypes, payload.Type) so counts come from the
JSON payload first and the SSE event line only as a fallback.
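The prompt above asks for counts that come from the JSON payload first, with the SSE `event:` line used only as a fallback. A minimal sketch of that counting rule follows, assuming the same blank-line block splitting as the function under review. The helper name `countEventTypes` is hypothetical and it covers only the `EventTypes` tally, not the delta handling. Unlike the original function, it still counts a block whose `data:` line is missing or unparseable, using the `event:` name once. The sample stream in `main` is contrived for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// countEventTypes is a hypothetical helper: it counts each SSE block exactly
// once, preferring the JSON "type" field and falling back to the "event:" line.
func countEventTypes(raw []byte) map[string]int {
	counts := make(map[string]int)
	for _, block := range strings.Split(string(raw), "\n\n") {
		var eventType, data string
		for _, line := range strings.Split(block, "\n") {
			line = strings.TrimSpace(line)
			if strings.HasPrefix(line, "event:") {
				eventType = strings.TrimSpace(strings.TrimPrefix(line, "event:"))
			}
			if strings.HasPrefix(line, "data:") {
				data = strings.TrimSpace(strings.TrimPrefix(line, "data:"))
			}
		}
		var payload struct {
			Type string `json:"type"`
		}
		if data != "" {
			if err := json.Unmarshal([]byte(data), &payload); err != nil {
				// Unparseable payload: treat the type as unknown and fall back.
				payload.Type = ""
			}
		}
		switch {
		case payload.Type != "":
			counts[payload.Type]++ // primary source: JSON payload
		case eventType != "":
			counts[eventType]++ // fallback: SSE event line, counted once
		}
	}
	return counts
}

func main() {
	// Contrived sample: the second block has no "type" field in its payload.
	sample := "event: content_block_delta\n" +
		"data: {\"type\":\"content_block_delta\",\"delta\":{\"type\":\"text_delta\",\"text\":\"hi\"}}\n\n" +
		"event: ping\n" +
		"data: {}\n\n"
	for name, n := range countEventTypes([]byte(sample)) {
		fmt.Println(name, n)
	}
}
```

With this rule, a block whose `event:` line and JSON `type` agree contributes one count rather than two, so the presence assertions (`> 0`) mentioned in the comment keep passing while the totals reflect the number of events.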