
chore(diagnostics): Instrument agent no-response failures#2740

Merged
charlesvien merged 6 commits into main from chore/agent-no-response-diagnostics on Jun 17, 2026
@charlesvien charlesvien commented Jun 17, 2026

Member

Problem

Users intermittently get no response from the agent on both local and cloud tasks. The failures are transient and seemingly random. The current logs don't capture the cause: undici's generic "terminated" hides the real socket reason, cloud SSE streams drop without naming the terminating hop, and local session-init failures surface only as an opaque "Internal error".

Changes

  1. Add serializeError to @posthog/shared to flatten an error and its .cause chain, surfacing the real undici socket reason (ECONNRESET, UND_ERR_SOCKET) behind "terminated"
  2. Instrument cloud SSE streams: connect-time response headers (server, via, cf-ray, x-accel-buffering), resume detection, per-connection counters, connection durations
  3. Enrich the local LLM auth-proxy: error cause, headersSent, duration, bytes streamed, plus a per-request completion baseline
  4. Make local agent session create/reconnect failures less opaque (real cause, code, session context) and add session-init phase timing (modelConfigMs vs initMs)
  5. Surface Claude CLI stderr and a "turn completed with no agent output" warning
  6. Add tests for serializeError and the proxy byte-counting
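
To illustrate item 1, here is a minimal sketch of how an error and its .cause chain could be flattened into a plain, loggable object. It is not the code from this PR: the SerializedError shape, the MAX_CAUSE_DEPTH cap, and the cycle guard are assumptions made for illustration, and the real serializeError in @posthog/shared may differ.

```typescript
// Hypothetical output shape; the real one in @posthog/shared may differ.
interface SerializedError {
  message: string;
  name?: string;
  code?: string | number;
  cause?: SerializedError;
}

// Assumed cap on how many nested causes are followed.
const MAX_CAUSE_DEPTH = 5;

function serializeError(
  err: unknown,
  depth = 0,
  seen = new Set<unknown>(),
): SerializedError {
  // Primitives (strings, numbers, undefined) have no fields to read.
  if (typeof err !== "object" || err === null) {
    return { message: String(err) };
  }

  const source = err as {
    name?: unknown;
    message?: unknown;
    code?: unknown;
    cause?: unknown;
  };
  const out: SerializedError = {
    message: typeof source.message === "string" ? source.message : "Unknown error",
  };
  if (typeof source.name === "string") out.name = source.name;
  if (typeof source.code === "string" || typeof source.code === "number") {
    out.code = source.code;
  }

  // Remember visited objects so a cyclic cause chain cannot recurse forever.
  seen.add(err);
  const cause = source.cause;
  if (cause !== undefined && depth < MAX_CAUSE_DEPTH && !seen.has(cause)) {
    out.cause = serializeError(cause, depth + 1, seen);
  }
  return out;
}

// Usage: a generic "terminated" error wrapping a socket-level cause.
const socketError = Object.assign(new Error("other side closed"), {
  code: "UND_ERR_SOCKET",
});
const terminated = Object.assign(new TypeError("terminated"), {
  cause: socketError,
});
console.log(JSON.stringify(serializeError(terminated), null, 2));
```

Logging the flattened object rather than the raw error keeps the nested code next to the top-level message, which is the information the "terminated" message alone does not carry.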

How did you test this?

Manually

Automatic notifications

  • Publish to changelog?
  • Alert Sales and Marketing teams?

@charlesvien charlesvien added the Stamphog label ("This will request an autostamp by stamphog on small changes") Jun 17, 2026
@charlesvien charlesvien changed the title Chore/agent no response diagnostics chore(diagnostics): Instrument agent no-response failures Jun 17, 2026
@greptile-apps

greptile-apps Bot commented Jun 17, 2026

Contributor

Comments Outside Diff (1)

  1. packages/shared/src/errors.test.ts, line 573-603 (link)

    P2 Simple input→output cases suit a parameterised test

    Several consecutive test cases each call serializeError with a single flat input and assert the full return value. Per the team's style convention these should be consolidated into an it.each table — the cases for non-Error primitives, numeric codes, bare objects, and plain Error are all the same shape ([input, expected]) and are easy to express that way. The cause-chain and cyclic-depth cases are more complex and can stay as individual it blocks.
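
A rough sketch of the table shape this comment describes, written so it runs without a test framework. The serializeFlat stand-in, the case labels, and the expected values are hypothetical and are not taken from errors.test.ts; with Vitest or Jest, the same table would be passed to it.each instead of the manual filter.

```typescript
type Flat = { message: string; name?: string; code?: string | number };

// Hypothetical stand-in for the function under test (flat inputs only).
function serializeFlat(input: unknown): Flat {
  if (typeof input !== "object" || input === null) {
    return { message: String(input) };
  }
  const src = input as { name?: unknown; message?: unknown; code?: unknown };
  const out: Flat = {
    message: typeof src.message === "string" ? src.message : "Unknown error",
  };
  if (typeof src.name === "string") out.name = src.name;
  if (typeof src.code === "string" || typeof src.code === "number") {
    out.code = src.code;
  }
  return out;
}

// One row per case: [label, input, expected].
const cases: Array<[string, unknown, Flat]> = [
  ["string primitive", "boom", { message: "boom" }],
  ["number primitive", 404, { message: "404" }],
  ["bare object with numeric code", { message: "bad gateway", code: 502 }, { message: "bad gateway", code: 502 }],
  ["plain Error", new Error("fail"), { message: "fail", name: "Error" }],
];

// Field-by-field comparison, independent of key order.
const sameFlat = (a: Flat, b: Flat): boolean =>
  a.message === b.message && a.name === b.name && a.code === b.code;

// With Vitest or Jest this would read:
//   it.each(cases)("%s", (_label, input, expected) => {
//     expect(serializeError(input)).toEqual(expected);
//   });
const failures = cases
  .filter(([, input, expected]) => !sameFlat(serializeFlat(input), expected))
  .map(([label]) => label);

console.log(failures.length === 0 ? "all cases pass" : `failed: ${failures.join(", ")}`);
```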

Reviews (1): Last reviewed commit: "tidy error logging and add diagnostics t..."

@charlesvien charlesvien force-pushed the chore/agent-no-response-diagnostics branch from 9d4313b to cfdc1e9 June 17, 2026 19:08

Member Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@github-actions


React Doctor found no issues in the changed files. 🎉

Reviewed by React Doctor for commit cfdc1e9.

@stamphog stamphog Bot left a comment

Gates denied this PR because it touches auth-proxy code and exceeds the T2-never tier threshold. While the changes appear to be purely diagnostic logging with no behavioral modifications to auth logic, the deny-list rule exists to ensure human review of any auth-adjacent changes. With zero reviews on a multi-file change touching auth infrastructure, a human reviewer must sign off before auto-approval.

@stamphog stamphog Bot removed the Stamphog label ("This will request an autostamp by stamphog on small changes") Jun 17, 2026
@charlesvien charlesvien added the Create Release label ("This will trigger a new release") Jun 17, 2026
@charlesvien charlesvien enabled auto-merge (squash) June 17, 2026 19:18
@charlesvien charlesvien disabled auto-merge June 17, 2026 19:18
@charlesvien charlesvien enabled auto-merge (squash) June 17, 2026 19:19
@DanielVisca DanielVisca self-requested a review June 17, 2026 19:19

@DanielVisca DanielVisca left a comment

LGTM!

@charlesvien charlesvien merged commit e3bfae4 into main Jun 17, 2026
28 of 29 checks passed
@charlesvien charlesvien deleted the chore/agent-no-response-diagnostics branch June 17, 2026 19:49

Labels

Create Release ("This will trigger a new release")

2 participants