Skip to content

feat(core): add retry for transient HTTP and stream-level errors#13

Merged
patlux merged 4 commits into
patlux:mainfrom
takltc:feat/retry-config
Jun 2, 2026
Merged

feat(core): add retry for transient HTTP and stream-level errors#13
patlux merged 4 commits into
patlux:mainfrom
takltc:feat/retry-config

Conversation

@takltc

@takltc takltc commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Summary

Add configurable retry mechanism for transient failures in streamCommandCode, responding to timeoutMs, maxRetries, and maxRetryDelayMs from pi settings.json retry provider config.

  • HTTP-level retry: retries on 429/5xx responses with exponential backoff + jitter; respects Retry-After header (seconds and HTTP-date formats).
  • Stream-level retry: retries when API returns 200 OK but the SSE stream contains an error event (e.g. "Service temporarily unavailable"), only if no content has been emitted yet.
  • Per-attempt timeout: each retry attempt has its own timeout via AbortController; timed-out attempts retry immediately without backoff delay.
  • State reset between attempts: properly resets finished flag, clears output.content, and distinguishes timeout from user-initiated abort via attemptTimedOut flag to prevent "Agent is already processing" errors on retry.
  • 14 new tests covering all retry scenarios (HTTP errors, stream errors, Retry-After, timeout, abort, defaults).

Changes

File Description
src/core.ts Retry loop with HTTP + stream-level error handling, state reset
src/types.ts New StreamOptions fields: timeoutMs, maxRetries, maxRetryDelayMs
tests/test-retry.ts 14 test cases for all retry paths
tests/helpers.ts responseDelay support for timeout testing
CHANGELOG.md Added Unreleased entry

Test plan

  • npm test — all 68 tests pass
  • npm run typecheck — no type errors
  • npm run format:check — Prettier clean
  • Local pi integration tested with symlinked package

@takltc takltc requested a review from patlux as a code owner June 2, 2026 03:05
Add retry mechanism driven by pi settings.json retry.provider config
(timeoutMs, maxRetries, maxRetryDelayMs).

HTTP-level retries handle 429/5xx with exponential backoff and jitter,
respecting Retry-After headers (seconds and HTTP-date formats).

Stream-level retries handle cases where the API returns 200 OK but
sends an error event in the stream body. Retries only when no content
has been emitted yet.

Per-attempt timeout via AbortController with automatic retry. Clean
abort propagation through the retry loop.
@takltc takltc force-pushed the feat/retry-config branch from 671e399 to c3107db Compare June 2, 2026 03:38
Comment thread src/core.ts Fixed

@patlux patlux left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the retry work and the broad test coverage. I’m requesting changes for a few blockers before merge: CI is currently red, the per-attempt timeout is cleared before stream consumption so it does not actually protect hanging streams, maxRetryDelayMs: 0 behaves opposite to the documented contract, and the default retry count diverges from pi’s provider retry default/visibility model.

Comment thread tests/test-retry.ts Outdated
Comment thread src/core.ts Outdated
Comment thread src/core.ts
Comment thread src/core.ts Outdated
@takltc

takltc commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the detailed review — pushed c47a533 with the requested changes:

  1. Gitleaks / test fixtures — Replaced mock-key with option-key (TEST_API_KEY) in tests/test-retry.ts so we no longer hit the pi-test-api-key rule.

  2. Per-attempt timeout through stream read — The attempt timer now stays active for the whole attempt (fetch + body read). reader.read() is raced against attemptController.signal. Added regression test: response starts (chunked headers) but hangs with no finish event, then succeeds on retry.

  3. maxRetryDelayMs: 00 is normalized to no cap (Infinity); default 60s still applies when the option is omitted. Retry-After cap is checked even when maxRetries is 0. Added test for uncapped Retry-After when maxRetryDelayMs is 0.

  4. Default maxRetries — Provider default is now 0 to match pi’s retry.provider.maxRetries semantics when pi passes undefined; tests that expect retries pass maxRetries explicitly.

npm test and npm run format:check pass locally. Happy to adjust anything else.

Comment thread src/core.ts
@takltc takltc force-pushed the feat/retry-config branch from c47a533 to 4442a43 Compare June 2, 2026 07:55

@patlux patlux left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the follow-up. Most of the previous points are addressed in the final tree, but I still see two blockers: CI is still red because Gitleaks scans the PR commit history, and the stream timeout retry path can now retry after content has already been emitted, which mixes partial output from multiple attempts.

Comment thread tests/test-retry.ts
Comment thread src/core.ts Outdated
Align provider maxRetries default with pi (0), treat maxRetryDelayMs 0 as
no cap, keep per-attempt timeout through stream reads, and fix gitleaks
test fixtures.
@takltc takltc force-pushed the feat/retry-config branch from 51767ee to 57bcdc7 Compare June 2, 2026 08:15
@takltc

takltc commented Jun 2, 2026

Copy link
Copy Markdown
Contributor Author

Thanks again for the second review — pushed an update on feat/retry-config (amended commit 57bcdc7).

Stream timeout + partial output: canRetry no longer treats attemptTimedOut as sufficient when content was already emitted. Retries during stream read now require output.content.length === 0 only. When a retry is allowed, we also reset textBlock, currentTextIdx, and thinkingIdx so block state cannot leak across attempts.

Regression test: does NOT retry on timeout after partial text-delta was emitted — server sends one text-delta, then hangs; with maxRetries: 2 we still make a single request and end with error after the partial events.

Re Gitleaks / commit history: noted your “ignore that” — CI is green on our side with the allowlist entry.

Let me know if anything else should change before merge.

@patlux patlux left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All requested changes addressed. CI is green.

@patlux patlux merged commit 6b05595 into patlux:main Jun 2, 2026
10 checks passed
@patlux

patlux commented Jun 2, 2026

Copy link
Copy Markdown
Owner

Shipped in pi-commandcode-provider@0.4.0 / tag v0.4.0.

@patlux

patlux commented Jun 2, 2026

Copy link
Copy Markdown
Owner

Thank you for the contribution! This is included in pi-commandcode-provider@0.4.0 / tag v0.4.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants