Skip to content

feat: add specific exceptions on provider errors and add retry#103

Open
levivannoort wants to merge 4 commits into
mainfrom
CLO-4353-build-timeout-issue
Open

feat: add specific exceptions on provider errors and add retry#103
levivannoort wants to merge 4 commits into
mainfrom
CLO-4353-build-timeout-issue

Conversation

@levivannoort

@levivannoort levivannoort commented May 16, 2026

Copy link
Copy Markdown

Summary

Hardens Adapter::call() against the transient network and provider conditions that were surfacing as build timeouts in CLO-4353, and relaxes an overly strict response validator in GitHub::getLatestCommit() that was causing legitimate commits to fail.

Motivation

Build workers were occasionally hanging for minutes per request. Two root causes:

  1. No HTTP timeouts. CURLOPT_CONNECTTIMEOUT and CURLOPT_TIMEOUT were unset, so a stalled TCP handshake against the provider could pin a worker until the kernel TCP timeout (~2 min) elapsed. Multiplied across retries, this trivially produced multi-minute hangs.
  2. Indiscriminate retry behavior. The previous loop retried any failure (including POSTs on transport errors), used unjittered exponential backoff, and honored Retry-After without bound — a single GitHub secondary rate-limit response (Retry-After: 3600) could block a build for an hour.

A secondary issue: GitHub::getLatestCommit() required author.avatar_url and author.html_url to be present. GitHub returns author: null whenever the commit's author email isn't linked to a GitHub user account (deleted users, bots, outside contributors with unlinked emails), causing the entire commit fetch — and the build — to fail.

Approach

src/VCS/Adapter.php — request lifecycle

  • Bounded timeouts. CURLOPT_CONNECTTIMEOUT=5, CURLOPT_TIMEOUT=15. Fail fast instead of pinning workers.
  • Idempotency-aware retries. Transport errors and 5xx responses are only retried for RFC 7231 idempotent methods (GET/HEAD/PUT/DELETE/OPTIONS) via a new isIdempotent() helper. POST and PATCH may have already been processed server-side, so they fail fast rather than risk duplicate side effects.
  • Jittered backoff. getRetryDelay() now multiplies the base delay (1s/2s/4s) by a random factor in [0.5, 1.5], so concurrent callers don't re-synchronize on the same retry schedule.
  • Bounded Retry-After. New $maxRetryAfterSeconds = 300 caps how long we'll honor a server-provided header. Typical values (e.g. Retry-After: 60) pass through unchanged; pathological values (e.g. Retry-After: 3600 from GitHub secondary rate limits) are clamped so a build can't be blocked indefinitely.
  • Robust Retry-After parsing. New parseRetryAfter() handles both forms allowed by RFC 7231 — delta-seconds and HTTP-date — instead of the previous bare (int) cast which silently produced 0 for date-formatted values.
  • Reordered body decoding. JSON decoding now happens after the 5xx branch. Previously, a transient 5xx with an empty or non-JSON body (common from gateways/proxies mid-outage) could throw a JSON parse error and be reported as a permanent failure instead of triggering a retry.
  • Naming/cleanup. maxRetriesmaxAttempts (the count is total attempts, not extra retries — the old name was misleading). Removed dead lastResponseStatus/lastResponseBody/lastResponseHeaders state and the now-unreachable post-loop fallback.

src/VCS/Adapter/Git/GitHub.phpgetLatestCommit()

  • Removed author.avatar_url and author.html_url from the required-fields check. They're still returned in the response payload, defaulting to '' via the existing null-coalescing in the return statement. Core commit data (sha, message, commit-level author name, html_url on the commit) remains required.

Test plan

  • Existing test suite passes (composer test / phpunit)
  • Verify a GET against a deliberately failing endpoint retries 3× with jittered delay and surfaces ProviderServerError
  • Verify a POST against a 5xx response fails fast (no retry) and throws ProviderServerError
  • Verify a 429 with Retry-After: 60 waits ~60s; Retry-After: 3600 is clamped to 300s
  • Verify Retry-After HTTP-date format (e.g. Wed, 21 Oct 2026 07:28:00 GMT) is parsed correctly
  • Verify getLatestCommit() succeeds against a commit whose author email is not linked to a GitHub account (e.g. a commit authored by a deleted user or unlinked email) — commitAuthorAvatar returns '' instead of throwing

🤖 Generated with Claude Code

@levivannoort levivannoort marked this pull request as ready for review May 22, 2026 08:05
@greptile-apps

greptile-apps Bot commented May 22, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR hardens Adapter::call() against build-blocking network hangs by adding bounded HTTP timeouts, idempotency-aware retries with jittered exponential backoff, a capped Retry-After ceiling, and robust HTTP-date parsing. It also relaxes an overly strict field check in GitHub::getLatestCommit() that was breaking commits from unlinked/deleted GitHub users.

  • Adapter::call() rewrite: adds 5s/15s cURL timeouts, retries only idempotent methods on transport/5xx errors, jitters backoff to avoid thundering-herd, and correctly orders JSON decoding after the 5xx retry check.
  • GitHub::isRateLimited() override: detects GitHub's primary-rate-limit 403+x-ratelimit-remaining:0 convention in addition to the base 429; retries without idempotency guard (safe because a 403 means the request was rejected before processing).
  • getLatestCommit() fix: drops author.avatar_url / author.html_url from required fields, defaulting both to '' via existing null-coalescing when author is null.

Confidence Score: 5/5

Safe to merge; the retry/timeout logic is well-structured, idempotency guards are correct, and the getLatestCommit() field relaxation handles the null-author case without dropping any core commit data.

The rewritten call() method correctly sequences all checks (curl error → rate-limit → 5xx → decode → return), idempotency logic matches RFC 7231, and the Retry-After capping prevents indefinite build blocks. The getLatestCommit() change is safe because the removed fields were always null-coalesced to '' in the return statement.

No files require special attention, though Gitea::getRepositoryName() and GitHub::getRepositoryName() are missing the >= 400 guard present in the sibling getRepository() methods.

Important Files Changed

Filename Overview
src/VCS/Adapter.php Core retry/timeout logic rewrite: adds CURLOPT timeouts, idempotency-aware retries, jittered backoff, bounded Retry-After, and proper HTTP-date parsing. Logic is sound; JSON decode is correctly ordered after the 5xx branch.
src/VCS/Adapter/Git/GitHub.php Removes overly strict avatar/url field requirements in getLatestCommit(), adds 404-specific handling in getRepositoryName(), and overrides isRateLimited() for GitHub's 403+x-ratelimit-remaining:0 convention.
src/VCS/Adapter/Git/GitLab.php Separates 404 from generic 4xx handling in getRepository() and getRepositoryPath(); adds missing-field guard for getRepositoryPath(). Consistent with the new error-classification pattern.
src/VCS/Adapter/Git/Gitea.php Same 404/4xx split as GitLab, but getRepositoryName() is missing the >= 400 guard present in getRepository(), causing confusing error messages for auth failures.
src/VCS/Exception/ProviderRateLimited.php New typed exception for rate-limit responses; minimal shell class extending \Exception.
src/VCS/Exception/ProviderRequestFailed.php New typed exception for transport/curl failures; minimal shell class extending \Exception.
src/VCS/Exception/ProviderServerError.php New typed exception for 5xx responses; minimal shell class extending \Exception.

Reviews (4): Last reviewed commit: "feat: enhance rate limit handling for Gi..." | Re-trigger Greptile

Comment thread src/VCS/Adapter.php
Comment thread src/VCS/Adapter.php Outdated
Comment thread src/VCS/Adapter.php Outdated
Comment thread src/VCS/Adapter.php Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants