rpc-client: retry transient 'no available provider' errors instead of crashing by elina-chertova · Pull Request #504 · subsquid/squid-sdk

elina-chertova · 2026-06-21T12:17:13Z

See cause/remedy below.

Cause (proven)

The evm-dump pod dump-hyperliquid-testnet-0 (namespace evm-archive) was in a crash-loop (512 restarts, exit code 1 every ~50s). Each crash was:

RpcError: Errors from the following providers prevented the request from being fulfilled: dRPC, Alchemy.
  at validateError (evm/evm-rpc/lib/rpc.js:97)
  ...
code: -32503
data: { DRPC: { error: { code: 10, message: "User balance exceeded" } },
        Alchemy: { error: { code: -32001, message: "Unable to complete request at this time." } } }
rpcMethod: eth_getBlockByNumber

The configured endpoint is a uniblock aggregator (chainId 998). When all of its upstream providers momentarily fail, uniblock returns this -32503 error. RpcClient.isConnectionError() did not recognise it (it only matches rate-limit / execution-timeout / request-timed-out / connection / HTTP 4xx-5xx errors), so it escaped the retry machinery and propagated to the caller.
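
To make the error payload above easier to read in logs, the per-provider failures inside `data` can be flattened into one line each. This is only a sketch: the `ProviderErrors` type and the `summarizeProviderErrors` helper are hypothetical names, and the shape is inferred from the single error sample quoted in this description, not from any uniblock specification.

```typescript
// Hypothetical types, inferred from the one error sample quoted above.
interface UpstreamError {
  code: number
  message: string
}

type ProviderErrors = Record<string, { error: UpstreamError }>

// Flatten the aggregator's `data` field into one readable line per upstream provider.
function summarizeProviderErrors(data: ProviderErrors): string[] {
  return Object.entries(data).map(
    ([provider, { error }]) => `${provider}: ${error.message} (code ${error.code})`
  )
}

// Sample payload copied from the crash log in this description.
const sample: ProviderErrors = {
  DRPC: { error: { code: 10, message: 'User balance exceeded' } },
  Alchemy: { error: { code: -32001, message: 'Unable to complete request at this time.' } },
}

const summary = summarizeProviderErrors(sample)
```

A summary like this makes it visible that the two upstreams failed for different reasons (quota on one, a generic refusal on the other), while the outer -32503 only says that no upstream could serve the request.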

evm-dump builds its client with retryAttempts: Number.MAX_SAFE_INTEGER (evm/evm-dump/src/dumper.ts), and hyperliquid-testnet uses batch_limit: 1, so the request goes straight to batchCall whose only retry gate is isConnectionError. A non-retried error there therefore terminates the process → CrashLoopBackOff → the Dumper_Pod_Restarts alert.
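
The failure mode described above can be illustrated with a small retry loop guarded by a single predicate. This is a simplified, synchronous simulation and not the SDK's batchCall: `callWithRetryGate`, its parameters, and the exponential backoff schedule are assumptions made for illustration, and the delays are recorded instead of slept so the sketch stays deterministic.

```typescript
interface RetryResult<T> {
  value: T
  attempts: number
  delaysMs: number[] // backoff delays that would have been waited (recorded, not slept)
}

// Hypothetical retry loop: an error is retried only if `isRetryable` accepts it.
// Anything the predicate rejects is rethrown at once, no matter how large `retryAttempts` is.
function callWithRetryGate<T>(
  attempt: () => T,
  isRetryable: (err: unknown) => boolean,
  retryAttempts: number,
  baseDelayMs = 100, // assumed value, not taken from the SDK
  maxDelayMs = 5000 // assumed value, not taken from the SDK
): RetryResult<T> {
  const delaysMs: number[] = []
  for (let n = 0; ; n++) {
    try {
      return { value: attempt(), attempts: n + 1, delaysMs }
    } catch (err) {
      if (!isRetryable(err) || n >= retryAttempts) throw err
      delaysMs.push(Math.min(maxDelayMs, baseDelayMs * 2 ** n))
    }
  }
}

// A call that fails twice with the aggregator error, then succeeds.
function makeFlakyCall(): () => string {
  let calls = 0
  return () => {
    calls++
    if (calls < 3) {
      throw new Error('Errors from the following providers prevented the request from being fulfilled: dRPC, Alchemy.')
    }
    return 'block'
  }
}

const recognises = (err: unknown) =>
  err instanceof Error && err.message.includes('providers prevented the request')

// With a predicate that recognises the error, the call recovers on the third attempt.
const recovered = callWithRetryGate(makeFlakyCall(), recognises, Number.MAX_SAFE_INTEGER)

// With a predicate that does not recognise it, the first failure escapes immediately.
let escaped = false
try {
  callWithRetryGate(makeFlakyCall(), () => false, Number.MAX_SAFE_INTEGER)
} catch {
  escaped = true
}
```

The second case mirrors the crash: a very large retry budget does not help when the only gate rejects the error.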

Remedy (tested)

The error is transient: it is the JSON-RPC analog of an HTTP 503 ("no upstream could serve this right now"), which isConnectionError already retries. I re-probed the exact failing request (eth_getBlockByNumber 0x34d7f6d withTx=true) against the same uniblock endpoint, and 4 of 4 probes succeeded, confirming that a single retry would have recovered.

This PR makes isConnectionError treat the aggregator "providers prevented the request from being fulfilled" error as a connection error, so it is retried with backoff like the sibling transient errors, instead of crashing the dump. The fix is central (one chokepoint all RPC methods flow through), so it also protects receipts/traces calls, not just eth_getBlockByNumber.
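
A minimal sketch of the kind of check this adds, assuming the error is matched by the message fragment quoted above. The `RpcErrorLike` interface and the `isProvidersUnavailableError` helper are hypothetical names used for illustration; the actual diff is not reproduced here and may match differently (for example on the -32503 code).

```typescript
// Hypothetical minimal error shape; the SDK's RpcError class may carry more fields.
interface RpcErrorLike {
  code?: number
  message?: string
}

// Message fragment quoted from the aggregator error in this description.
const PROVIDERS_UNAVAILABLE = 'providers prevented the request from being fulfilled'

// Returns true when the error looks like the aggregator's "no upstream available" reply.
function isProvidersUnavailableError(err: unknown): boolean {
  if (typeof err !== 'object' || err === null) return false
  const e = err as RpcErrorLike
  return typeof e.message === 'string' && e.message.includes(PROVIDERS_UNAVAILABLE)
}

const aggregatorError: RpcErrorLike = {
  code: -32503,
  message: 'Errors from the following providers prevented the request from being fulfilled: dRPC, Alchemy.',
}
const unrelatedError: RpcErrorLike = { code: -32000, message: 'execution reverted' }

const matchesAggregator = isProvidersUnavailableError(aggregatorError)
const matchesUnrelated = isProvidersUnavailableError(unrelatedError)
```

Matching on the message text keeps the check narrow: an unrelated error that happens to reuse the same numeric code would not be retried by this sketch.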

The underlying provider degradation (dRPC "User balance exceeded", Alchemy "Unable to complete request") is being handled separately as an operator/provider escalation; per our policy a provider swap is a temporary mitigation, while this is the durable code fix that prevents a momentary provider hiccup from crash-looping ingestion.

Falsification

If the dump still crash-loops on this exact -32503 after this change, or if the provider error is in fact sustained (not transient) — i.e. the same request fails on repeated retries against an independent node implementation — then retrying is not the right behavior and the resolution is purely the provider escalation. Re-probing showed the request succeeding on retry, so retry is correct here.

RPC aggregators (e.g. uniblock) reply with an RpcError whose message is "Errors from the following providers prevented the request from being fulfilled: ..." when none of their upstream providers can serve a request at that moment. This is the JSON-RPC analog of an HTTP 503 and is transient, but isConnectionError() did not recognise it, so the error escaped the retry machinery and propagated to the caller. For ingestion processes that set retryAttempts to a high value (e.g. evm-dump), a single momentary upstream hiccup therefore terminated the process instead of being retried, producing a crash-loop. Treat it as a connection error so it is retried with backoff like rate-limit/timeout errors already are.

elina-chertova wants to merge 1 commit into master from alert-fix/jvqYTR-retry-provider-unavailable
