evm-rpc: retry "response too large" (-32020) instead of crash-looping the dumper by elina-chertova · Pull Request #505 · subsquid/squid-sdk

elina-chertova · 2026-06-23T05:59:12Z

Cause (proven)

okx_xlayer-mainnet_Hotblocks_Critical_Lag — the evm-okx-xlayer-mainnet-hotblocks-service lag grew to ~24 min and rising while the okx RPC head itself stayed current.

Pod logs (evm-okx-xlayer-mainnet-hotblocks-service-...-pjbsk, ns evm-hotblocks) show a crash-restart loop:

RpcError: backend response too large
    at validateError (/squid/evm/evm-rpc/lib/rpc.js:246:23)
  code: -32020, rpcUrl: https://xlayerrpc.okx.com/, rpcMethod: eth_getBlockReceipts
"data ingestion terminated, will restart in 30 seconds"

okx returns JSON-RPC error -32020 "backend response too large" on eth_getBlockReceipts. This code/message was recognised neither by EvmRpcClient.isConnectionError (so it was not retried) nor by EvmRpcClient.isResponseTooLargeError/isBatchRetryableError (so an oversized batch was never split). It therefore bubbled up to util-internal-data-service's run() as a fatal error → restart loop → growing lag.

The condition is transient/data-dependent, not a hard limit:

A single eth_getBlockReceipts for the failing block 0x3c7c244 now returns HTTP 200, 4.6 MB on okx (and identically on uniblock and the public rpc.xlayer.tech) — the exact same call that was -32020 during the incident.
Lag self-recovered (1.47M ms → ~1 s) once okx's backend stopped returning the error, confirming it is intermittent rather than a permanently-too-large block.

This is the same class of bug as #500/#501/#503: an intermittent upstream response should not crash the dumper.

Fix

Recognise -32020 / "response too large" as a retryable connection-class error in EvmRpcClient (new isResponseTooLargeError, wired into isConnectionError), mirroring the existing isUpstreamUnavailableError (#501) and the geth "response too large" / -32000 handling. Because reduceBatchOnRetry keys off isConnectionError, this also lets an oversized eth_getBlockReceipts batch be split in half until it fits — so both the intermittent single-call case (retry) and the oversized-batch case (split) are handled instead of crashing.
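As a rough illustration of the predicate described above, here is a minimal TypeScript sketch. It is not the actual EvmRpcClient code: the `RpcErrorLike` shape and the `isOtherConnectionError` stand-in are hypothetical, and only the -32020 code and the "response too large" message come from the incident logs.

```typescript
// Hypothetical error shape; the real RpcError class in evm-rpc may carry more fields.
interface RpcErrorLike {
    code?: number
    message?: string
}

// Code observed in the okx logs for "backend response too large".
const RESPONSE_TOO_LARGE_CODE = -32020

function isResponseTooLargeError(err: unknown): boolean {
    if (typeof err !== 'object' || err === null) return false
    const e = err as RpcErrorLike
    if (e.code === RESPONSE_TOO_LARGE_CODE) return true
    // Also covers geth-style errors that use -32000 with a "response too large" message.
    return typeof e.message === 'string' && /response too large/i.test(e.message)
}

// Stand-in (assumption) for the checks the client already performs,
// e.g. timeouts or upstream-unavailable errors.
function isOtherConnectionError(err: unknown): boolean {
    return false
}

function isConnectionError(err: unknown): boolean {
    return isOtherConnectionError(err) || isResponseTooLargeError(err)
}
```

Because batch reduction is described as keying off `isConnectionError`, wiring the new predicate in at this one point is what would make both the retry and the split paths see the error.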

Falsification

If the dumper still crash-loops on -32020 after this change, the predicate isn't reached on the failing path (the fix would be wrong).
If a single block's receipts genuinely exceed okx's cap persistently (single-call eth_getBlockReceipts consistently -32020, not intermittent), retry/split cannot help and the durable answer is a per-tx receipts fallback or a different provider for that block — not this change. Probing showed the single-block call succeeds (4.6 MB), so that is not the current situation.
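The second falsification case can be seen in a simplified model of batch halving. The sketch below is an assumption about the general technique, not the SDK's reduceBatchOnRetry implementation; `fetchWithSplit` and its parameters are hypothetical names, and backoff-based retry of single calls is deliberately left out.

```typescript
type FetchBatch<T, R> = (batch: T[]) => Promise<R[]>

// Halve a failing batch until each part fits under the backend's size cap.
async function fetchWithSplit<T, R>(
    items: T[],
    fetchBatch: FetchBatch<T, R>,
    isRetryable: (err: unknown) => boolean
): Promise<R[]> {
    try {
        return await fetchBatch(items)
    } catch (err) {
        // A single-item batch cannot be split further, so the error surfaces to the caller.
        if (!isRetryable(err) || items.length <= 1) throw err
        const mid = Math.ceil(items.length / 2)
        const left = await fetchWithSplit(items.slice(0, mid), fetchBatch, isRetryable)
        const right = await fetchWithSplit(items.slice(mid), fetchBatch, isRetryable)
        return left.concat(right)
    }
}
```

In this model, splitting helps only while the batch holds more than one request. If one block's receipts alone exceed the cap on every attempt, the recursion bottoms out at a single call and the error is rethrown, which is the situation where a per-tx receipts fallback or a different provider would be needed.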

Note: a separate operator step is needed to roll a new subsquid/evm-data-service image carrying this commit into the infra deployment (currently pinned at e56f20a9).

… the dumper

Some RPC backends (e.g. okx xlayer-mainnet) intermittently reject a request whose response exceeds an internal size cap with JSON-RPC error code -32020 "backend response too large". This error was neither recognised as retryable by EvmRpcClient nor by reduceBatchOnRetry, so it bubbled up as a fatal error and crashed the data-service into a "data ingestion terminated, will restart" loop, stalling hotblocks ingestion.

The condition is transient/data-dependent: the exact same block fetches fine moments later, and an oversized eth_getBlockReceipts batch can simply be split.

Recognise it as a retryable connection-class error so the client retries it and reduceBatchOnRetry (which keys off isConnectionError) splits oversized batches, mirroring the existing handling of geth's "response too large" / -32000.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

elina-chertova mentioned this pull request Jun 24, 2026

evm-rpc: trace state diffs per-transaction when a block's whole-block response is too big #507

Open

elina-chertova wants to merge 1 commit into master from alert-fix/1pUdAQ-okx-xlayer-response-too-large
