
evm-rpc: tolerate null from trace_replayBlockTransactions (fixes tempo hotblocks crash-loop) #497

Merged
tmcgroul merged 1 commit into master from alert-fix/W9Jhsy-tempo-tracereplay-null
Jun 18, 2026

@elina-chertova

Contributor

Incident

Hotblocks data-lag alert on tempo-mainnet (correlated providers dwellir/uniblock), onset ~13:31 UTC on 2026-06-17, ~9 min after a fleet-wide deploy of evm-data-service:e56f20a9 (image bump commit d409edb at 13:20 UTC).

Root cause

The new image introduced a crash-loop on tempo: trace_replayBlockTransactions transiently returns result: null for a freshly-produced tip block whose trace index isn't ready yet. In Rpc.addTraceTxReplays, the result is validated with array(...), so a null throws a fatal, non-retryable DataValidationError: null is not an array, terminating ingestion repeatedly.
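The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the actual evm-rpc code: `DataValidationError`, `array`, and `anyObject` are simplified, hypothetical stand-ins for the validation helpers named in this PR, written only to show why a strict array validator turns a transient `null` into a thrown error.

```typescript
// Hypothetical stand-ins for the validation helpers named in the PR;
// the real implementations are not shown in this PR description.
class DataValidationError extends Error {}

type Validator<T> = (value: unknown) => T;

// Strict array validator: rejects anything that is not an array, including null.
function array<T>(item: Validator<T>): Validator<T[]> {
  return (value) => {
    if (!Array.isArray(value)) {
      throw new DataValidationError(`${JSON.stringify(value)} is not an array`);
    }
    return value.map((element) => item(element));
  };
}

// Placeholder item validator; the real trace-replay schema is not reproduced here.
const anyObject: Validator<object> = (value) => {
  if (typeof value !== "object" || value === null) {
    throw new DataValidationError(`${JSON.stringify(value)} is not an object`);
  }
  return value;
};

const validateReplays = array(anyObject);

// Settled block: an empty array passes validation.
const settled = validateReplays([]);

// Tip block whose trace index is not ready yet: null throws.
let failure: string | undefined;
try {
  validateReplays(null);
} catch (e) {
  failure = e instanceof DataValidationError ? e.message : String(e);
}
```

In this sketch the second call throws rather than returning, which corresponds to the ingestion process terminating on each affected tip block.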

Evidence:

  • Old tempo pod (image 56617aaf) had 0 terminations over 3h before cutover; crash-looping began only with e56f20a9.
  • Direct RPC probe: the same blocks that returned null during the incident now return a valid empty array [] → confirms the null is a transient tip-race, not a permanent capability gap.
  • The sibling paths (addDebugStateDiffs / addDebugFrames for debug_*, plus eth_getBlockReceipts) already tolerate null by flagging the block for retry. addTraceTxReplays was the only path missing this guard.

(Note: the cronos-mainnet error "could not find results for height" is pre-existing on both images and is a separate issue, not introduced by this bump.)

Fix

Mirror the existing debug-path pattern in addTraceTxReplays:

  • wrap the validator in nullable(array(...))
  • on null, set block._isInvalid = true and block._errorMessage, then continue, so the batch is retried instead of crashing.

Structurally identical to the already-shipping addDebugStateDiffs / addDebugFrames null handling.
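The two steps above can be sketched as follows. This is an illustration under stated assumptions rather than the merged diff: the `Block` shape, the `results` parameter, the pass-through item validator, and the error message text are all hypothetical, and `array` / `nullable` are simplified local stand-ins for the real validation helpers.

```typescript
// Hypothetical, simplified stand-ins; names mirror the PR text, but the
// real types and helpers in evm-rpc are not shown in this PR.
type Validator<T> = (value: unknown) => T;

function array<T>(item: Validator<T>): Validator<T[]> {
  return (value) => {
    if (!Array.isArray(value)) {
      throw new Error(`${JSON.stringify(value)} is not an array`);
    }
    return value.map((element) => item(element));
  };
}

// nullable(...) lets null through unchanged and delegates everything else.
function nullable<T>(inner: Validator<T>): Validator<T | null> {
  return (value) => (value === null ? null : inner(value));
}

interface Block {
  height: number;
  replays?: unknown[];
  _isInvalid?: boolean;
  _errorMessage?: string;
}

// Item validation is a pass-through here; the real schema is not reproduced.
const validateReplays = nullable(array((v: unknown) => v));

// Attach replay results to blocks; results[i] is the raw RPC result for blocks[i].
function addTraceTxReplays(blocks: Block[], results: unknown[]): void {
  for (let i = 0; i < blocks.length; i++) {
    const block = blocks[i];
    const replays = validateReplays(results[i]);
    if (replays === null) {
      // Trace index not ready yet: flag the block so the caller can retry it.
      block._isInvalid = true;
      block._errorMessage = "trace_replayBlockTransactions returned null"; // assumed wording
      continue;
    }
    block.replays = replays;
  }
}

// Usage: one settled block (empty array) and one tip block (null).
const blocks: Block[] = [{ height: 100 }, { height: 101 }];
addTraceTxReplays(blocks, [[], null]);
```

Non-null, non-array values still throw in this sketch, so only the transient null case is downgraded from a crash to a retry.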

🤖 Generated with Claude Code

A provider can transiently return `result: null` for a freshly produced
(tip) block whose trace index has not caught up yet. `addTraceTxReplays`
validated the result with `array(...)`, so a null response threw a fatal,
non-retryable `DataValidationError: null is not an array`, terminating
block ingestion and forcing a restart loop.

The sibling debug-trace paths (`addDebugStateDiffs`, `addDebugFrames`)
already handle this case: they accept a null result and mark the block
`_isInvalid` so `getBlocks` retries it (up to 5x). Apply the same pattern
to the trace-replay path: wrap the validator in `nullable(...)` and flag
the block for retry on null instead of crashing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@kalabukdima requested a review from tmcgroul June 18, 2026 06:37
@tmcgroul merged commit 49a520c into master Jun 18, 2026
2 checks passed
@tmcgroul deleted the alert-fix/W9Jhsy-tempo-tracereplay-null branch June 18, 2026 14:06