feat: gap-fill sync mode #57

@pthmas

Summary

Add a background gap-fill sync mode that detects and fills missing blocks without requiring a full reindex. Currently, if blocks are missed (due to crashes, RPC timeouts, or partial failures), the only recovery path is manual intervention or reindexing from scratch.

Motivation

Atlas already tracks failed_blocks, but there's no automated mechanism to:

  • Detect gaps in the blocks table (missing block numbers between indexed ranges)
  • Retry failed blocks after the initial failure window
  • Fill gaps that occur from crashes or restarts during indexing

Design

Gap Detection

Use SQL window functions to find missing block ranges:

SELECT number + 1 AS gap_start,
       next_number - 1 AS gap_end
FROM (
    SELECT number, LEAD(number) OVER (ORDER BY number) AS next_number
    FROM blocks
) gaps
WHERE next_number - number > 1
ORDER BY number DESC  -- newest gaps first
LIMIT 100;
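
To make the query's semantics concrete, here is a minimal in-memory sketch of the same pairing logic in Rust. The function name `find_gaps` and its `limit` parameter are hypothetical and not part of Atlas. It assumes the input is sorted ascending with no duplicates, which is what `ORDER BY number` over a unique column would yield.

```rust
/// Returns inclusive (gap_start, gap_end) ranges missing from `numbers`,
/// newest gap first, capped at `limit` (mirrors LIMIT in the SQL above).
/// Assumption: `numbers` is sorted ascending and contains no duplicates.
pub fn find_gaps(numbers: &[u64], limit: usize) -> Vec<(u64, u64)> {
    numbers
        .windows(2) // each window is (number, next_number), like LEAD()
        .filter(|w| w[1] - w[0] > 1) // WHERE next_number - number > 1
        .map(|w| (w[0] + 1, w[1] - 1)) // gap_start, gap_end
        .rev() // ORDER BY number DESC: newest gaps first
        .take(limit)
        .collect()
}
```

For indexed blocks 100, 101, 104, 105, 109 this yields (106, 108) first, then (102, 103). The last row has no successor, so it never produces a gap, just as `LEAD()` returns NULL for it in SQL.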

Fast path: Before running the expensive window function, do a quick check:

SELECT (MAX(number) - MIN(number) + 1) - COUNT(*) AS missing_count FROM blocks;

If missing_count is 0, skip the full scan.
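
The arithmetic behind the fast path can be sketched as below. `missing_count` is a hypothetical helper name. Treating an empty table as "nothing missing" is an assumption of this sketch (in SQL, MIN and MAX would be NULL there). Note that the check only sees gaps between the lowest and highest indexed block, and it relies on `number` being unique.

```rust
/// Number of blocks missing between the lowest and highest indexed block.
/// `min` and `max` are None for an empty table; this sketch assumes that
/// case should report 0 missing blocks.
pub fn missing_count(min: Option<u64>, max: Option<u64>, count: u64) -> u64 {
    match (min, max) {
        // Expected rows in [min, max] minus the rows actually present.
        (Some(min), Some(max)) => (max.saturating_sub(min) + 1).saturating_sub(count),
        _ => 0,
    }
}
```

For example, with blocks 100 through 110 indexed except 103 and 104, the table holds 9 rows, so the result is (110 - 100 + 1) - 9 = 2 and the full window-function scan is needed.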

Fill Strategy

  1. Run as a separate background tokio task alongside the main indexer
  2. Fill newest gaps first (most likely to be needed by users)
  3. Configurable concurrency (e.g., 4 workers, separate from main indexer workers)
  4. Adaptive throttling: Pause gap-fill when the main realtime sync falls behind (e.g., lag > 10 blocks), resume when caught up
  5. Reuse existing fetch and write infrastructure
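
The adaptive throttling in step 4 could be modeled as a small state machine, sketched here without tokio so that it stands alone. The `Throttle` type, its field names, and the idea of a separate resume threshold are assumptions for illustration. The issue only specifies "pause when lag > 10 blocks, resume when caught up".

```rust
/// Hypothetical pause/resume gate for the gap-fill task.
pub struct Throttle {
    pause_above: u64,        // pause when realtime lag exceeds this
    resume_at_or_below: u64, // resume once lag drops to this or lower
    paused: bool,
}

impl Throttle {
    pub fn new(pause_above: u64, resume_at_or_below: u64) -> Self {
        Self { pause_above, resume_at_or_below, paused: false }
    }

    /// Feeds the latest chain head and realtime-indexed height.
    /// Returns true if gap-fill workers may run.
    pub fn update(&mut self, chain_head: u64, realtime_indexed: u64) -> bool {
        let lag = chain_head.saturating_sub(realtime_indexed);
        if self.paused {
            // Stay paused until the realtime sync has caught up.
            if lag <= self.resume_at_or_below {
                self.paused = false;
            }
        } else if lag > self.pause_above {
            self.paused = true;
        }
        !self.paused
    }
}
```

Using two thresholds (for instance `Throttle::new(10, 0)`) avoids flapping between paused and running when the lag hovers around a single cutoff. Whether that hysteresis is wanted is a design choice left open by the issue.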

Also retry failed_blocks

  • Periodically scan failed_blocks table for blocks that can be retried
  • Exponential backoff based on retry_count
  • Remove from failed_blocks on successful indexing
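
One way to derive the retry delay from retry_count is a capped doubling schedule, sketched below. The function name `retry_delay` and the `base` and `max` parameters are hypothetical. The issue does not specify concrete values or a cap.

```rust
use std::time::Duration;

/// Delay before the next retry: base * 2^retry_count, capped at `max`.
/// The shift and the multiplication both saturate, so large retry
/// counts cannot overflow.
pub fn retry_delay(retry_count: u32, base: Duration, max: Duration) -> Duration {
    // checked_shl returns None once the shift is >= 32 bits.
    let factor = 1u32.checked_shl(retry_count).unwrap_or(u32::MAX);
    base.saturating_mul(factor).min(max)
}
```

With a base of 5 seconds and a cap of 1 hour, retry_count 0 waits 5 seconds, retry_count 3 waits 40 seconds, and very large counts settle at the 1 hour cap. A periodic scan would then pick up a failed block only once this delay has elapsed since its last attempt.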

Considerations

  • Gap-fill should use a separate rate limiter / semaphore budget to avoid starving the realtime sync
  • Derived tables (ERC-20 balances, NFT ownership) need to be updated for gap-filled blocks
  • The indexer_state.last_indexed_block watermark should not advance past gaps. Alternatively, a separate "contiguous height" metric should be tracked
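
A "contiguous height" could be derived from the gap list as sketched below. `contiguous_height` is a hypothetical helper, and it assumes the gap list includes the oldest gap. The newest-first `LIMIT 100` query would not guarantee that, so a query ordered ascending (or without the limit) would be needed for this purpose.

```rust
/// Highest block N such that every block from the lowest indexed block
/// through N is present. `gaps` holds inclusive (gap_start, gap_end)
/// ranges and must include the oldest gap; `max_indexed` is MAX(number).
pub fn contiguous_height(max_indexed: u64, gaps: &[(u64, u64)]) -> u64 {
    match gaps.iter().map(|&(gap_start, _)| gap_start).min() {
        // The block just below the oldest gap is the last contiguous one.
        Some(lowest_gap_start) => lowest_gap_start.saturating_sub(1),
        // No gaps: everything up to the highest indexed block is contiguous.
        None => max_indexed,
    }
}
```

For indexed blocks 100, 101, 104, 105, 109 the gaps are (102, 103) and (106, 108), so the contiguous height is 101 even though the highest indexed block is 109.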

References

  • tidx implementation: sync/engine.rs, run_gapfill_loop, with LAG() gap detection, concurrent workers via JoinSet, adaptive throttling based on realtime lag
