feat(derivation): deriveForce skip#967
Conversation
…Number When local verify sees batchInfo.lastBlockNumber missing locally, the new dispatcher distinguishes four cases: - A consistent: header present + versioned hash matches L1 → existing rebuildBlob path, safe advances as before. - B self-heal: header present + versioned hash mismatch → existing self-heal, now calling deriveForce(_, 0) to keep the full-overwrite semantics. - C fill-gap: header missing AND L2 head stays flat across the 3×2s retry window → pull the real L1 blob via fetchRollupDataByTxHash, then deriveForce(_, localLatestL2) writes only blocks above the local head, leaving existing blocks untouched (no EL reorg). Falls through to the shared verifyBatchRoots + advanceSafe. - D wait-next-poll: header missing AND L2 head grew within the window → continue, sequencer is alive and P2P will deliver the missing blocks before the next poll cycle. deriveForce now takes a skipNumber so scenarios B and C share one implementation: skipNumber=0 means "overwrite everything", skipNumber>0 skips blocks already present locally and only appends the missing tail. The parent header anchor is max(firstNum-1, skipNumber) so the first block we actually write always has its parent in the EL. The retry loop surfaces non-NotFound RPC errors from HeaderByNumber / BlockNumber so transient client failures are visible instead of being masked as "sequencer stopped". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Scenario C (header missing + L2 head flat) appends blocks above the local head via deriveForce. Even though it doesn't reorg existing blocks, blocksync / broadcast reactors may race with derivation on the same height range as P2P delivers blocks for the missing tail. Wrap the fill-gap block writes with StopReactorsBeforeReorg / StartReactorsAfterReorg so the reactor lifecycle matches scenario B self-heal: derivation owns the writes during the operation, then reactors restart from the post-write head and catch up via P2P. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
The previous reactor stop/start pattern around deriveForce returned early on derive errors, leaving consensus reactors permanently stopped: Stop is idempotent on retry, but Start was never reached, so blocksync / broadcast stayed paused indefinitely until process restart. Wrap stop/derive/start in withReactorsQuiesced so Start runs in a defer regardless of body outcome. Restart height is read from the L2 EL latest (truth source) rather than the lastHeader returned by deriveForce — that header may be nil or stale if writes failed mid-batch, but the EL always knows what's actually applied. Use context.Background() for the deferred BlockNumber read so a cancelled parent ctx doesn't block reactor recovery. Both scenario B (self-heal) and scenario C (fill-gap) call sites now share the same helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous implementation logged the post-write BlockNumber read failure but still passed the zero-valued cur to StartReactorsAfterReorg, which would have told blocksync to resume from height 0 and re-fetch the entire chain from genesis. withReactorsQuiesced now captures the pre-write height before stopping reactors and uses it as a safe lower-bound fallback if the post-write EL read fails. If even the pre-write read fails we don't stop reactors at all and return the error so the batch retries next poll, rather than committing to a quiesce window with no way to pick a sensible restart height. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous loop took two RPCs per iteration (HeaderByNumber + BlockNumber) and a third pre-loop snapshot read. The simplified version uses a single BlockNumber call as the primary stat and only re-fetches the header when latest has actually caught up to lastBlockNumber, dropping the snapshot read entirely. Growth tracking is now a single prev variable compared across iterations; sequencer-stopped (no growth) cases skip the HeaderByNumber call inside the loop entirely. Behavior unchanged: lastHdr non-nil after loop → A/B; lastHdr nil + grew → D; lastHdr nil + no growth → C. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Number
When local verify sees batchInfo.lastBlockNumber missing locally, the new dispatcher distinguishes four cases:
deriveForce now takes a skipNumber so scenarios B and C share one implementation: skipNumber=0 means "overwrite everything", skipNumber>0 skips blocks already present locally and only appends the missing tail. The parent header anchor is max(firstNum-1, skipNumber) so the first block we actually write always has its parent in the EL.
The retry loop surfaces non-NotFound RPC errors from HeaderByNumber / BlockNumber so transient client failures are visible instead of being masked as "sequencer stopped".