feat: exit certificate f09 trace error#1629
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b59b2a43f9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
b59b2a4 to
10da7a7
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 10da7a7b61
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
62912c3 to
5ffd262
Compare
… API lbtFile was an escape hatch to skip Step 0 by providing a pre-generated LBT. Step 0 now always runs, so the field is dead weight. Removing it simplifies the config, resolveOrGenerateLBT, and RunStepC (the separate RunStepCWithEntries entry point is merged back into RunStepC). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ailure (F-09)
- Extend runWorkerPool/startWorkers/collectResults to accept context.Context;
feeder and workers respect ctx.Done() so cancellation stops all in-flight
work immediately instead of draining the full job list.
- When ContinueOnTraceError=false, the first failing trace cancels the derived
context and returns a meaningful error (capturing firstTraceErr before
context.Canceled errors arrive from aborted workers).
- Replace []common.Hash failedTraces with []FailedTrace{Hash, Error} so callers
can see which RPC error caused each failure.
- Add StepAWindowSize option (default 5000) to tune Step A chunk size
independently from BlockRange.
- Promote fetchSetSovereignTokenEvents / applySovereignTokenOverrides errors from
log.Warn (silently swallowed) to returned errors propagated through RunStep0.
- Fix nativeEntry variable shadowing in RunStep0 (:= → =).
- Add unit tests for traceOneTransaction (success, dedup, RPC error, bad JSON,
null+error) and traceTransactions (continueOnError, abort), plus end-to-end
TestRunStepA_AbortOnTraceError against a fake HTTP server.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- parseTargetBlock now returns (BlockNumberFinality, error); update call sites in config_test.go to handle the error properly - Use "LatestBlock" instead of "latest" in test config (NewBlockNumberFinality requires the canonical name, not the Ethereum RPC tag) - Update step_a_test.go to compare failed[0].Hash now that the return type changed from []common.Hash to []FailedTrace Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…lation Draining in a background goroutine allowed in-flight workers to keep running after runWorkerPool returned, creating a window where closures could mutate captured state (e.g. firstTraceErr) in the caller. Blocking on the synchronous drain guarantees all workers finish before the caller resumes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…a_test Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
58e010c to
aa513a8
Compare
There was a problem hiding this comment.
Pull request overview
This PR improves the exit certificate Step A failure path by propagating trace errors, canceling worker activity on first trace failure when configured, and adding independent Step A window sizing.
Changes:
- Adds context-aware worker pool cancellation and RPC retry backoff cancellation.
- Introduces
StepAWindowSizeand updates config/examples/docs/tests. - Changes failed trace output to include both transaction hash and error, and promotes some Step 0 override errors to returned errors.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
tools/exit_certificate/worker.go |
Adds context cancellation support to the shared worker pool. |
tools/exit_certificate/types.go |
Adds FailedTrace with hash and error fields. |
tools/exit_certificate/step_a.go |
Uses StepAWindowSize and aborts tracing on first failure when configured. |
tools/exit_certificate/step_a_test.go |
Adds trace and Step A failure-path tests. |
tools/exit_certificate/step_0.go |
Returns sovereign override errors and updates worker pool calls. |
tools/exit_certificate/step_e.go |
Updates worker pool call to pass context. |
tools/exit_certificate/rpc.go |
Makes RPC retries/backoff context-aware and updates worker pool call. |
tools/exit_certificate/rpc_test.go |
Updates backoff tests for context cancellation. |
tools/exit_certificate/config.go |
Adds stepAWindowSize config parsing/defaulting. |
tools/exit_certificate/config_test.go |
Adds config tests for Step A window sizing. |
tools/exit_certificate/README.md |
Documents blockRange scope and stepAWindowSize. |
tools/exit_certificate/config-examples/zkevm-mainnet.json |
Adds stepAWindowSize and updates L2 RPC URL. |
tools/exit_certificate/config-examples/zkevm-cardona.json |
Adds stepAWindowSize. |
…or test - Update StepAWindowSize doc comment to reflect its independent 5000 default (not BlockRange) - Update RunStepA1 doc comment to reference Options.StepAWindowSize instead of Options.BlockRange - Add a second tx and atomic trace counter to TestRunStepA_AbortOnTraceError so the test actually verifies that the worker pool stops after the first failure Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
web3security
left a comment
There was a problem hiding this comment.
The implementation correctly addresses F-09 (silent address omission on trace failure). The worker pool context propagation, abort-on-error path, and FailedTrace diagnostics are logically sound. Static analysis of the full diff shows no issues introduced by these changes.
CI failures — none are blockers introduced by this PR:
- test-unit (SonarQube HTTP 403): Infrastructure authentication failure on api.sonarcloud.io, not a Go test failure. The exit_certificate package tests pass: ok github.com/agglayer/aggkit/tools/exit_certificate 10.098s coverage: 29.8%.
- govulncheck: Two vulnerabilities reported — GO-2026-5026 (golang.org/x/net@v0.53.0) and GO-2026-4479 (github.com/pion/dtls/v2@v2.2.12). Neither vulnerability has a call trace through tools/exit_certificate. Both are pre-existing in the repository dependency tree, not introduced by this PR. See separate tracking note below.
- Multi chains E2E (CDK Erigon, pessimistic, 3 chains): Please confirm this is a pre-existing flaky/infra failure and not a regression from the context cancellation changes in worker.go. The changes are additive (ctx.Done() select guards) and should not affect E2E behavior, but developer confirmation of manual testing on Kurtosis is requested before merge.
Dependency vulnerabilities — track separately:
- GO-2026-5026: fixable by bumping golang.org/x/net to v0.55.0. Please open a dedicated issue.
- GO-2026-4479 (pion/dtls/v2): no upstream fix available yet. No call trace through exit_certificate code. Track until a patched version is released.
Fixes vulnerability GO-2026-5026 (Punycode IDNA label rejection failure in golang.org/x/net/idna). GO-2026-4479 (pion/dtls/v2) has no upstream fix available yet and remains tracked separately. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
6119eb3
into
feature/exit-certificate-tool



🔄 Changes Summary
ContinueOnTraceError=false(F-09)StepAWindowSizeconfig option to tune Step A chunk size independently fromblockRange[]common.Hash failedTraceswith[]FailedTrace{Hash, Error}so callers get the RPC error alongside the hashfetchSetSovereignTokenEvents/applySovereignTokenOverrideserrors from silentlog.Warnto returned errors inRunStep0nativeEntryinRunStep0(:=→=)lbtFileconfig field and mergeRunStepCWithEntriesback intoRunStepClbtFileoption removed fromOptions— it was an escape hatch to skip Step 0 that is no longer needed since Step 0 always runsblockRangeno longer controls the block window size in Step A —stepAWindowSizeis now used exclusively for that (defaults to 5000). Existing configs that relied onblockRangeto tune Step A throughput should add an explicitstepAWindowSizerunWorkerPool,startWorkers,collectResultsnow require acontext.Contextas first argument📋 Config Updates
stepAWindowSize(default: 5000):✅ Testing
traceOneTransaction(success, dedup, RPC error, bad JSON, null+error)traceTransactions(continueOnError path, abort-on-error path)TestRunStepA_AbortOnTraceErroragainst a fake HTTP server verifying context cancellation stops the poolcontinueOnTraceError=falseand verify the tool exits immediately with the offending hash and error message🐞 Issues
Problem 2 — Variable shadowing makes nil-check dead code)🔗 Related PRs
📝 Notes
StepAWindowSizeexists becausedebug_traceTransactionRPC calls are more expensive thaneth_getLogs; operators may want a smaller window for Step A without changing the log-query range used by Steps 0, B, and EcollectResultsdrainsresultChin a background goroutine to let workers release resources cleanly instead of blocking