Skip to content

feat: exit certificate f09 trace error#1629

Merged
joanestebanr merged 7 commits into
feature/exit-certificate-toolfrom
feat/exit_certificate_f09_trace_error
May 28, 2026
Merged

feat: exit certificate f09 trace error#1629
joanestebanr merged 7 commits into
feature/exit-certificate-toolfrom
feat/exit_certificate_f09_trace_error

Conversation

@joanestebanr
Copy link
Copy Markdown
Collaborator

@joanestebanr joanestebanr commented May 26, 2026

🔄 Changes Summary

  • Propagate trace errors through the worker pool and abort all in-flight workers on first failure when ContinueOnTraceError=false (F-09)
  • Add StepAWindowSize config option to tune Step A chunk size independently from blockRange
  • Replace []common.Hash failedTraces with []FailedTrace{Hash, Error} so callers get the RPC error alongside the hash
  • Promote fetchSetSovereignTokenEvents / applySovereignTokenOverrides errors from silent log.Warn to returned errors in RunStep0
  • Fix variable shadowing bug for nativeEntry in RunStep0 (:==)
  • Remove dead lbtFile config field and merge RunStepCWithEntries back into RunStepC

⚠️ Breaking Changes

  • 🛠️ Config: lbtFile option removed from Options — it was an escape hatch to skip Step 0 that is no longer needed since Step 0 always runs
  • 🛠️ Config: blockRange no longer controls the block window size in Step A — stepAWindowSize is now used exclusively for that (defaults to 5000). Existing configs that relied on blockRange to tune Step A throughput should add an explicit stepAWindowSize
  • 🔌 API/CLI: runWorkerPool, startWorkers, collectResults now require a context.Context as first argument

📋 Config Updates

  • 🧾 New optional field stepAWindowSize (default: 5000):
"options": {
    "blockRange": 10000,
    "stepAWindowSize": 10000
}

✅ Testing

  • 🤖 Automatic:
    • Unit tests for traceOneTransaction (success, dedup, RPC error, bad JSON, null+error)
    • Unit tests for traceTransactions (continueOnError path, abort-on-error path)
    • End-to-end TestRunStepA_AbortOnTraceError against a fake HTTP server verifying context cancellation stops the pool
  • 🖱️ Manual: Run Step A with a node that returns trace errors with continueOnTraceError=false and verify the tool exits immediately with the offending hash and error message

🐞 Issues

  • Closes agglayer/pm#346
  • Partially fixes agglayer/pm#349 (Problem 2 — Variable shadowing makes nil-check dead code)

🔗 Related PRs

  • Base: feat/exit_certificate_f05_target_block (block finality resolution for Step 0)

📝 Notes

  • StepAWindowSize exists because debug_traceTransaction RPC calls are more expensive than eth_getLogs; operators may want a smaller window for Step A without changing the log-query range used by Steps 0, B, and E
  • Context cancellation in collectResults drains resultCh in a background goroutine to let workers release resources cleanly instead of blocking

@joanestebanr joanestebanr self-assigned this May 26, 2026
@joanestebanr joanestebanr changed the base branch from feature/exit-certificate-tool to feat/exit_certificate_f05_target_block May 26, 2026 15:59
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b59b2a43f9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread tools/exit_certificate/config.go Outdated
@joanestebanr joanestebanr force-pushed the feat/exit_certificate_f09_trace_error branch from b59b2a4 to 10da7a7 Compare May 26, 2026 16:07
@joanestebanr joanestebanr marked this pull request as draft May 26, 2026 16:08
@joanestebanr joanestebanr changed the title Feat/exit certificate f09 trace error feat: exit certificate f09 trace error May 27, 2026
@joanestebanr joanestebanr added the exit_certificate_tool Tool to create a final exit certificate label May 27, 2026
@joanestebanr joanestebanr marked this pull request as ready for review May 27, 2026 08:00
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 10da7a7b61

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread tools/exit_certificate/worker.go Outdated
Comment thread tools/exit_certificate/config.go
@joanestebanr joanestebanr force-pushed the feat/exit_certificate_f09_trace_error branch 2 times, most recently from 62912c3 to 5ffd262 Compare May 27, 2026 09:03
Base automatically changed from feat/exit_certificate_f05_target_block to feature/exit-certificate-tool May 27, 2026 15:28
joanestebanr and others added 5 commits May 27, 2026 17:31
… API

lbtFile was an escape hatch to skip Step 0 by providing a pre-generated
LBT. Step 0 now always runs, so the field is dead weight. Removing it
simplifies the config, resolveOrGenerateLBT, and RunStepC (the separate
RunStepCWithEntries entry point is merged back into RunStepC).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ailure (F-09)

- Extend runWorkerPool/startWorkers/collectResults to accept context.Context;
  feeder and workers respect ctx.Done() so cancellation stops all in-flight
  work immediately instead of draining the full job list.
- When ContinueOnTraceError=false, the first failing trace cancels the derived
  context and returns a meaningful error (capturing firstTraceErr before
  context.Canceled errors arrive from aborted workers).
- Replace []common.Hash failedTraces with []FailedTrace{Hash, Error} so callers
  can see which RPC error caused each failure.
- Add StepAWindowSize option (default 5000) to tune Step A chunk size
  independently from BlockRange.
- Promote fetchSetSovereignTokenEvents / applySovereignTokenOverrides errors from
  log.Warn (silently swallowed) to returned errors propagated through RunStep0.
- Fix nativeEntry variable shadowing in RunStep0 (:= → =).
- Add unit tests for traceOneTransaction (success, dedup, RPC error, bad JSON,
  null+error) and traceTransactions (continueOnError, abort), plus end-to-end
  TestRunStepA_AbortOnTraceError against a fake HTTP server.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- parseTargetBlock now returns (BlockNumberFinality, error); update
  call sites in config_test.go to handle the error properly
- Use "LatestBlock" instead of "latest" in test config (NewBlockNumberFinality
  requires the canonical name, not the Ethereum RPC tag)
- Update step_a_test.go to compare failed[0].Hash now that the return
  type changed from []common.Hash to []FailedTrace

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…lation

Draining in a background goroutine allowed in-flight workers to keep
running after runWorkerPool returned, creating a window where closures
could mutate captured state (e.g. firstTraceErr) in the caller.
Blocking on the synchronous drain guarantees all workers finish before
the caller resumes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…a_test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@joanestebanr joanestebanr force-pushed the feat/exit_certificate_f09_trace_error branch from 58e010c to aa513a8 Compare May 27, 2026 15:32
@joanestebanr joanestebanr requested a review from Copilot May 27, 2026 15:33
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves the exit certificate Step A failure path by propagating trace errors, canceling worker activity on first trace failure when configured, and adding independent Step A window sizing.

Changes:

  • Adds context-aware worker pool cancellation and RPC retry backoff cancellation.
  • Introduces StepAWindowSize and updates config/examples/docs/tests.
  • Changes failed trace output to include both transaction hash and error, and promotes some Step 0 override errors to returned errors.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tools/exit_certificate/worker.go Adds context cancellation support to the shared worker pool.
tools/exit_certificate/types.go Adds FailedTrace with hash and error fields.
tools/exit_certificate/step_a.go Uses StepAWindowSize and aborts tracing on first failure when configured.
tools/exit_certificate/step_a_test.go Adds trace and Step A failure-path tests.
tools/exit_certificate/step_0.go Returns sovereign override errors and updates worker pool calls.
tools/exit_certificate/step_e.go Updates worker pool call to pass context.
tools/exit_certificate/rpc.go Makes RPC retries/backoff context-aware and updates worker pool call.
tools/exit_certificate/rpc_test.go Updates backoff tests for context cancellation.
tools/exit_certificate/config.go Adds stepAWindowSize config parsing/defaulting.
tools/exit_certificate/config_test.go Adds config tests for Step A window sizing.
tools/exit_certificate/README.md Documents blockRange scope and stepAWindowSize.
tools/exit_certificate/config-examples/zkevm-mainnet.json Adds stepAWindowSize and updates L2 RPC URL.
tools/exit_certificate/config-examples/zkevm-cardona.json Adds stepAWindowSize.

Comment thread tools/exit_certificate/step_a_test.go Outdated
Comment thread tools/exit_certificate/config.go Outdated
Comment thread tools/exit_certificate/step_a.go
…or test

- Update StepAWindowSize doc comment to reflect its independent 5000 default (not BlockRange)
- Update RunStepA1 doc comment to reference Options.StepAWindowSize instead of Options.BlockRange
- Add a second tx and atomic trace counter to TestRunStepA_AbortOnTraceError so the test
  actually verifies that the worker pool stops after the first failure

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@joanestebanr joanestebanr requested a review from web3security May 27, 2026 16:11
Copy link
Copy Markdown

@web3security web3security left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation correctly addresses F-09 (silent address omission on trace failure). The worker pool context propagation, abort-on-error path, and FailedTrace diagnostics are logically sound. Static analysis of the full diff shows no issues introduced by these changes.

CI failures — none are blockers introduced by this PR:

  • test-unit (SonarQube HTTP 403): Infrastructure authentication failure on api.sonarcloud.io, not a Go test failure. The exit_certificate package tests pass: ok github.com/agglayer/aggkit/tools/exit_certificate 10.098s coverage: 29.8%.
  • govulncheck: Two vulnerabilities reported — GO-2026-5026 (golang.org/x/net@v0.53.0) and GO-2026-4479 (github.com/pion/dtls/v2@v2.2.12). Neither vulnerability has a call trace through tools/exit_certificate. Both are pre-existing in the repository dependency tree, not introduced by this PR. See separate tracking note below.
  • Multi chains E2E (CDK Erigon, pessimistic, 3 chains): Please confirm this is a pre-existing flaky/infra failure and not a regression from the context cancellation changes in worker.go. The changes are additive (ctx.Done() select guards) and should not affect E2E behavior, but developer confirmation of manual testing on Kurtosis is requested before merge.

Dependency vulnerabilities — track separately:

  • GO-2026-5026: fixable by bumping golang.org/x/net to v0.55.0. Please open a dedicated issue.
  • GO-2026-4479 (pion/dtls/v2): no upstream fix available yet. No call trace through exit_certificate code. Track until a patched version is released.

Fixes vulnerability GO-2026-5026 (Punycode IDNA label rejection failure
in golang.org/x/net/idna). GO-2026-4479 (pion/dtls/v2) has no upstream
fix available yet and remains tracked separately.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@sonarqubecloud
Copy link
Copy Markdown

@joanestebanr joanestebanr merged commit 6119eb3 into feature/exit-certificate-tool May 28, 2026
19 of 20 checks passed
@joanestebanr joanestebanr deleted the feat/exit_certificate_f09_trace_error branch May 28, 2026 08:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

exit_certificate_tool Tool to create a final exit certificate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants