feat(load-test): Anvil fork + mock orderbook + load-gen (COW-1079) #52
Open
brunota20 wants to merge 3 commits into
Synthetic load test for shepherd's M4 stack. Distinct from:
- COW-1064 (real Sepolia E2E, correctness, 90 min, 5 modules)
- COW-1078 (backtest of 7d historical events, replay)
- COW-1031 (7-day soak, wall-clock stability)

This issue answers one question the others do not: how many events per block can the supervisor dispatch before something breaks? lgahdl's PR #9 review thread flagged sequential per-module dispatch as a potential bottleneck; this PR is how we measure it.

Components added:

1. `tools/orderbook-mock` (new crate, axum-based) - HTTP server serving the two endpoints shepherd's cow-api host hits per submission. POST /api/v1/orders returns a synthetic 56-byte OrderUid; GET /api/v1/app_data/{hash} returns the empty appData document. CLI knobs: --port, --latency-ms, --error-rate (alternates InsufficientFee / InvalidSignature to exercise both TryNextBlock and Drop paths). 3 unit tests covering the happy path, the empty appData path, and the error-rate envelope.

2. `tools/load-gen` (new crate, alloy-based) - connects to Anvil, impersonates the pinned Sepolia test EOA via anvil_impersonateAccount + anvil_setBalance, then on every new block fires N ComposableCoW.create(...) + M CoWSwapEthFlow.createOrder(...) calls. Each create uses a fresh salt counter so submissions do not collide on the dedup check. 3 unit tests covering pinned address parsing, salt uniqueness, and calldata selector shape.

3. Engine config: ChainConfig gains optional `orderbook_url` (per chain). OrderBookPool::from_config honours the override using cowprotocol::OrderBookApi::new_with_base_url; absent overrides fall back to canonical api.cow.fi URLs. main.rs switches from ::default() to ::from_config(&engine_cfg). Useful long-term for staging/barn targets, immediately needed to point at the mock.

4. `engine.load.toml` - chain 11155111 -> ws://localhost:8545, cow base URL -> http://localhost:9999, metrics on 127.0.0.1:9100, state_dir = ./data/load (wiped per run).

5. Scripts:
   - `scripts/load-bootstrap.sh` brings up Anvil + orderbook-mock, tracks PIDs in /tmp/shepherd-load.pids, exposes a teardown helper.
   - `scripts/load-teardown.sh` idempotent cleanup.
   - `scripts/load-run.sh` orchestrates one scenario end-to-end: bootstrap, build modules, start engine, snapshot /metrics, run load-gen for --duration-min, snapshot /metrics again, tear down, drop a report skeleton at docs/operations/load-reports/load-NxM-YYYY-MM-DD.md.

6. `docs/operations/load-testnet-runbook.md` - operator runbook covering the three scenarios (baseline 5x5, medium 20x20, saturation 50x50), expected acceptance bars, what the test does NOT prove (WS reconnect / drift / real-orderbook fidelity), troubleshooting.

Validation:
- cargo test --workspace --exclude <wasm-only-modules>: 196 passed.
- cargo clippy --workspace --all-targets --tests -- -D warnings: clean.
- cargo fmt --all --check: clean.
- bash -n scripts/load-{bootstrap,run,teardown}.sh: clean.
- Live orderbook-mock smoke: POST returns valid 56-byte hex UID, GET returns {"fullAppData":"{}"}, /_stats reflects counters.

Pending (not in this PR):
- Baseline 5x5 report against a real Anvil fork - requires Bruno's RPC_URL_SEPOLIA_HTTP from scripts/.env; once that runs, the report lands in docs/operations/load-reports/.
- Metrics-delta auto-generation in scripts/load-run.sh (left as TBD in the script; e2e-report-gen.sh has the delta logic we can adapt).
- Saturation scenario - run after the baseline lands so the bottleneck has a clean baseline to compare against.

AI assistance disclosure: drafted by Claude (Opus 4.7, 1M context).
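The per-chain override described in item 3 can be sketched with the standard library alone. This is a minimal illustration, not the engine's code: the trimmed `ChainConfig`, the helper names `canonical_base_url`, `resolve_base_url`, and `build_url_map`, and the exact canonical URL paths are all assumptions; only the "override wins, otherwise fall back to api.cow.fi" rule comes from the description above.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down stand-in for the engine's per-chain config.
#[derive(Debug, Clone, Default)]
pub struct ChainConfig {
    /// Optional per-chain orderbook base URL (for example the local mock).
    pub orderbook_url: Option<String>,
}

/// Assumed canonical base URLs keyed by chain id (illustrative subset only).
fn canonical_base_url(chain_id: u64) -> Option<&'static str> {
    match chain_id {
        1 => Some("https://api.cow.fi/mainnet"),
        11155111 => Some("https://api.cow.fi/sepolia"),
        _ => None,
    }
}

/// Resolve the base URL for one chain: an explicit override wins,
/// otherwise fall back to the canonical URL if one is known.
pub fn resolve_base_url(chain_id: u64, cfg: &ChainConfig) -> Option<String> {
    cfg.orderbook_url
        .clone()
        .or_else(|| canonical_base_url(chain_id).map(str::to_owned))
}

/// Build a chain-id -> base-URL map, skipping chains with no usable URL.
pub fn build_url_map(chains: &HashMap<u64, ChainConfig>) -> HashMap<u64, String> {
    chains
        .iter()
        .filter_map(|(id, cfg)| resolve_base_url(*id, cfg).map(|url| (*id, url)))
        .collect()
}
```

With chain 11155111 configured as `orderbook_url = Some("http://localhost:9999")`, the map points that chain at the mock while any chain without an override keeps its canonical URL.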
…tion (COW-1079)
First COW-1079 run on a real Anvil fork of Sepolia. The engine-side
acceptance bar is cleared with wide margin:
- Per-block dispatch latency p50/p95/p99 = 4/6/7 ms (bar was < 2 s).
- Zero traps, zero poisoned modules, zero shepherd_module_errors_total.
- EthFlow strategy submitted 1 OrderPlacement end-to-end through the
mock orderbook in 10 ms; submitted:{uid} marker written cleanly.
- 63 Anvil blocks dispatched flawlessly.
The honest finding: load-gen's transactions get into Anvil's mempool
(twap_ok=270, ethflow_ok=270 per the eth_sendTransaction response),
but only 5 ConditionalOrderCreated + 1 OrderPlacement events
actually fired - the rest reverted at the contract level
(ComposableCoW.create + EthFlow.createOrder run preconditions the
load-gen-crafted bodies don't pass).
So this run stressed the engine with ~6 events over 60 s, not
5+5 per block. The bar criterion that depends on the load-gen
(events-per-block delivered) is the only one that doesn't pass;
filing a follow-up to calibrate the revert rate before re-running.
Report at docs/operations/load-reports/load-5x5-2026-06-19.md
mirrors the COW-1064 e2e-report shape and signs off as
"conditional pass" - engine meets the bar; load-gen needs work.
AI assistance disclosure: drafted by Claude (Opus 4.7, 1M context).
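The gap between accepted transactions and fired events can be made explicit with a little arithmetic. The sketch below uses only the counters quoted in this commit message; the struct and method names are hypothetical, and it assumes each accepted transaction would have emitted exactly one event had it not reverted.

```rust
/// Counters quoted in the baseline run's commit message.
pub struct RunCounters {
    pub twap_ok: u64,
    pub ethflow_ok: u64,
    pub conditional_order_created: u64,
    pub order_placement: u64,
    pub blocks: u64,
}

impl RunCounters {
    /// Transactions Anvil accepted via eth_sendTransaction.
    pub fn accepted(&self) -> u64 {
        self.twap_ok + self.ethflow_ok
    }

    /// Events that actually fired on-chain.
    pub fn delivered(&self) -> u64 {
        self.conditional_order_created + self.order_placement
    }

    /// Accepted transactions that produced no event (assumed reverted).
    pub fn reverted(&self) -> u64 {
        self.accepted().saturating_sub(self.delivered())
    }

    /// Fraction of accepted transactions that resulted in an event.
    pub fn delivery_rate(&self) -> f64 {
        self.delivered() as f64 / self.accepted() as f64
    }

    /// Average number of events the engine saw per dispatched block.
    pub fn events_per_block(&self) -> f64 {
        self.delivered() as f64 / self.blocks as f64
    }
}

/// The figures reported for the first 5x5 run.
pub fn baseline_run() -> RunCounters {
    RunCounters {
        twap_ok: 270,
        ethflow_ok: 270,
        conditional_order_created: 5,
        order_placement: 1,
        blocks: 63,
    }
}
```

Under that assumption, 534 of the 540 accepted transactions reverted, the delivery rate is roughly 1.1%, and the engine saw under 0.1 events per block rather than the intended 10.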
scripts/lib.sh exports REPORTS_DIR=e2e-reports/ unconditionally. load-run.sh used to set REPORTS_DIR=load-reports/ BEFORE sourcing load-bootstrap.sh (which transitively sources lib.sh), so the override was lost and the auto-generated skeleton ended up under e2e-reports/ next to the COW-1064 reports.

Move the assignment after the source so the load-reports/ path wins, with a comment explaining the ordering trap.

Drive-by: removed the misplaced e2e-reports/load-5x5-2026-06-19.md from the first run; the committed report at load-reports/load-5x5-2026-06-19.md (commit 59fe714) is the canonical copy.

AI assistance disclosure: drafted by Claude (Opus 4.7, 1M context).
Summary
Synthetic load test for shepherd's M4 stack, distinct from the existing test surface (COW-1064 correctness, COW-1078 backtest, COW-1031 soak). Answers the throughput question lgahdl's PR #9 review thread flagged about sequential per-module dispatch.
See `docs/operations/load-testnet-runbook.md` for the operator flow and acceptance bars per scenario (baseline 5x5, medium 20x20, saturation 50x50).
What is included
- `tools/orderbook-mock/`: mock orderbook HTTP server serving `POST /api/v1/orders`, `GET /api/v1/app_data/{hash}`, `/healthz`, `/_stats`. Knobs: `--port`, `--latency-ms`, `--error-rate` (cycles `InsufficientFee` / `InvalidSignature` so the strategy's `TryNextBlock` + `Drop` arms both fire). 3 unit tests.
- `tools/load-gen/`: connects to Anvil, impersonates the pinned Sepolia test EOA via `anvil_impersonateAccount` + `anvil_setBalance`, fires N `ComposableCoW.create` + M `CoWSwapEthFlow.createOrder` per new block. 3 unit tests (pinned address parsing, salt uniqueness, calldata selector shape).
- `crates/nexum-engine/src/engine_config.rs`: `ChainConfig` gains optional `orderbook_url` for per-chain base-URL overrides.
- `crates/nexum-engine/src/host/cow_orderbook.rs`: `OrderBookPool::from_config(&EngineConfig)` honours the override via `cowprotocol::OrderBookApi::new_with_base_url`; absent overrides fall back to canonical `api.cow.fi` URLs.
- `crates/nexum-engine/src/main.rs`: `OrderBookPool::default()` -> `from_config(&engine_cfg)`.
- `engine.load.toml`: chain 11155111 -> `ws://localhost:8545`, cow base URL -> `http://localhost:9999`, Prometheus on `127.0.0.1:9100`, `state_dir = ./data/load` (wiped per run).
- `scripts/load-bootstrap.sh`: brings up Anvil + orderbook-mock, tracks PIDs in `/tmp/shepherd-load.pids`.
- `scripts/load-teardown.sh`: idempotent cleanup.
- `scripts/load-run.sh`: bootstrap, build modules, start engine, snapshot `/metrics`, run load-gen, snapshot `/metrics`, teardown, drop a report skeleton at `docs/operations/load-reports/load-NxM-YYYY-MM-DD.md`.
- `docs/operations/load-testnet-runbook.md`: operator runbook.
Validation
- `cargo test --workspace --exclude <wasm-only-modules>`: 196 passed
- `cargo clippy --workspace --all-targets --tests -- -D warnings`: clean
- `cargo fmt --all --check`: clean
- `bash -n scripts/load-{bootstrap,run,teardown}.sh`: clean
- `orderbook-mock` smoke: POST -> valid 56-byte UID, GET -> `{"fullAppData":"{}"}`, `/_stats` -> counters tracked
Stack
`feat/load-test-anvil-cow-1079` -> `chore/hex-via-alloy-mfw78-followup` (PR #51) -> `fix/twap-calldata-helper-cow-1077` (PR #50) -> `feat/ethflow-expected-excessive-validto-cow-1076` (PR #49) -> `feat/forward-orderbook-error-cow-1075` (PR #48) -> `feat/resolve-app-data-cow-1074` (PR #47).
Pending (not in this PR)
- Baseline 5x5 report against a real Anvil fork: requires `RPC_URL_SEPOLIA_HTTP` from `scripts/.env`. Once the run lands, the report goes under `docs/operations/load-reports/` and signs off the baseline acceptance bar from COW-1079.
- Metrics-delta auto-generation in `scripts/load-run.sh` (left as TBD in the script; `e2e-report-gen.sh` has the delta logic we can adapt).
What this test does NOT prove
Repeated from the runbook so reviewers see it inline: WS reconnect resilience (COW-1031), real-orderbook 4xx variety (COW-1078), multi-day memory drift (COW-1031), diverse appData shapes (COW-1078). This test answers exactly one question: how many TWAP+EthFlow events per block can shepherd dispatch before something breaks?
Linear
In Progress on COW-1079. The Backlog issue will close out once the three scenario reports (baseline, medium, saturation) all land.
AI assistance disclosure: drafted by Claude (Opus 4.7, 1M context).