feat(load-test): Anvil fork + mock orderbook + load-gen (COW-1079) by brunota20 · Pull Request #52 · bleu/nullis-shepherd

brunota20 · 2026-06-19T14:11:10Z

Summary

Synthetic load test for shepherd's M4 stack, distinct from the existing test surface (COW-1064 correctness, COW-1078 backtest, COW-1031 soak). It answers the throughput question that lgahdl's PR #9 review thread raised about sequential per-module dispatch.

See docs/operations/load-testnet-runbook.md for the operator flow and acceptance bars per scenario (baseline 5x5, medium 20x20, saturation 50x50).
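
As a rough illustration of how the scenario labels relate to load and report naming, the sketch below parses an `NxM` label and derives the attempted transaction count and the report path. It assumes N is the number of TWAP creates and M the number of EthFlow orders per block; the `Scenario` type and its methods are hypothetical and not part of this PR.

```rust
/// Hypothetical model of one load scenario:
/// N TWAP creates and M EthFlow orders per new block.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Scenario {
    pub twap_per_block: u32,
    pub ethflow_per_block: u32,
}

impl Scenario {
    /// Parse a label such as "5x5" or "20x20"; returns None on malformed input.
    pub fn parse(label: &str) -> Option<Self> {
        let (n, m) = label.split_once('x')?;
        Some(Self {
            twap_per_block: n.parse().ok()?,
            ethflow_per_block: m.parse().ok()?,
        })
    }

    /// Transactions the generator would attempt over `blocks` blocks.
    pub fn attempted_txs(&self, blocks: u64) -> u64 {
        (u64::from(self.twap_per_block) + u64::from(self.ethflow_per_block)) * blocks
    }

    /// Report path following the `load-NxM-YYYY-MM-DD.md` naming pattern.
    /// `date` is passed in as a pre-formatted `YYYY-MM-DD` string.
    pub fn report_path(&self, date: &str) -> String {
        format!(
            "docs/operations/load-reports/load-{}x{}-{}.md",
            self.twap_per_block, self.ethflow_per_block, date
        )
    }
}
```

For example, `Scenario::parse("50x50")` would describe the saturation scenario, which attempts 100 transactions per block under this reading of the label.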

What is included

| File | Purpose |
| --- | --- |
| `tools/orderbook-mock/` | Axum HTTP server: `POST /api/v1/orders`, `GET /api/v1/app_data/{hash}`, `/healthz`, `/_stats`. Knobs: `--port`, `--latency-ms`, `--error-rate` (cycles `InsufficientFee` / `InvalidSignature` so the strategy's `TryNextBlock` + `Drop` arms both fire). 3 unit tests. |
| `tools/load-gen/` | Alloy binary: connects to Anvil, impersonates the pinned EOA via `anvil_impersonateAccount` + `anvil_setBalance`, fires N `ComposableCoW.create` + M `CoWSwapEthFlow.createOrder` per new block. 3 unit tests (pinned address parsing, salt uniqueness, calldata selector shape). |
| `crates/nexum-engine/src/engine_config.rs` | `ChainConfig` gains optional `orderbook_url` for per-chain base-URL overrides. |
| `crates/nexum-engine/src/host/cow_orderbook.rs` | New `OrderBookPool::from_config(&EngineConfig)` honours the override via `cowprotocol::OrderBookApi::new_with_base_url`; absent overrides fall back to canonical `api.cow.fi` URLs. |
| `crates/nexum-engine/src/main.rs` | Switches `OrderBookPool::default()` -> `from_config(&engine_cfg)`. |
| `engine.load.toml` | Engine config: chain 11155111 -> `ws://localhost:8545`, cow base URL -> `http://localhost:9999`, Prometheus on `127.0.0.1:9100`, `state_dir = ./data/load` (wiped per run). |
| `scripts/load-bootstrap.sh` | Brings up Anvil fork + orderbook-mock, tracks PIDs in `/tmp/shepherd-load.pids`. |
| `scripts/load-teardown.sh` | Idempotent cleanup. |
| `scripts/load-run.sh` | One scenario end-to-end: bootstrap, build, start engine, snapshot `/metrics`, run load-gen, snapshot `/metrics`, teardown, drop a report skeleton at `docs/operations/load-reports/load-NxM-YYYY-MM-DD.md`. |
| `docs/operations/load-testnet-runbook.md` | Operator runbook + acceptance bars + what the test does NOT prove. |
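
To make the `--error-rate` knob concrete, here is a minimal, dependency-free sketch of one way a mock could inject failures at a fixed rate while alternating between the two error kinds. The actual selection logic in `tools/orderbook-mock` is not shown in this PR description, so the `ErrorInjector` type, its accumulator approach, and the method names are assumptions for illustration only.

```rust
/// The two error kinds the mock is described as cycling through.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MockError {
    InsufficientFee,
    InvalidSignature,
}

/// Hypothetical deterministic injector: fails a fixed fraction of requests
/// and alternates the error kind on each failure.
pub struct ErrorInjector {
    error_rate: f64, // fraction of requests that should fail, in [0, 1]
    debt: f64,       // accumulated fractional failures not yet emitted
    failures: u64,   // failures emitted so far, used to alternate kinds
}

impl ErrorInjector {
    pub fn new(error_rate: f64) -> Self {
        Self {
            error_rate: error_rate.clamp(0.0, 1.0),
            debt: 0.0,
            failures: 0,
        }
    }

    /// Decide the outcome of the next request: None means success.
    pub fn next_outcome(&mut self) -> Option<MockError> {
        self.debt += self.error_rate;
        if self.debt < 1.0 {
            return None;
        }
        self.debt -= 1.0;
        let kind = if self.failures % 2 == 0 {
            MockError::InsufficientFee
        } else {
            MockError::InvalidSignature
        };
        self.failures += 1;
        Some(kind)
    }
}
```

A deterministic accumulator is used here instead of random sampling so that a given rate always produces the same failure pattern, which keeps runs comparable; a real mock may well sample randomly.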

Validation

| Check | Result |
| --- | --- |
| `cargo test --workspace --exclude <wasm-only-modules>` | 196 passed |
| `cargo clippy --workspace --all-targets --tests -- -D warnings` | clean |
| `cargo fmt --all --check` | clean |
| `bash -n scripts/load-{bootstrap,run,teardown}.sh` | clean |
| Live `orderbook-mock` smoke: POST -> valid 56-byte UID, GET -> `{"fullAppData":"{}"}`, `/_stats` -> counters tracked | yes |
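
The "valid 56-byte UID" smoke check can be approximated with a small shape test. A CoW Protocol order UID is 56 bytes (32-byte order digest, 20-byte owner, 4-byte `validTo`); the sketch below assumes the mock returns it as a `0x`-prefixed hex string, and the function name is hypothetical.

```rust
/// Length of a CoW Protocol order UID in bytes:
/// 32-byte order digest + 20-byte owner address + 4-byte validTo.
pub const ORDER_UID_LEN: usize = 56;

/// Shape check only: `0x` prefix followed by exactly 112 hex digits.
/// It does not verify that the digest, owner, or validTo are meaningful.
pub fn is_valid_order_uid(s: &str) -> bool {
    let Some(hex) = s.strip_prefix("0x") else {
        return false;
    };
    hex.len() == ORDER_UID_LEN * 2 && hex.bytes().all(|b| b.is_ascii_hexdigit())
}
```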

Stack

feat/load-test-anvil-cow-1079 -> chore/hex-via-alloy-mfw78-followup (PR #51) -> fix/twap-calldata-helper-cow-1077 (PR #50) -> feat/ethflow-expected-excessive-validto-cow-1076 (PR #49) -> feat/forward-orderbook-error-cow-1075 (PR #48) -> feat/resolve-app-data-cow-1074 (PR #47).

Pending (not in this PR)

Baseline 5x5 report against a live Anvil fork - requires Bruno's RPC_URL_SEPOLIA_HTTP from scripts/.env. Once the run lands, the report goes under docs/operations/load-reports/ and signs off the baseline acceptance bar from COW-1079.
Metrics-delta auto-generation in scripts/load-run.sh (left as TBD in the script; e2e-report-gen.sh has the delta logic we can adapt).
Saturation 50x50 scenario - run after baseline + medium so the bottleneck has a clean baseline to compare against.
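
For the pending metrics-delta item, the general idea is to diff the two `/metrics` snapshots series by series. The sketch below is not the logic from `e2e-report-gen.sh`; it is a minimal stand-in that assumes plain Prometheus text exposition without trailing timestamps, and the function names are hypothetical.

```rust
use std::collections::BTreeMap;

/// Parse Prometheus text exposition into `series -> value`.
/// Assumes each sample line is `<series> <value>` with no trailing timestamp;
/// comment lines (`# HELP`, `# TYPE`) and unparsable lines are skipped.
pub fn parse_metrics(text: &str) -> BTreeMap<String, f64> {
    text.lines()
        .map(str::trim)
        .filter(|l| !l.is_empty() && !l.starts_with('#'))
        .filter_map(|l| {
            // Split on the last space so label values containing spaces survive.
            let (series, value) = l.rsplit_once(' ')?;
            Some((series.trim().to_string(), value.parse::<f64>().ok()?))
        })
        .collect()
}

/// Per-series difference `after - before`, keeping only series that changed.
/// Series missing from the first snapshot are treated as starting at 0.
pub fn metrics_delta(before: &str, after: &str) -> BTreeMap<String, f64> {
    let before = parse_metrics(before);
    parse_metrics(after)
        .into_iter()
        .map(|(series, v)| {
            let d = v - before.get(&series).copied().unwrap_or(0.0);
            (series, d)
        })
        .filter(|(_, d)| *d != 0.0)
        .collect()
}
```

Treating a missing series as 0 suits counters that first appear during the run; gauges and histograms would need more careful handling than this sketch provides.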

What this test does NOT prove

Repeated from the runbook so reviewers see it inline: WS reconnect resilience (COW-1031), real-orderbook 4xx variety (COW-1078), multi-day memory drift (COW-1031), diverse appData shapes (COW-1078). This test answers exactly one question: how many TWAP+EthFlow events per block can shepherd dispatch before something breaks?

Linear

In Progress on COW-1079. The Backlog issue will close out once the three scenario reports (baseline, medium, saturation) all land.

AI assistance disclosure: drafted by Claude (Opus 4.7, 1M context).

Synthetic load test for shepherd's M4 stack. Distinct from:

- COW-1064 (real Sepolia E2E, correctness, 90 min, 5 modules)
- COW-1078 (backtest of 7d historical events, replay)
- COW-1031 (7-day soak, wall-clock stability)

This issue answers one question the others do not: how many events per block can the supervisor dispatch before something breaks? lgahdl's PR #9 review thread flagged sequential per-module dispatch as a potential bottleneck; this PR is how we measure it.

Components added:

1. `tools/orderbook-mock` (new crate, axum-based) - HTTP server serving the two endpoints shepherd's cow-api host hits per submission. POST /api/v1/orders returns a synthetic 56-byte OrderUid; GET /api/v1/app_data/{hash} returns the empty appData document. CLI knobs: --port, --latency-ms, --error-rate (alternates InsufficientFee / InvalidSignature to exercise both TryNextBlock and Drop paths). 3 unit tests covering the happy path, the empty appData path, and the error-rate envelope.
2. `tools/load-gen` (new crate, alloy-based) - connects to Anvil, impersonates the pinned Sepolia test EOA via anvil_impersonateAccount + anvil_setBalance, then on every new block fires N ComposableCoW.create(...) + M CoWSwapEthFlow.createOrder(...) calls. Each create uses a fresh salt counter so submissions do not collide on the dedup check. 3 unit tests covering pinned address parsing, salt uniqueness, and calldata selector shape.
3. Engine config: ChainConfig gains optional `orderbook_url` (per chain). OrderBookPool::from_config honours the override using cowprotocol::OrderBookApi::new_with_base_url; absent overrides fall back to canonical api.cow.fi URLs. main.rs switches from ::default() to ::from_config(&engine_cfg). Useful long-term for staging/barn targets, immediately needed to point at the mock.
4. `engine.load.toml` - chain 11155111 -> ws://localhost:8545, cow base URL -> http://localhost:9999, metrics on 127.0.0.1:9100, state_dir = ./data/load (wiped per run).
5. Scripts:
   - `scripts/load-bootstrap.sh` brings up Anvil + orderbook-mock, tracks PIDs in /tmp/shepherd-load.pids, exposes a teardown helper.
   - `scripts/load-teardown.sh` idempotent cleanup.
   - `scripts/load-run.sh` orchestrates one scenario end-to-end: bootstrap, build modules, start engine, snapshot /metrics, run load-gen for --duration-min, snapshot /metrics again, tear down, drop a report skeleton at docs/operations/load-reports/load-NxM-YYYY-MM-DD.md.
6. `docs/operations/load-testnet-runbook.md` - operator runbook covering the three scenarios (baseline 5x5, medium 20x20, saturation 50x50), expected acceptance bars, what the test does NOT prove (WS reconnect / drift / real-orderbook fidelity), troubleshooting.

Validation:

- cargo test --workspace --exclude <wasm-only-modules>: 196 passed.
- cargo clippy --workspace --all-targets --tests -- -D warnings: clean.
- cargo fmt --all --check: clean.
- bash -n scripts/load-{bootstrap,run,teardown}.sh: clean.
- Live orderbook-mock smoke: POST returns valid 56-byte hex UID, GET returns {"fullAppData":"{}"}, /_stats reflects counters.

Pending (not in this PR):

- Baseline 5x5 report against a real Anvil fork - requires Bruno's RPC_URL_SEPOLIA_HTTP from scripts/.env; once that runs, the report lands in docs/operations/load-reports/.
- Metrics-delta auto-generation in scripts/load-run.sh (left as TBD in the script; e2e-report-gen.sh has the delta logic we can adapt).
- Saturation scenario - run after the baseline lands so the bottleneck has a clean baseline to compare against.

AI assistance disclosure: drafted by Claude (Opus 4.7, 1M context).
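
The per-chain override described in the engine config component boils down to "use `orderbook_url` if set, else the canonical URL for that chain". The sketch below illustrates only that fallback rule with std types. The struct layout (apart from the `orderbook_url` field name), the helper functions, and the exact canonical URL strings are assumptions; the real code goes through `cowprotocol::OrderBookApi::new_with_base_url`, which is not called here.

```rust
/// Hypothetical, trimmed-down stand-in for the engine's per-chain config.
/// Only the `orderbook_url` field name comes from the PR description.
pub struct ChainConfig {
    pub chain_id: u64,
    pub orderbook_url: Option<String>,
}

/// Canonical CoW orderbook base URLs for a couple of chains.
/// The URL strings are assumed here, not copied from the crate.
pub fn canonical_orderbook_url(chain_id: u64) -> Option<&'static str> {
    match chain_id {
        1 => Some("https://api.cow.fi/mainnet"),
        11155111 => Some("https://api.cow.fi/sepolia"),
        _ => None,
    }
}

/// Resolve the base URL: an explicit override wins, otherwise fall back
/// to the canonical URL for the chain, if one is known.
pub fn resolve_orderbook_url(chain: &ChainConfig) -> Option<String> {
    chain
        .orderbook_url
        .clone()
        .or_else(|| canonical_orderbook_url(chain.chain_id).map(str::to_owned))
}
```

With the load config, chain 11155111 would carry `orderbook_url = Some("http://localhost:9999")` and resolve to the mock, while a config without the override keeps pointing at the canonical endpoint.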

linear-code · 2026-06-19T14:11:14Z

COW-1079

…tion (COW-1079)

First COW-1079 run on a real Anvil fork of Sepolia. The engine-side acceptance bar is cleared with wide margin:

- Per-block dispatch latency p50/p95/p99 = 4/6/7 ms (bar was < 2 s).
- Zero traps, zero poisoned modules, zero shepherd_module_errors_total.
- EthFlow strategy submitted 1 OrderPlacement end-to-end through the mock orderbook in 10 ms; submitted:{uid} marker written cleanly.
- 63 Anvil blocks dispatched flawlessly.

The honest finding: load-gen's transactions get into Anvil's mempool (twap_ok=270, ethflow_ok=270 per the eth_sendTransaction response), but only 5 ConditionalOrderCreated + 1 OrderPlacement events actually fired - the rest reverted at the contract level (ComposableCoW.create + EthFlow.createOrder run preconditions the load-gen-crafted bodies don't pass). So this run stressed the engine with ~6 events over 60 s, not 5+5 per block. The bar criterion that depends on the load-gen (events-per-block delivered) is the only one that doesn't pass; filing a follow-up to calibrate the revert rate before re-running.

Report at docs/operations/load-reports/load-5x5-2026-06-19.md mirrors the COW-1064 e2e-report shape and signs off as "conditional pass" - engine meets the bar; load-gen needs work.

AI assistance disclosure: drafted by Claude (Opus 4.7, 1M context).
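
The gap between accepted transactions and fired events can be put in numbers. Using the figures from this run, 5 of 270 TWAP transactions (about 1.9%) and 1 of 270 EthFlow transactions (about 0.4%) produced an event, and taking the 63 dispatched blocks as the denominator gives roughly 0.1 events per block against a target of 10. The sketch below shows that arithmetic; the `RunCounts` type is hypothetical and not part of load-gen.

```rust
/// Hypothetical summary of one load run; all values are supplied by the caller.
pub struct RunCounts {
    pub txs_accepted: u64, // transactions the node accepted from load-gen
    pub events_fired: u64, // on-chain events actually emitted
    pub blocks: u64,       // blocks dispatched during the run
}

impl RunCounts {
    /// Fraction of accepted transactions that produced an event.
    pub fn delivery_rate(&self) -> Option<f64> {
        if self.txs_accepted == 0 {
            None
        } else {
            Some(self.events_fired as f64 / self.txs_accepted as f64)
        }
    }

    /// Average number of events the engine saw per block.
    pub fn events_per_block(&self) -> Option<f64> {
        if self.blocks == 0 {
            None
        } else {
            Some(self.events_fired as f64 / self.blocks as f64)
        }
    }
}
```

A follow-up run could compare `events_per_block` against the scenario's N+M target to decide whether the load actually reached the engine.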

scripts/lib.sh exports REPORTS_DIR=e2e-reports/ unconditionally. load-run.sh used to set REPORTS_DIR=load-reports/ BEFORE sourcing load-bootstrap.sh (which transitively sources lib.sh), so the override was lost and the auto-generated skeleton ended up under e2e-reports/ next to the COW-1064 reports. Move the assignment after the source so the load-reports/ path wins, with a comment explaining the ordering trap.

Drive-by: removed the misplaced e2e-reports/load-5x5-2026-06-19.md from the first run; the committed report at load-reports/load-5x5-2026-06-19.md (commit 59fe714) is the canonical copy.

AI assistance disclosure: drafted by Claude (Opus 4.7, 1M context).

brunota20 added 2 commits June 19, 2026 11:32

brunota20 mentioned this pull request Jun 19, 2026

fix(load-gen): explicit nonce + unique EthFlow sellAmount (COW-1080) #53

Open

brunota20 wants to merge 3 commits into chore/hex-via-alloy-mfw78-followup from feat/load-test-anvil-cow-1079
