feat(load-test): Anvil fork + mock orderbook + load-gen (COW-1079) #52

Open
brunota20 wants to merge 3 commits into chore/hex-via-alloy-mfw78-followup from feat/load-test-anvil-cow-1079

Conversation

@brunota20

Collaborator

Summary

Synthetic load test for shepherd's M4 stack, distinct from the existing test surface (COW-1064 correctness, COW-1078 backtest, COW-1031 soak). It answers the throughput question that lgahdl's review thread on PR #9 raised about sequential per-module dispatch.

See docs/operations/load-testnet-runbook.md for the operator flow and acceptance bars per scenario (baseline 5x5, medium 20x20, saturation 50x50).

What is included

| File | Purpose |
| --- | --- |
| `tools/orderbook-mock/` | Axum HTTP server: POST /api/v1/orders, GET /api/v1/app_data/{hash}, /healthz, /_stats. Knobs: --port, --latency-ms, --error-rate (cycles InsufficientFee / InvalidSignature so the strategy's TryNextBlock + Drop arms both fire). 3 unit tests. |
| `tools/load-gen/` | Alloy binary: connects to Anvil, impersonates the pinned EOA via anvil_impersonateAccount + anvil_setBalance, fires N ComposableCoW.create + M CoWSwapEthFlow.createOrder per new block. 3 unit tests (pinned address parsing, salt uniqueness, calldata selector shape). |
| `crates/nexum-engine/src/engine_config.rs` | ChainConfig gains optional orderbook_url for per-chain base-URL overrides. |
| `crates/nexum-engine/src/host/cow_orderbook.rs` | New OrderBookPool::from_config(&EngineConfig) honours the override via cowprotocol::OrderBookApi::new_with_base_url; absent overrides fall back to canonical api.cow.fi URLs. |
| `crates/nexum-engine/src/main.rs` | Switches OrderBookPool::default() -> from_config(&engine_cfg). |
| `engine.load.toml` | Engine config: chain 11155111 -> ws://localhost:8545, cow base URL -> http://localhost:9999, Prometheus on 127.0.0.1:9100, state_dir = ./data/load (wiped per run). |
| `scripts/load-bootstrap.sh` | Brings up Anvil fork + orderbook-mock, tracks PIDs in /tmp/shepherd-load.pids. |
| `scripts/load-teardown.sh` | Idempotent cleanup. |
| `scripts/load-run.sh` | One scenario end-to-end: bootstrap, build, start engine, snapshot /metrics, run load-gen, snapshot /metrics, teardown, drop a report skeleton at docs/operations/load-reports/load-NxM-YYYY-MM-DD.md. |
| `docs/operations/load-testnet-runbook.md` | Operator runbook + acceptance bars + what the test does NOT prove. |
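
For reviewers who want a mental model of the `--error-rate` knob, here is a minimal sketch of one way a mock could fail a fixed share of requests while alternating the two error kinds. It is not the code in `tools/orderbook-mock`: the `ErrorInjector` type, the whole-percent unit, and the deterministic accumulator are all assumptions made for illustration, and the real crate may sample randomly or take a fraction instead.

```rust
/// The two error kinds named in the PR description. Which one the strategy
/// retries and which one it drops is not restated here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MockError {
    InsufficientFee,
    InvalidSignature,
}

/// Hypothetical injector: fails `percent` out of every 100 requests and
/// alternates the error kind on each injected failure.
struct ErrorInjector {
    percent: u32,  // assumed unit: whole percent, clamped to 0..=100
    acc: u32,      // running remainder carried between requests
    failures: u64, // injected errors so far, used to alternate kinds
}

impl ErrorInjector {
    fn new(percent: u32) -> Self {
        Self { percent: percent.min(100), acc: 0, failures: 0 }
    }

    /// Returns `Some(kind)` when this request should fail, `None` otherwise.
    fn next_outcome(&mut self) -> Option<MockError> {
        self.acc += self.percent;
        if self.acc < 100 {
            return None;
        }
        self.acc -= 100;
        let kind = if self.failures % 2 == 0 {
            MockError::InsufficientFee
        } else {
            MockError::InvalidSignature
        };
        self.failures += 1;
        Some(kind)
    }
}
```

With `ErrorInjector::new(50)`, every second request fails and the failures alternate between the two kinds, so both strategy arms are exercised within four requests. An integer accumulator is used instead of a float so the failure cadence stays exact over long runs.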

Validation

| Check | Result |
| --- | --- |
| `cargo test --workspace --exclude <wasm-only-modules>` | 196 passed |
| `cargo clippy --workspace --all-targets --tests -- -D warnings` | clean |
| `cargo fmt --all --check` | clean |
| `bash -n scripts/load-{bootstrap,run,teardown}.sh` | clean |
| Live orderbook-mock smoke: POST -> valid 56-byte UID, GET -> `{"fullAppData":"{}"}`, /_stats -> counters tracked | yes |

Stack

feat/load-test-anvil-cow-1079 -> chore/hex-via-alloy-mfw78-followup (PR #51) -> fix/twap-calldata-helper-cow-1077 (PR #50) -> feat/ethflow-expected-excessive-validto-cow-1076 (PR #49) -> feat/forward-orderbook-error-cow-1075 (PR #48) -> feat/resolve-app-data-cow-1074 (PR #47).

Pending (not in this PR)

  • Baseline 5x5 report against a live Anvil fork - requires Bruno's RPC_URL_SEPOLIA_HTTP from scripts/.env. Once the run lands, the report goes under docs/operations/load-reports/ and signs off the baseline acceptance bar from COW-1079.
  • Metrics-delta auto-generation in scripts/load-run.sh (left as TBD in the script; e2e-report-gen.sh has the delta logic we can adapt).
  • Saturation 50x50 scenario - run after baseline + medium so the bottleneck has a clean baseline to compare against.

What this test does NOT prove

Repeated from the runbook so reviewers see it inline: WS reconnect resilience (COW-1031), real-orderbook 4xx variety (COW-1078), multi-day memory drift (COW-1031), diverse appData shapes (COW-1078). This test answers exactly one question: how many TWAP+EthFlow events per block can shepherd dispatch before something breaks?

Linear

In Progress on COW-1079. The Backlog issue will close out once the three scenario reports (baseline, medium, saturation) all land.

AI assistance disclosure: drafted by Claude (Opus 4.7, 1M context).

Synthetic load test for shepherd's M4 stack. Distinct from:
- COW-1064 (real Sepolia E2E, correctness, 90 min, 5 modules)
- COW-1078 (backtest of 7d historical events, replay)
- COW-1031 (7-day soak, wall-clock stability)

This issue answers one question the others do not: how many events
per block can the supervisor dispatch before something breaks?
lgahdl's PR #9 review thread flagged sequential per-module dispatch
as a potential bottleneck; this PR is how we measure it.

Components added:

1. `tools/orderbook-mock` (new crate, axum-based) - HTTP server
   serving the two endpoints shepherd's cow-api host hits per
   submission. POST /api/v1/orders returns a synthetic 56-byte
   OrderUid; GET /api/v1/app_data/{hash} returns the empty appData
   document. CLI knobs: --port, --latency-ms, --error-rate (alternates
   InsufficientFee / InvalidSignature to exercise both TryNextBlock
   and Drop paths). 3 unit tests covering the happy path, the empty
   appData path, and the error-rate envelope.

2. `tools/load-gen` (new crate, alloy-based) - connects to Anvil,
   impersonates the pinned Sepolia test EOA via
   anvil_impersonateAccount + anvil_setBalance, then on every new
   block fires N ComposableCoW.create(...) + M
   CoWSwapEthFlow.createOrder(...) calls. Each create uses a fresh
   salt counter so submissions do not collide on the dedup check.
   3 unit tests covering pinned address parsing, salt uniqueness, and
   calldata selector shape.

3. Engine config: ChainConfig gains optional `orderbook_url` (per
   chain). OrderBookPool::from_config honours the override using
   cowprotocol::OrderBookApi::new_with_base_url; absent overrides
   fall back to canonical api.cow.fi URLs. main.rs switches from
   ::default() to ::from_config(&engine_cfg). Useful long-term for
   staging/barn targets, immediately needed to point at the mock.

4. `engine.load.toml` - chain 11155111 -> ws://localhost:8545, cow
   base URL -> http://localhost:9999, metrics on 127.0.0.1:9100,
   state_dir = ./data/load (wiped per run).

5. Scripts:
   - `scripts/load-bootstrap.sh` brings up Anvil + orderbook-mock,
     tracks PIDs in /tmp/shepherd-load.pids, exposes a teardown
     helper.
   - `scripts/load-teardown.sh` idempotent cleanup.
   - `scripts/load-run.sh` orchestrates one scenario end-to-end:
     bootstrap, build modules, start engine, snapshot /metrics,
     run load-gen for --duration-min, snapshot /metrics again,
     tear down, drop a report skeleton at
     docs/operations/load-reports/load-NxM-YYYY-MM-DD.md.

6. `docs/operations/load-testnet-runbook.md` - operator runbook
   covering the three scenarios (baseline 5x5, medium 20x20,
   saturation 50x50), expected acceptance bars, what the test
   does NOT prove (WS reconnect / drift / real-orderbook fidelity),
   troubleshooting.
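
The override rule in item 3 can be sketched as below. This is a simplified stand-in, not the code in `cow_orderbook.rs`: `ChainConfig` is reduced to two fields, `canonical_base_url` and `resolve_base_url` are hypothetical helper names, and the canonical URLs are assumed values (the real ones come from the cowprotocol crate).

```rust
/// Simplified stand-in for the engine's ChainConfig. Only `orderbook_url`
/// is named in this PR; `chain_id` as a field name is an assumption.
struct ChainConfig {
    chain_id: u64,
    orderbook_url: Option<String>,
}

/// Assumed canonical base URLs, listed here for illustration only.
fn canonical_base_url(chain_id: u64) -> Option<&'static str> {
    match chain_id {
        1 => Some("https://api.cow.fi/mainnet"),
        11155111 => Some("https://api.cow.fi/sepolia"),
        _ => None,
    }
}

/// A per-chain override wins when present; otherwise fall back to the
/// canonical URL for that chain, if one is known.
fn resolve_base_url(cfg: &ChainConfig) -> Option<String> {
    match &cfg.orderbook_url {
        Some(url) => Some(url.clone()),
        None => canonical_base_url(cfg.chain_id).map(String::from),
    }
}
```

Under these assumptions, a Sepolia entry with `orderbook_url = Some("http://localhost:9999")` resolves to the mock, while the same entry without the override resolves to the canonical Sepolia URL.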

Validation:
- cargo test --workspace --exclude <wasm-only-modules>: 196 passed.
- cargo clippy --workspace --all-targets --tests -- -D warnings: clean.
- cargo fmt --all --check: clean.
- bash -n scripts/load-{bootstrap,run,teardown}.sh: clean.
- Live orderbook-mock smoke: POST returns valid 56-byte hex UID, GET
  returns {"fullAppData":"{}"}, /_stats reflects counters.

Pending (not in this PR):
- Baseline 5x5 report against a real Anvil fork - requires Bruno's
  RPC_URL_SEPOLIA_HTTP from scripts/.env; once that runs, the report
  lands in docs/operations/load-reports/.
- Metrics-delta auto-generation in scripts/load-run.sh (left as TBD
  in the script; e2e-report-gen.sh has the delta logic we can adapt).
- Saturation scenario - run after the baseline lands so the
  bottleneck has a clean baseline to compare against.

AI assistance disclosure: drafted by Claude (Opus 4.7, 1M context).

@linear-code

linear-code Bot commented Jun 19, 2026

COW-1079

…tion (COW-1079)

First COW-1079 run on a real Anvil fork of Sepolia. The engine-side
acceptance bar is cleared with wide margin:

- Per-block dispatch latency p50/p95/p99 = 4/6/7 ms (bar was < 2 s).
- Zero traps, zero poisoned modules, zero shepherd_module_errors_total.
- EthFlow strategy submitted 1 OrderPlacement end-to-end through the
  mock orderbook in 10 ms; submitted:{uid} marker written cleanly.
- 63 Anvil blocks dispatched flawlessly.

The honest finding: load-gen's transactions reach Anvil's mempool
(twap_ok=270, ethflow_ok=270, counted from eth_sendTransaction
responses), but only 5 ConditionalOrderCreated and 1 OrderPlacement
events actually fired. The rest reverted at the contract level:
ComposableCoW.create and EthFlow.createOrder enforce preconditions
that the load-gen-crafted bodies do not satisfy.

So this run stressed the engine with ~6 events over 60 s, not
5+5 per block. The bar criterion that depends on the load-gen
(events-per-block delivered) is the only one that doesn't pass;
filing a follow-up to calibrate the revert rate before re-running.
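
The gap between what load-gen sent and what the engine saw can be put in numbers with a small helper. This is a sketch for reading the report, not part of the tooling: `RunCounts` and `events_per_block` are hypothetical names, and pairing the 63 dispatched blocks with the event count is an assumption about the measurement window.

```rust
/// Counts for one order type in a run.
struct RunCounts {
    sent: u64,  // transactions load-gen reported as accepted
    fired: u64, // matching on-chain events that actually fired
}

impl RunCounts {
    /// Accepted transactions that produced no event.
    fn missing(&self) -> u64 {
        self.sent.saturating_sub(self.fired)
    }

    /// Fraction of accepted transactions that produced an event.
    fn delivered_ratio(&self) -> f64 {
        if self.sent == 0 {
            0.0
        } else {
            self.fired as f64 / self.sent as f64
        }
    }
}

/// Average events per block over a window of `blocks` blocks.
fn events_per_block(fired: u64, blocks: u64) -> f64 {
    if blocks == 0 {
        0.0
    } else {
        fired as f64 / blocks as f64
    }
}
```

Plugging in this run's figures (270 sent and 5 fired for TWAP, 270 sent and 1 fired for EthFlow) gives 534 of 540 transactions without an event, a delivered ratio of about 1.1%. Spread over the 63 dispatched blocks, that is under 0.1 events per block against a target of 10.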

Report at docs/operations/load-reports/load-5x5-2026-06-19.md
mirrors the COW-1064 e2e-report shape and signs off as
"conditional pass" - engine meets the bar; load-gen needs work.

AI assistance disclosure: drafted by Claude (Opus 4.7, 1M context).

scripts/lib.sh exports REPORTS_DIR=e2e-reports/ unconditionally.
load-run.sh used to set REPORTS_DIR=load-reports/ BEFORE sourcing
load-bootstrap.sh (which transitively sources lib.sh), so the
override was lost and the auto-generated skeleton ended up under
e2e-reports/ next to the COW-1064 reports.

Move the assignment after the source so the load-reports/ path
wins, with a comment explaining the ordering trap.

Drive-by: removed the misplaced e2e-reports/load-5x5-2026-06-19.md
from the first run; the committed report at
load-reports/load-5x5-2026-06-19.md (commit 59fe714) is the
canonical copy.

AI assistance disclosure: drafted by Claude (Opus 4.7, 1M context).