
fix(event_loop) + docs(m3): block-only subscriptions + M3 testnet runbook (validated)#32

Open: brunota20 wants to merge 2 commits into `feat/m2-runbook-and-smoke-config` from `feat/m3-runbook-and-smoke-config`

What does this PR do?

Two coupled commits:

  1. `fix(event_loop)` - engines whose modules only declare `[[subscription]] kind = "block"` (or only `kind = "log"`) no longer bail at boot. `select_all` on an empty Vec yielded `None` immediately, tripping the "stream ended → shut down" arm before any event flowed. Replaced empty input with `stream::pending()` so the arm is never selected.

  2. `docs(m3)` - M3 testnet runbook + `engine.m3.toml` + `just run-m3`, sister doc to PR #31 (docs(m2): testnet runbook + engine.m2.toml + just run-m2, validated boot). All 3 M3 example modules (price-alert + balance-tracker + stop-loss) boot against Sepolia and exercise their full strategy path on the first block dispatch.
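
The empty-input behaviour behind item 1 can be modelled without any async runtime. The sketch below is a std-only stand-in, not the engine code: `Source`, `poll_merged`, and `Side` are hypothetical names, and `poll_merged` only imitates the behaviour this PR attributes to `select_all` (an empty set of inputs reports end-of-stream at once). `Side::Never` plays the role of `stream::pending()`.

```rust
use std::collections::VecDeque;
use std::task::Poll;

/// Hypothetical stand-in for one subscription stream.
/// `open == false` models a dropped WebSocket.
struct Source {
    queued: VecDeque<u64>,
    open: bool,
}

impl Source {
    fn poll_next(&mut self) -> Poll<Option<u64>> {
        match self.queued.pop_front() {
            Some(event) => Poll::Ready(Some(event)),
            None if self.open => Poll::Pending,
            None => Poll::Ready(None),
        }
    }
}

/// Imitates the merged stream: finished sources are discarded, and
/// once none remain the merged stream itself ends. With an empty
/// Vec this returns `Ready(None)` on the very first poll.
fn poll_merged(sources: &mut Vec<Source>) -> Poll<Option<u64>> {
    let mut i = 0;
    while i < sources.len() {
        match sources[i].poll_next() {
            Poll::Ready(Some(event)) => return Poll::Ready(Some(event)),
            Poll::Ready(None) => {
                sources.remove(i);
            }
            Poll::Pending => i += 1,
        }
    }
    if sources.is_empty() {
        Poll::Ready(None)
    } else {
        Poll::Pending
    }
}

/// One side of the event loop (blocks or logs) after the fix.
enum Side {
    /// No subscriptions of this kind: never ready, never ends.
    Never,
    /// At least one subscription: end-of-stream still propagates.
    Merged(Vec<Source>),
}

impl Side {
    fn new(sources: Vec<Source>) -> Self {
        if sources.is_empty() {
            Side::Never
        } else {
            Side::Merged(sources)
        }
    }

    fn poll_next(&mut self) -> Poll<Option<u64>> {
        match self {
            Side::Never => Poll::Pending,
            Side::Merged(sources) => poll_merged(sources),
        }
    }
}
```

In this model an empty side stays `Pending` forever, so the "stream ended" arm cannot fire for it, while a side built from a closed source still reports `Ready(None)` and the bail path is preserved.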

Why

The M3 milestone was unit + integration tested (145 host tests + 6 doctests + 5 supervisor integration tests) but had never been exercised against a real chain. Wiring up `engine.m3.toml` to do that surfaced the event_loop bug - all 3 M3 modules are block-only, which is a config shape M1 had never exercised.

Validated locally on Sepolia

A single block dispatch (~10s wall clock) drove all 3 strategy paths:

| Module | Observed |
| --- | --- |
| price-alert | `TRIGGERED answer=174553978080 threshold=250000000000 (Below)` - Sepolia ETH/USD Chainlink @ $1745.54 < $2500 trigger |
| balance-tracker | 2 `eth_getBalance` calls (one per configured address), multi-key local-store path |
| stop-loss | `eth_call` oracle → `OrderCreation::from_signed_order_data` with `Signature::PreSign` → `cow-api::submit-order` 561B → orderbook returns typed `TransferSimulationFailed` → `classify_api_error` correctly classifies as `RetryAction::TryNextBlock` → `retry on next block (0)` |

The stop-loss rejection is the SDK retry contract working: the default config's owner does not hold the sell token, so the orderbook simulation fails; the SDK's `classify_api_error` maps that to retriable; the watch is preserved for the next block.
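
The retry contract described above can be illustrated with a small sketch. Only the names `classify_api_error`, `TransferSimulationFailed`, and `RetryAction::TryNextBlock` come from this PR; the enum shapes, the `Other` and `Drop` variants, the signature, and the `keep_watch` helper are assumptions made for illustration and may not match the SDK.

```rust
/// Hypothetical error type; only `TransferSimulationFailed` is named
/// in the PR, `Other` is an illustrative catch-all.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ApiError {
    TransferSimulationFailed,
    Other(String),
}

/// Hypothetical action type; only `TryNextBlock` is named in the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RetryAction {
    TryNextBlock,
    Drop,
}

/// Assumed signature. Maps a typed orderbook rejection to an action.
fn classify_api_error(err: &ApiError) -> RetryAction {
    match err {
        // The owner may fund the sell token later, so retrying is useful.
        ApiError::TransferSimulationFailed => RetryAction::TryNextBlock,
        // Assumption: anything unrecognised is treated as permanent.
        ApiError::Other(_) => RetryAction::Drop,
    }
}

/// Hypothetical helper: should the watch survive this submit attempt?
fn keep_watch(result: &Result<(), ApiError>) -> bool {
    match result {
        // Order accepted: nothing left to retry.
        Ok(()) => false,
        Err(err) => classify_api_error(err) == RetryAction::TryNextBlock,
    }
}
```

Under these assumptions, `keep_watch(&Err(ApiError::TransferSimulationFailed))` is `true`, which corresponds to the observed `retry on next block` log line.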

Changes

| File | Change |
| --- | --- |
| `crates/nexum-engine/src/runtime/event_loop.rs` | Replace `select_all(empty)` with `stream::pending()` for each side; cite the original "bail on WS drop" intent so future readers do not regress it |
| `crates/nexum-engine/src/supervisor/tests.rs` | New `run_does_not_bail_when_both_stream_kinds_are_empty` regression test (47 nexum-engine tests, was 46) |
| `engine.m3.toml` | New |
| `docs/operations/m3-testnet-runbook.md` | New |
| `justfile` | New `build-m3` + `run-m3` recipes |

Breaking changes

None. The fix preserves the bail-on-None semantic for non-empty streams; only the empty-Vec edge case changed.

Testing

  • `cargo test -p nexum-engine` → 47 passed (was 46).
  • `cargo test --workspace` → 150 host tests + 6 doctests passing.
  • `cargo clippy --all-targets --workspace -- -D warnings` clean.
  • `cargo fmt --all --check` clean.
  • `just run-m3` boots 3 modules + exercises all 3 strategy paths on first Sepolia block.
  • 0 em-dashes in new files.

AI assistance disclosure

AI Assistance: this fix, the docs, and this description were produced by a Claude Code agent (Claude Opus 4.7 1M context). A human (Bruno) reviewed and is accountable for the result. The Sepolia boot validation was run by the agent.

Linear: no dedicated issue - this is the M3 runbook counterpart to the M2 runbook in #31. The fix is incidental to wiring up the runbook.

Stacks on #31 (M2 runbook) → #30 (COW-1068) → #29 (COW-1067) → #28 (COW-1069) → #27 (COW-1066) → #26 (COW-1063 QA cleanup).

Surfaced wiring up `engine.m3.toml` for the M3 testnet runbook: all
3 M3 example modules (price-alert, balance-tracker, stop-loss) only
declare `[[subscription]] kind = "block"`, leaving `log_streams`
empty. `select_all` over an empty Vec yields `None` immediately, the
`tokio::select!` arm fired, and the loop hit the
"log stream ended - shutting down for restart" bail before any block
flowed. The engine bailed within ~50 ms of `supervisor ready`.

Fix: replace each empty side with `futures::stream::pending()` so
the corresponding select arm is never selected. The bail-on-None
semantic still fires when a *non-empty* stream actually closes
(real WebSocket drop), which is the original intent.

The bug was symmetric (log-only configs would also bail) but only
the block-only path is exercised by an existing module config. M2
was unaffected because both modules subscribe to at least one log.

Regression test in
`supervisor::tests::run_does_not_bail_when_both_stream_kinds_are_empty`:
invokes `run` with two empty `Vec`s plus a 50 ms shutdown timer; asserts
`run` blocks the full 50 ms instead of returning at 0 ms. The pre-fix
binary returns in <5 ms.
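
The shape of that timing assertion can be shown with a synchronous stand-in. This is not the real test: the engine's `run` is driven by streams and a shutdown timer, whereas the hypothetical `run` below uses a `std::sync::mpsc` channel and a deadline, with `None` modelling "no subscriptions of any kind". The `Exit` enum is also an assumption.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical exit reasons for the stand-in loop.
#[derive(Debug, PartialEq, Eq)]
enum Exit {
    Shutdown,
    StreamEnded,
}

/// Stand-in for the engine's `run`: handles events until either the
/// shutdown deadline passes or a real event source disconnects.
fn run(events: Option<mpsc::Receiver<u64>>, shutdown_after: Duration) -> Exit {
    let deadline = Instant::now() + shutdown_after;
    loop {
        let remaining = deadline.saturating_duration_since(Instant::now());
        if remaining.is_zero() {
            return Exit::Shutdown;
        }
        match &events {
            // No sources at all: nothing can arrive, so only the
            // shutdown deadline may end the loop (post-fix behaviour).
            None => thread::sleep(remaining),
            Some(rx) => match rx.recv_timeout(remaining) {
                Ok(_event) => continue,
                Err(RecvTimeoutError::Timeout) => return Exit::Shutdown,
                // A real source closing still ends the loop early.
                Err(RecvTimeoutError::Disconnected) => return Exit::StreamEnded,
            },
        }
    }
}
```

A regression check in this style measures wall-clock time around `run(None, Duration::from_millis(50))` and asserts both that the result is `Exit::Shutdown` and that at least 50 ms elapsed.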

Verified locally:
  cargo test -p nexum-engine                    -> 47 passed (was 46)
  just run-m3                                    -> 3 modules boot;
                                                    first block dispatch
                                                    fires all 3 strategy
                                                    paths against live
                                                    Sepolia (oracle read,
                                                    balance polls, cow-api
                                                    submit + retry
                                                    classification)
… 3-module E2E)

Sister doc to `docs/operations/m2-testnet-runbook.md`. Same shape,
different modules. Closes the gap "M3 is unit + integration tested
but has never been exercised against a real chain", same as the M2
runbook closed for M2.

## New files

- `engine.m3.toml` - workspace-root engine config that boots the 3
  M3 example modules (price-alert + balance-tracker + stop-loss)
  against Sepolia public WS. Separate `state_dir = "./data/m3"` so
  it never collides with M1 / M2 runbook state.
- `docs/operations/m3-testnet-runbook.md` - operator runbook
  mirroring the M2 one: prerequisites, smoke+active run (M3 is
  active by default since the example modules trigger on every
  block), optional pre-signature setup for real stop-loss
  settlement, state inspection, scope boundaries, troubleshooting,
  references.
- `justfile` recipes: `build-m3` + `run-m3`.

## Validated locally

A single Sepolia block dispatch (~10 s wall clock) drove all 3 M3
strategy paths through the live testnet:

  - **price-alert**: `chain::request eth_call` -> Chainlink
    AggregatorV3Interface -> ABI decode -> `TRIGGERED answer=
    174553978080 threshold=250000000000 (Below)` (Sepolia ETH/USD
    feed reports $1745.54, below the $2500 default threshold).
  - **balance-tracker**: 2 `chain::request eth_getBalance` calls
    (one per configured address) - SDK chain helper + multi-key
    local-store path.
  - **stop-loss**: `eth_call` oracle -> `from_signed_order_data`
    `OrderCreation` with `Signature::PreSign` -> `cow-api::submit-
    order` bytes=561 -> orderbook returns typed
    `TransferSimulationFailed` -> `classify_api_error` tags as
    retriable -> `retry on next block`. Full submit path
    confirmed; the orderbook rejection is the typed-retry
    contract working as designed (the default config's
    `owner = 0x70997970...` does not hold the sell token on
    Sepolia, so simulation correctly fails).
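
The price-alert figures in the list above follow from fixed-point arithmetic, sketched here. The 8-decimal scale is an assumption consistent with `answer=174553978080` being reported as $1745.54; `is_triggered`, `format_usd`, the `Above` variant, and the strict comparison are illustrative choices, not the module's actual code.

```rust
/// Assumption: the feed uses 8 decimals, which matches
/// 174553978080 being reported as $1745.54.
const FEED_DECIMALS: u32 = 8;

/// `Below` appears in the observed log line; `Above` is hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Direction {
    Below,
    Above,
}

/// Compares raw fixed-point values directly, so no floating point
/// is involved. Assumption: the comparison is strict.
fn is_triggered(answer: u128, threshold: u128, direction: Direction) -> bool {
    match direction {
        Direction::Below => answer < threshold,
        Direction::Above => answer > threshold,
    }
}

/// Renders a non-negative raw answer as dollars, rounded to the
/// nearest cent.
fn format_usd(raw: u128) -> String {
    // Raw units per cent: 10^(8 - 2) = 1_000_000.
    let units_per_cent = 10u128.pow(FEED_DECIMALS - 2);
    let cents = (raw + units_per_cent / 2) / units_per_cent;
    format!("{}.{:02}", cents / 100, cents % 100)
}
```

With these definitions, `format_usd(174553978080)` gives `"1745.54"`, `format_usd(250000000000)` gives `"2500.00"`, and `is_triggered(174553978080, 250000000000, Direction::Below)` is `true`.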

This validates everything the SDK BLEU-840 / BLEU-841 / BLEU-851 /
-852 / -854 / -855 PR series builds: Host trait surface, chain
helpers, cow helpers, MockHost recipe, strategy/lib split. The
same code paths that pass 145 unit tests + 6 doctests + 5
supervisor integration tests now also work against live Sepolia.

## What this validates that the M2 runbook does not

M2 only exercises the orderbook submit path indirectly (through
the EthFlow watcher reacting to swap.cow.fi traffic, and only when
app_data is empty - documented limitation). M3 stop-loss submits
proactively on every poll, so the orderbook always sees a real
`OrderCreation` body even if it rejects. The typed-retry SDK
contract (`classify_api_error` mapping `TransferSimulationFailed`
-> `RetryAction::TryNextBlock`) is exercised end-to-end with a
real orderbook response, not a fixture.

## Stacks on

- `fix(event_loop)` commit immediately preceding this one - the
  bug surfaced wiring up `engine.m3.toml` (block-only subscriptions
  bailed the engine pre-fix).
- PR #31 (M2 runbook) - same operator-doc shape, same conventions.