fix(event_loop) + docs(m3): block-only subscriptions + M3 testnet runbook (validated) #32
Open
brunota20 wants to merge 2 commits into
Surfaced while wiring up `engine.m3.toml` for the M3 testnet runbook:
all 3 M3 example modules (price-alert, balance-tracker, stop-loss)
only declare `[[subscription]] kind = "block"`, leaving `log_streams`
empty. `select_all` over an empty Vec yields `None` immediately, so
the `tokio::select!` arm fired and the loop hit the
"log stream ended - shutting down for restart" bail before any block
flowed. The engine bailed within ~50 ms of `supervisor ready`.
Fix: replace each empty side with `futures::stream::pending()` so
the corresponding select arm is never selected. The bail-on-None
semantic still fires when a *non-empty* stream actually closes
(real WebSocket drop), which is the original intent.
The bug was symmetric (log-only configs would also bail) but only
the block-only path is exercised by an existing module config. M2
was unaffected because both modules subscribe to at least one log.
Regression test in
`supervisor::tests::run_does_not_bail_when_both_stream_kinds_are_empty`:
invokes `run` with two empty `Vec`s plus a 50 ms shutdown timer and
asserts that `run` blocks for the full 50 ms instead of returning at
0 ms. The pre-fix binary returns in <5 ms.
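
The empty-Vec behaviour and the fix can be modelled without any async runtime. The sketch below is a std-only toy, not the engine's code: `Feed`, `SelectAll` and `Merged` are hypothetical stand-ins for a WebSocket stream, `futures::stream::select_all` and the "substitute `futures::stream::pending()` when empty" wrapper, and `poll_next` is a simplified synchronous poll step.

```rust
use std::collections::VecDeque;
use std::task::Poll;

/// Hypothetical stand-in for one subscription stream.
struct Feed {
    items: VecDeque<u64>,
    closed: bool, // true once the underlying connection has dropped
}

impl Feed {
    fn poll_next(&mut self) -> Poll<Option<u64>> {
        match self.items.pop_front() {
            Some(v) => Poll::Ready(Some(v)),
            None if self.closed => Poll::Ready(None), // end of stream
            None => Poll::Pending,                    // open but idle
        }
    }
}

/// Toy model of a merged stream: it ends once no inner feeds remain,
/// so an empty input ends on the very first poll.
struct SelectAll(Vec<Feed>);

impl SelectAll {
    fn poll_next(&mut self) -> Poll<Option<u64>> {
        let mut i = 0;
        while i < self.0.len() {
            match self.0[i].poll_next() {
                Poll::Ready(Some(v)) => return Poll::Ready(Some(v)),
                Poll::Ready(None) => {
                    self.0.remove(i); // drop the finished feed
                }
                Poll::Pending => i += 1,
            }
        }
        if self.0.is_empty() {
            Poll::Ready(None)
        } else {
            Poll::Pending
        }
    }
}

/// The shape of the fix: an empty side becomes a source that is never
/// ready, so its select arm can never fire.
enum Merged {
    Never,
    Streams(SelectAll),
}

impl Merged {
    fn new(feeds: Vec<Feed>) -> Self {
        if feeds.is_empty() {
            Merged::Never
        } else {
            Merged::Streams(SelectAll(feeds))
        }
    }

    fn poll_next(&mut self) -> Poll<Option<u64>> {
        match self {
            Merged::Never => Poll::Pending,
            Merged::Streams(s) => s.poll_next(),
        }
    }
}
```

In this model `SelectAll(Vec::new())` reports end-of-stream on its first poll, while `Merged::new(Vec::new())` stays pending forever. A non-empty `Merged` still reports `None` once its feed closes, so the bail-on-None path keeps working for a real drop.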
Verified locally:
cargo test -p nexum-engine -> 47 passed (was 46)
just run-m3 -> 3 modules boot; first block dispatch fires all 3
strategy paths against live Sepolia (oracle read, balance polls,
cow-api submit + retry classification)
… 3-module E2E)
Sister doc to `docs/operations/m2-testnet-runbook.md`. Same shape,
different modules. Closes the gap "M3 is unit + integration tested
but has never been exercised against a real chain", same as the M2
runbook closed for M2.
## New files
- `engine.m3.toml` - workspace-root engine config that boots the 3
M3 example modules (price-alert + balance-tracker + stop-loss)
against Sepolia public WS. Separate `state_dir = "./data/m3"` so
it never collides with M1 / M2 runbook state.
- `docs/operations/m3-testnet-runbook.md` - operator runbook
mirroring the M2 one: prerequisites, smoke+active run (M3 is
active by default since the example modules trigger on every
block), optional pre-signature setup for real stop-loss
settlement, state inspection, scope boundaries, troubleshooting,
references.
- `justfile` recipes: `build-m3` + `run-m3`.
## Validated locally
A single Sepolia block dispatch (~10 s wall clock) drove all 3 M3
strategy paths through the live testnet:
- **price-alert**: `chain::request eth_call` -> Chainlink
AggregatorV3Interface -> ABI decode ->
`TRIGGERED answer=174553978080 threshold=250000000000 (Below)`
(Sepolia ETH/USD feed reports $1745.54, below the $2500 default
threshold).
- **balance-tracker**: 2 `chain::request eth_getBalance` calls
(one per configured address) - SDK chain helper + multi-key
local-store path.
- **stop-loss**: `eth_call` oracle -> `from_signed_order_data`
`OrderCreation` with `Signature::PreSign` ->
`cow-api::submit-order` bytes=561 -> orderbook returns typed
`TransferSimulationFailed` -> `classify_api_error` tags as
retriable -> `retry on next block`. Full submit path
confirmed; the orderbook rejection is the typed-retry
contract working as designed (the default config's
`owner = 0x70997970...` does not hold the sell token on
Sepolia, so simulation correctly fails).
This validates everything the SDK BLEU-840 / BLEU-841 / BLEU-851 /
-852 / -854 / -855 PR series builds: Host trait surface, chain
helpers, cow helpers, MockHost recipe, strategy/lib split. The
same code paths that pass 145 unit tests + 6 doctests + 5
supervisor integration tests now also work against live Sepolia.
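
The arithmetic behind the price-alert log line can be checked by hand. The sketch below is not the module's code: `Direction`, `is_triggered` and `format_usd` are hypothetical names. It assumes the feed answer is a non-negative integer scaled by 8 decimals (the scale implied by the $1745.54 and $2500 figures above) and that `Below` means strictly less than the threshold.

```rust
/// Hypothetical alert direction; the log line above shows `Below`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Direction {
    Below,
    Above,
}

/// Compare the raw fixed-point answer against the raw threshold.
/// Assumption: comparisons are strict in both directions.
fn is_triggered(answer: i128, threshold: i128, direction: Direction) -> bool {
    match direction {
        Direction::Below => answer < threshold,
        Direction::Above => answer > threshold,
    }
}

/// Render a non-negative fixed-point value as dollars, rounded to cents.
fn format_usd(answer: i128, decimals: u32) -> String {
    let scale = 10i128.pow(decimals);
    // Round half up to the nearest cent before splitting into parts.
    let cents = (answer * 100 + scale / 2) / scale;
    let whole = cents / 100;
    let frac = cents % 100;
    format!("${whole}.{frac:02}")
}
```

With the values from the log, `format_usd(174_553_978_080, 8)` gives `$1745.54` and `format_usd(250_000_000_000, 8)` gives `$2500.00`, so `is_triggered(..., Direction::Below)` is true.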
## What this validates that the M2 runbook does not
M2 only exercises the orderbook submit path indirectly (through
the EthFlow watcher reacting to swap.cow.fi traffic, and only when
app_data is empty - documented limitation). M3 stop-loss submits
proactively on every poll, so the orderbook always sees a real
`OrderCreation` body even if it rejects. The typed-retry SDK
contract (`classify_api_error` mapping `TransferSimulationFailed`
-> `RetryAction::TryNextBlock`) is exercised end-to-end with a
real orderbook response, not a fixture.
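
A toy model of the typed-retry idea follows. Only the names `TransferSimulationFailed` and `TryNextBlock` come from the text above. The enum shapes, the `Other` and `GiveUp` variants, and the functions `classify_api_error_sketch` and `should_keep_watch` are hypothetical and do not reflect the SDK's actual signatures.

```rust
/// Hypothetical typed error returned by the orderbook.
#[derive(Debug, PartialEq)]
enum ApiError {
    TransferSimulationFailed,
    /// Placeholder for every other error type (assumption).
    Other(String),
}

/// Hypothetical retry decision; `GiveUp` is an invented counterpart.
#[derive(Debug, PartialEq)]
enum RetryAction {
    TryNextBlock,
    GiveUp,
}

/// Sketch of the mapping described above: a failed transfer simulation
/// may succeed later (for example once the owner holds the sell token),
/// so it is retried on the next block.
fn classify_api_error_sketch(err: &ApiError) -> RetryAction {
    match err {
        ApiError::TransferSimulationFailed => RetryAction::TryNextBlock,
        // Assumption: unknown errors are treated as non-retriable here.
        ApiError::Other(_) => RetryAction::GiveUp,
    }
}

/// The watch is preserved only when the error is retriable.
fn should_keep_watch(err: &ApiError) -> bool {
    classify_api_error_sketch(err) == RetryAction::TryNextBlock
}
```

Under these assumptions the rejection seen on Sepolia keeps the stop-loss watch alive, which matches the `retry on next block` log line.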
## Stacks on
- `fix(event_loop)` commit immediately preceding this one - the
bug surfaced while wiring up `engine.m3.toml` (block-only
subscriptions bailed the engine pre-fix).
- PR #31 (M2 runbook) - same operator-doc shape, same conventions.
This was referenced Jun 18, 2026
What does this PR do?
Two coupled commits:
`fix(event_loop)` - engines whose modules only declare `[[subscription]] kind = "block"` (or only `kind = "log"`) no longer bail at boot. `select_all` on an empty Vec yielded `None` immediately, tripping the "stream ended → shut down" arm before any event flowed. Replaced empty input with `stream::pending()` so the arm is never selected.
`docs(m3)` - M3 testnet runbook + `engine.m3.toml` + `just run-m3`, sister doc to PR #31 (M2 runbook: "docs(m2): testnet runbook + engine.m2.toml + just run-m2 (validated boot)"). All 3 M3 example modules (price-alert + balance-tracker + stop-loss) boot against Sepolia and exercise their full strategy path on the first block dispatch.
Why
The M3 milestone was unit + integration tested (145 host tests + 6 doctests + 5 supervisor integration tests) but had never been exercised against a real chain. Wiring up `engine.m3.toml` to do that surfaced the event_loop bug - all 3 M3 modules are block-only, which is a config shape M1 had never exercised.
Validated locally on Sepolia
A single block dispatch (~10s wall clock) drove all 3 strategy paths:
The stop-loss rejection is the SDK retry contract working: the default config's owner does not hold the sell token, so the orderbook simulation fails; the SDK's `classify_api_error` maps that to retriable; the watch is preserved for the next block.
Changes
Breaking changes
None. The fix preserves the bail-on-None semantic for non-empty streams; only the empty-Vec edge case changed.
Testing
AI assistance disclosure
AI Assistance: this fix + docs + description was produced by a Claude Code agent (Claude Opus 4.7 1M context). A human (Bruno) reviewed and is accountable for the result. The Sepolia boot validation was run by the agent.
Linear: no dedicated issue - this is the M3 runbook counterpart to the M2 runbook in #31. The fix is incidental to wiring up the runbook.
Stacks on #31 (M2 runbook) → #30 (COW-1068) → #29 (COW-1067) → #28 (COW-1069) → #27 (COW-1066) → #26 (COW-1063 QA cleanup).