M3 epic: SDK + examples + tutorial + QA validation #18

Open
brunota20 wants to merge 68 commits into nullislabs:main from bleu:fix/supervisor-alive-on-init-err

@brunota20

M3 epic — consolidated for review

This PR aggregates the M3 deliverable from bleu/nullis-shepherd. M3 ships the layered SDK and the developer experience around it: a module author writes strategy logic against &impl Host (testable without wasm), while the production wit-bindgen adapter ships as mechanical glue.

Core deliverable

| Crate / module | What it adds |
| --- | --- |
| crates/shepherd-sdk | 4 per-capability host traits (ChainHost, LocalStoreHost, CowApiHost, LoggingHost) + supertrait Host; SDK-side HostError mirroring the wit struct; helpers in chain (eth_call_params, parse_eth_call_result, decode_revert_hex) and cow (PollOutcome, RetryAction, classify_api_error, gpv2_to_order_data, decode_revert, IConditionalOrder sol! interface). |
| crates/shepherd-sdk-test | MockHost with per-trait mocks (MockChain, MockLocalStore, MockCowApi, MockLogging); enables module unit tests that run as native Rust, no wasm toolchain. |
| modules/examples/price-alert | Chainlink oracle reader. Demonstrates chain::request + ABI decode + threshold logic. |
| modules/examples/balance-tracker | ERC-20 balance differ. Demonstrates raw chain::request + per-key local-store persistence. |
| modules/examples/stop-loss | Full M3 surface: oracle read + OrderCreation with Signature::PreSign + cow-api submit + typed retry classification. |
| docs/tutorial-first-module.md | Reads as a guided tour of the real stop-loss module instead of inlined snippets with todo!(). |
| Strategy / lib.rs split | M2 modules (twap-monitor, ethflow-watcher) refactored to consume the Host trait pattern + SDK helpers (ADR-0009). |

⚠️ Base diff includes M1 + M2

Same caveat as the M2 epic: nullislabs:main is the pre-M1 baseline. This diff includes M1 (your PRs #8/#9/#12/#15) + M2 (our PR #17) + M3. Once M2 epic (#17) merges and your M1 PRs land, this rebases clean to M3-only.

To focus your M3 review, the M3-specific paths are:

  • crates/shepherd-sdk/ and crates/shepherd-sdk-test/
  • modules/examples/ (3 example modules)
  • modules/twap-monitor/src/strategy.rs and modules/ethflow-watcher/src/strategy.rs (refactor to consume SDK)
  • docs/05-sdk-design.md (M3 implementation status callout added at top)
  • docs/adr/0009-host-trait-surface.md (new)
  • docs/operations/m{2,3}-testnet-runbook.md + m3-edge-case-validation.md (new)
  • docs/qa-signoff-cow-1063.md (new, captures the QA pass)
  • Small CI hardening: .github/workflows/ci.yml matrix + rustdoc gate
  • Two supervisor bug fixes the M3 testnet wiring surfaced (see below)

Architectural review request

This is the surface you flagged for explicit review: "areas that touch on architecture (specifically the host module architecture) I would like input / review on."

ADR-0009 captures three coupled decisions:

  1. Four per-capability traits + supertrait Host with blanket impl. Lets strategy code be <H: Host> generic; tests inject MockHost, production injects WitBindgenHost.
  2. SDK-side HostError mirroring the wit struct field-for-field, bridged via per-module From impls. Keeps shepherd-sdk-test world-neutral so mocks compile without a wasm toolchain.
  3. Per-module strategy.rs + lib.rs split: strategy is pure logic; lib.rs is the wit-bindgen + WitBindgenHost adapter + Guest impl.
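
As a rough illustration of decision 1, the sketch below shows per-capability traits, a supertrait with a blanket impl, and a strategy function that is generic over it. The method signatures, the single-field `HostError`, `on_block`, and `FakeHost` are simplified assumptions made for this example; they are not the actual shepherd-sdk or shepherd-sdk-test API.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Simplified stand-in for the SDK-side error (assumed shape).
#[derive(Debug, Clone, PartialEq)]
pub struct HostError {
    pub message: String,
}

// One trait per capability (signatures are assumptions).
pub trait ChainHost {
    fn request(&self, chain_id: u64, method: &str, params: &str) -> Result<String, HostError>;
}
pub trait LocalStoreHost {
    fn get(&self, key: &str) -> Result<Option<Vec<u8>>, HostError>;
    fn set(&self, key: &str, value: &[u8]) -> Result<(), HostError>;
}
pub trait CowApiHost {
    fn submit_order(&self, chain_id: u64, body: &[u8]) -> Result<String, HostError>;
}
pub trait LoggingHost {
    fn log(&self, message: &str);
}

/// Supertrait plus blanket impl: any type that implements all four
/// capability traits is a `Host` without further boilerplate.
pub trait Host: ChainHost + LocalStoreHost + CowApiHost + LoggingHost {}
impl<T: ChainHost + LocalStoreHost + CowApiHost + LoggingHost> Host for T {}

/// Hypothetical strategy function: pure logic, generic over the host.
pub fn on_block<H: Host>(host: &H, chain_id: u64) -> Result<(), HostError> {
    let height = host.request(chain_id, "eth_blockNumber", "[]")?;
    host.set("last_seen", height.as_bytes())?;
    host.log(&format!("saw block {height}"));
    Ok(())
}

/// Hand-rolled in-memory test double (not the real MockHost).
#[derive(Default)]
pub struct FakeHost {
    pub store: RefCell<HashMap<String, Vec<u8>>>,
    pub logs: RefCell<Vec<String>>,
}

impl ChainHost for FakeHost {
    fn request(&self, _chain_id: u64, method: &str, _params: &str) -> Result<String, HostError> {
        match method {
            "eth_blockNumber" => Ok("\"0x10\"".to_string()),
            other => Err(HostError { message: format!("unprogrammed method {other}") }),
        }
    }
}
impl LocalStoreHost for FakeHost {
    fn get(&self, key: &str) -> Result<Option<Vec<u8>>, HostError> {
        Ok(self.store.borrow().get(key).cloned())
    }
    fn set(&self, key: &str, value: &[u8]) -> Result<(), HostError> {
        self.store.borrow_mut().insert(key.to_string(), value.to_vec());
        Ok(())
    }
}
impl CowApiHost for FakeHost {
    fn submit_order(&self, _chain_id: u64, _body: &[u8]) -> Result<String, HostError> {
        Err(HostError { message: "submit_order is not used in this sketch".to_string() })
    }
}
impl LoggingHost for FakeHost {
    fn log(&self, message: &str) {
        self.logs.borrow_mut().push(message.to_string());
    }
}
```

Because `on_block` only names the `Host` bound, the same function body could be driven by a wit-bindgen-backed adapter in production and by an in-memory double in a native unit test.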

The "Considered options" section explicitly rejects three alternatives: a single fat Host trait, a proc macro now (deferred to M5), and re-exporting the wit-bindgen HostError. The WitBindgenHost adapter is ~150 lines of mechanical glue per module. That duplication is acknowledged, and it is a candidate for the M5 #[nexum::module] proc macro that doc 05 describes.

docs/05-sdk-design.md carries a "Current implementation status (M3)" callout at the top distinguishing what shipped vs the M5+ vision. Your call on whether to (a) trim the doc to M3 reality, or (b) treat the rest as M4/M5 roadmap.

How M3 was developed in the fork

M3 was developed as 23 stacked PRs (#12-#34) in bleu/nullis-shepherd.

The full per-PR history, in mfw78's preferred format ("What does this PR do? / Why / Changes / Breaking changes / Testing / AI assistance disclosure"), is in the fork.

Bugs surfaced + fixed during testnet wiring

Two M1-tail bugs the M3 testnet runbook exposed (live on Sepolia, both fixed in this epic):

  1. runtime/event_loop.rs: select_all over an empty Vec yielded None immediately, tripping the "stream ended -> shut down" arm before any event flowed. As a result, block-only manifests (all M3 example modules) made the engine bail. Fix: substitute stream::pending() when the Vec is empty. Regression test in supervisor::tests::run_does_not_bail_when_both_stream_kinds_are_empty.
  2. Supervisor::load: init-failed modules stayed alive = true and received every block dispatch, wasting fuel on a no-op. Fix: flip alive = false when init returns Err. Boot log changes from count=N to loaded=N alive=M. Regression test in supervisor::tests::init_failure_marks_module_dead_and_excludes_from_dispatch.

Both fixes are small and clearly scoped; pull them out into separate PRs if you'd prefer.
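
The second fix can be pictured with the std-only sketch below. `Module`, `Supervisor`, the closure-based `load`, and `boot_summary` are hypothetical stand-ins written for this example; only the `alive` flag and the `loaded=N alive=M` log shape come from the description above.

```rust
/// Hypothetical stand-in for a loaded module.
pub struct Module {
    pub name: String,
    pub alive: bool,
    pub blocks_seen: u32,
}

/// Hypothetical stand-in for the supervisor.
pub struct Supervisor {
    pub modules: Vec<Module>,
}

impl Supervisor {
    /// Load every module and run its init; a failing init marks the
    /// module dead instead of leaving it eligible for dispatch.
    pub fn load<F>(names: &[&str], mut init: F) -> Self
    where
        F: FnMut(&str) -> Result<(), String>,
    {
        let modules = names
            .iter()
            .map(|&name| Module {
                name: name.to_string(),
                alive: init(name).is_ok(),
                blocks_seen: 0,
            })
            .collect();
        Supervisor { modules }
    }

    /// Boot log line in the `loaded=N alive=M` shape.
    pub fn boot_summary(&self) -> String {
        let alive = self.modules.iter().filter(|m| m.alive).count();
        format!("loaded={} alive={}", self.modules.len(), alive)
    }

    /// Deliver a block to live modules only; returns how many received it.
    pub fn dispatch_block(&mut self) -> usize {
        let mut delivered = 0;
        for module in self.modules.iter_mut().filter(|m| m.alive) {
            module.blocks_seen += 1;
            delivered += 1;
        }
        delivered
    }
}
```

With one healthy and one failing module, the summary reads `loaded=2 alive=1` and only the healthy module is counted by `dispatch_block`.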

Validation

  • Unit tests: 151 host tests + 6 doctests passing.
  • Supervisor integration tests: 5 module-specific + 2 regression tests cover the wit-bindgen + WitBindgenHost + supervisor dispatch path for every production module.
  • Live testnet (Sepolia): docs/operations/m3-testnet-runbook.md walks 3 modules end-to-end on Sepolia in ~10s wall clock. docs/operations/m3-edge-case-validation.md runs 5 error-path scenarios (bad RPC, bad oracle, capability mismatch, malformed config, cross-restart persistence) all passing.
  • cargo clippy --all-targets --workspace -- -D warnings clean.
  • cargo fmt --all --check clean.
  • cargo doc --workspace --no-deps -D warnings clean (CI gate in #28).
  • 0 em-dashes in crates/, modules/, docs/ (rust-idiomatic skill enforced).
  • WASM builds for all 5 modules under wasm32-wasip2 --release (CI matrix in #27).

Considered but deferred to M5

docs/05-sdk-design.md describes a richer SDK (#[nexum::module] proc macro, full alloy Provider via HostTransport, TypedState postcard helpers, Signer, typed Cow client with raw_request, nexum-sdk + shepherd-sdk crate split). M3 shipped the thinner Host-trait + helpers + MockHost surface; the rest is M5 work. The status callout in doc 05 makes this explicit so the doc is not misread as API reference for code that does not exist yet.

AI assistance disclosure

AI Assistance: this epic + description + all M3 development was produced by a Claude Code agent (Claude Opus 4.7 1M context). The agent designed the SDK layering, wrote ADR-0009, implemented the strategy/lib.rs split across the 5 modules, wrote the 151 unit tests + 6 doctests + 5 integration tests + 2 regression tests, ran live Sepolia validation for both happy and error paths, and authored the runbooks. A human (Bruno) reviewed and is accountable for the result.

Linear milestone: M3 - SDK + Developer Experience. Companion epic: #17 (M2).

brunota20 added 30 commits June 1, 2026 14:19

Adds the dependencies the 0.2 host backends need:

- cowprotocol (1.0.0-alpha) for the cow-api submission path
  (OrderBookApi, OrderCreation, OrderUid, Chain).
- alloy-provider / -rpc-client / -transport-ws / -primitives (1.5)
  for the chain JSON-RPC dispatch. The reqwest feature on
  alloy-provider engages connect_http; the pubsub/ws features back
  eth_subscribe-class methods.
- redb (2) for local-store. Same crate cowprotocol's own watch-tower
  picked, so the dep tree does not bifurcate when both are used in
  the same workspace.
- reqwest (0.12, rustls-tls) — direct, so the import survives any
  future cowprotocol feature rearrangement.
- tracing + tracing-subscriber (env-filter + fmt) — replaces the 0.1
  eprintln! debug log so the engine can drop into a structured log
  pipeline without re-instrumenting every host call.
- thiserror (2) — typed error enums in each backend.
- tempfile + wiremock as dev-deps for the host backend tests.

Adds engine.example.toml documenting the [engine] state_dir + per-
chain RPC URLs the chain backend reads at boot; data/ is now
ignored so a local run does not leave the redb file in tree.

Replaces the 0.2 Unsupported stubs with working backends. Each
capability lives in its own host submodule so the trait impls in
main.rs stay thin (dispatch + project the backend's typed error
onto HostError).

cow_api::submit_order
  - Parses the guest's bytes as JSON cowprotocol::OrderCreation.
  - Dispatches via cowprotocol::OrderBookApi::post_order.
  - Returns the assigned OrderUid as a 0x-prefixed hex string.

cow_api::request
  - REST passthrough. The base URL is whichever URL the pool's
    OrderBookApi client carries — so OrderBookApi::new_with_base_url
    overrides (staging, wiremock) flow through transparently.
  - Method/path validated host-side; orderbook 4xx/5xx bodies are
    surfaced verbatim so the guest can decode {errorType,description}.

chain::request
  - Raw JSON-RPC dispatch over an alloy DynProvider opened from
    engine.toml at boot. WebSocket URLs engage pubsub (eth_subscribe);
    HTTP URLs use the HTTP transport. Params are passed as
    serde_json::RawValue so alloy does not re-encode.
  - request-batch falls back to per-call dispatch (same shape as the
    earlier stub but now backed by real RPC).

local_store
  - redb file under engine_config.engine.state_dir.
  - Single shared table. Per-module namespacing is enforced
    host-side via [len:u8][module_name][raw_key] prefix on every
    key. list_keys strips the prefix before returning to the guest.
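
A minimal sketch of the `[len:u8][module_name][raw_key]` scheme, using only the standard library. The function names are hypothetical, and rejecting names longer than 255 bytes is an assumption that follows from the single length byte; the real backend may handle that case differently.

```rust
/// Build the stored key `[len:u8][module_name][raw_key]`.
/// Returns `None` for an empty namespace or a name that does not fit in one byte.
pub fn namespaced_key(module: &str, raw_key: &[u8]) -> Option<Vec<u8>> {
    if module.is_empty() || module.len() > u8::MAX as usize {
        return None;
    }
    let mut out = Vec::with_capacity(1 + module.len() + raw_key.len());
    out.push(module.len() as u8);
    out.extend_from_slice(module.as_bytes());
    out.extend_from_slice(raw_key);
    Some(out)
}

/// Strip the namespace prefix before handing a key back to the guest.
/// Returns `None` when the stored key belongs to a different module.
pub fn strip_namespace<'a>(module: &str, stored: &'a [u8]) -> Option<&'a [u8]> {
    let prefix = namespaced_key(module, &[])?;
    stored.strip_prefix(prefix.as_slice())
}
```

The length byte is what keeps namespaces from bleeding into each other: without it, module `ab` with key `c` and module `a` with key `bc` would map to the same stored bytes.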

logging
  - Routes through tracing::event! tagged with module=<namespace>.
  - Engine boot installs an EnvFilter-based subscriber; RUST_LOG
    overrides the engine.toml log_level.

identity / remote-store / messaging / http stay at Unsupported per
the 0.2 roadmap (keystore / Swarm / Waku land in 0.3).

Tests (14, all green):
  - cow_orderbook: pool default chains, unknown-chain typing, REST
    GET passthrough, relative-path resolution, unknown-method
    rejection, submit_order round-trip — last three under wiremock
    so the full HTTP path is exercised without hitting api.cow.fi.
  - provider_pool: empty pool surfaces UnknownChain.
  - local_store: roundtrip, namespace isolation, delete, list_keys
    prefix-stripping, empty-namespace rejection.

End-to-end against modules/example: example.wasm loads under the
new wiring, logs init + on_event through the tracing pipeline.

…ed_crate_dependencies, drop redundant map_err)
PR #9 specific:
- main: warn + return when block/log streams end (WebSocket dropped)
- supervisor: simplify dispatch_block by extracting chain_id before move
- supervisor: temp_local_store returns (TempDir, LocalStore) instead of leaking
- README: correct engine.toml chain syntax to [chains.<id>] with rpc_url

Rebased from PR #8:
- local_store_redb: table.range() instead of iter() for O(matching) keys
- provider_pool: dedupe method clone on the success path
- main: hex_encode writes into the pre-allocated buffer
- cow_orderbook: drop blank line nit
- manifest: collapse nested if and use ? operator (clippy)
- alloy_rpc_client / alloy_transport(_ws) imports as _ to satisfy
  unused_crate_dependencies.

Move the manifest.rs monolith into a directory module with four
focused submodules (types, load, capabilities, error). Includes the
Subscription enum and the four PR #9 tests for subscription parsing.

Behaviour unchanged - pure code motion.

main.rs went from 739 lines of mixed bootstrap + 8 Host trait impls +
CLI parser + event loop to ~125 lines of pure orchestration. New
layout:

- bindings.rs: wasmtime::component::bindgen!() moved out so other
  modules can name the generated types.
- cli.rs: Cli struct + manual parser.
- host/state.rs: HostState + WasiView impl.
- host/error.rs: unimplemented / internal_error / hex_encode helpers.
- host/impls/{chain,cow_api,identity,local_store,remote_store,messaging,
  logging,clock,random,http,types}.rs: one Host trait impl per file.
- runtime/limits.rs: DEFAULT_FUEL_PER_EVENT + DEFAULT_MEMORY_LIMIT.
- runtime/event_loop.rs: open_block_streams, open_log_streams, run,
  wait_for_shutdown_signal, TaggedBlockStream, TaggedLogStream.

Adding a new capability is now a single new file under host/impls/
rather than a 60-80 line diff in main.rs.

local_store_redb.rs was 89% tests, cow_orderbook.rs was 60%, and
supervisor.rs was 32% (205 lines absolute). Promote each to a directory
module with the test suite living in a sibling tests.rs so impl-side
diffs stop competing with test churn for attention.

Carries PR #8 (host backends) + PR #9 (supervisor) + cowprotocol patch.
Open upstream: nullislabs#15.

Open upstream: nullislabs#12. Resolved .gitignore by taking the
PR #12 additions (.agents/, .claude/, skills-lock.json) plus PR #15's data/.

# Conflicts:
#	.gitignore

Add modules/twap-monitor/ as a workspace member. Cargo.toml declares
[lib] crate-type = ["cdylib"] for WASM Component output, and pulls
the deps the TWAP module path needs: cowprotocol (default-features
off — only typed primitives and OrderCreation surface needed),
alloy-sol-types (event/return decoding lands in BLEU-826/827), and
wit-bindgen.

src/lib.rs binds against the shepherd:cow/shepherd world (event-
module imports + cow-api). generate_all is required because the
world include pulls nexum:host/types across packages — without it,
wit-bindgen panics on the missing cross-package mapping. init and
on_event are stubbed: init logs once; on_event is a no-op until the
Event::Log / Event::Block dispatch lands in BLEU-826 / BLEU-827.

Verification: cargo build --target wasm32-wasip2 --release -p
twap-monitor emits a 65 KB .wasm. Engine load is gated on
module.toml (BLEU-834).

…-826)

`on_event(Event::Logs)` decodes each log against
`ComposableCoW.ConditionalOrderCreated` via `alloy_sol_types`,
extracts `(owner, params)`, and writes `watch:{owner}:{params_hash}`
to local-store with the abi-encoded `ConditionalOrderParams` as
the value. BLEU-827 reads this back via `list-keys("watch:")` and
the value is exactly the `(handler, salt, staticInput)` tuple the
poll path passes to `getTradeableOrderWithSignature`.

Idempotency: `local_store::set` overwrites in place, so re-org
replay or overlapping subscription windows produce no observable
side effect.

Resilience: `decode_conditional_order_created` returns `None`
when topic0 does not match the event signature or the payload
fails ABI decoding. Adjacent events on the same subscription
(MerkleRootSet, SwapGuardSet) are silently skipped instead of
short-circuiting the batch. The fn is on plain slices so the
host-free unit tests cover well-formed / wrong-topic / empty-
topics without wit-bindgen scaffolding.

Block, Tick, and Message variants of `Event` are left unhandled
in this PR — `Event::Block` dispatch lands in BLEU-827 (poll
path); the other two are not used by this module.

Adds `alloy-primitives` as a direct dep so the topic/data plumbing
does not rely on alloy types leaking through `cowprotocol`'s
re-exports.

`cargo build --target wasm32-wasip2 --release -p twap-monitor`
emits a 96 KB .wasm (up from the 65 KB skeleton because of the
alloy + cowprotocol composable types now linked in).

`on_event(Event::Block)` walks every persisted watch, skips the
ones gated by a future `next_block:` / `next_epoch:` entry, and
dispatches the ready ones via `chain::request("eth_call",
[{to: COMPOSABLE_COW, data}, "latest"])` to
`ComposableCoW.getTradeableOrderWithSignature(owner, params,
"", [])`.

Returns:
- Successful return data → `<(GPv2OrderData, Bytes)>::abi_decode_params`
  → `PollOutcome::Ready { order, signature }`.
- Revert payload → `decode_revert` matches the four-byte selector
  against the five `IConditionalOrder` errors:
    OrderNotValid     → DontTryAgain
    PollNever         → DontTryAgain
    PollTryNextBlock  → TryNextBlock
    PollTryAtBlock(n) → TryOnBlock(n)
    PollTryAtEpoch(t) → TryAtEpoch(t)
- Anything else falls back to TryNextBlock so a flaky RPC or
  unmodelled require-revert is retried instead of dropped.
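
The mapping above can be written as a single match. In this sketch `Revert` is a hypothetical, already-decoded representation of the five errors (selector matching and ABI decoding are left out), `u64` stands in for the on-chain integer type, and `PollOutcome` omits the `Ready` variant for brevity.

```rust
/// Hypothetical decoded form of the five IConditionalOrder errors.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Revert {
    OrderNotValid,
    PollNever,
    PollTryNextBlock,
    PollTryAtBlock(u64),
    PollTryAtEpoch(u64),
}

/// Reduced outcome enum (the `Ready` variant is omitted here).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PollOutcome {
    DontTryAgain,
    TryNextBlock,
    TryOnBlock(u64),
    TryAtEpoch(u64),
}

/// Map a decoded revert to a poll outcome. `None` means the payload
/// matched none of the known selectors, which falls back to a retry.
pub fn outcome_for(revert: Option<Revert>) -> PollOutcome {
    match revert {
        Some(Revert::OrderNotValid) | Some(Revert::PollNever) => PollOutcome::DontTryAgain,
        Some(Revert::PollTryNextBlock) => PollOutcome::TryNextBlock,
        Some(Revert::PollTryAtBlock(n)) => PollOutcome::TryOnBlock(n),
        Some(Revert::PollTryAtEpoch(t)) => PollOutcome::TryAtEpoch(t),
        None => PollOutcome::TryNextBlock,
    }
}
```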

Decoder ABI: a local `abi::Params` struct mirrors the wire format
of `cowprotocol::ConditionalOrderParams` because sol! cannot cross
crate boundaries; the resulting call selector is byte-equal to the
real contract. The successful return path decodes into the
canonical `cowprotocol::GPv2OrderData` directly, so the 12-field
struct is not duplicated. `Ready` boxes the order to keep
`PollOutcome` cache-friendly (clippy::large_enum_variant).

Storage conventions (shared with BLEU-830, which writes these):
- `next_block:{owner}:{params_hash}` -> u64 LE — block number gate
- `next_epoch:{owner}:{params_hash}` -> u64 LE — Unix-seconds gate
Either / both / neither may be set; the watch polls when both pass.
`block.timestamp` is milliseconds per WIT, so we divide by 1000 to
compare against the `TryAtEpoch` (seconds) convention.
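
A small sketch of the gate check these conventions imply. `is_ready` and `read_u64_le` are hypothetical names, and treating a malformed gate value as "no gate" is an assumption of this example rather than documented module behaviour.

```rust
/// Decode an 8-byte little-endian u64; `None` if the length is wrong.
fn read_u64_le(raw: &[u8]) -> Option<u64> {
    if raw.len() != 8 {
        return None;
    }
    let mut buf = [0u8; 8];
    buf.copy_from_slice(raw);
    Some(u64::from_le_bytes(buf))
}

/// A watch polls only when both gates pass. A missing gate always passes.
/// `block_timestamp_ms` is in milliseconds, so it is divided by 1000
/// before comparing against the Unix-seconds epoch gate.
pub fn is_ready(
    next_block: Option<&[u8]>,
    next_epoch: Option<&[u8]>,
    block_number: u64,
    block_timestamp_ms: u64,
) -> bool {
    let now_secs = block_timestamp_ms / 1000;
    let block_ok = next_block
        .and_then(read_u64_le)
        .map_or(true, |gate| block_number >= gate);
    let epoch_ok = next_epoch
        .and_then(read_u64_le)
        .map_or(true, |gate| now_secs >= gate);
    block_ok && epoch_ok
}
```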

Host follow-up: the chain backend currently swallows alloy's
`RpcError::ErrorResp.data` (it becomes `host-error.message`,
unstructured). `poll_one` is wired to consume structured revert
hex via `host-error.data` once that lands — the `decode_revert_hex`
test locks the path. Until then, every revert defaults to
TryNextBlock, which is the safe choice.

Tests: 14 new (return round-trip, all five revert variants, hex
plumbing, eth_call JSON shape, watch-key round-trip, U256
saturation), keeping the 3 BLEU-826 regressions. `.wasm` grows
from 96 KB to 215 KB (serde_json + IConditionalOrder ABI + the
GPv2OrderData decode path linked in).

Linear: BLEU-827. Ref ADR-0006.

…828)

On `PollOutcome::Ready { order, signature }`, convert the
`GPv2OrderData` to the typed `OrderData` (maps the on-chain
bytes32 markers `kind` / `sellTokenBalance` / `buyTokenBalance`
via cowprotocol's `from_contract_bytes`), wrap the signature as
`Signature::Eip1271` (ComposableCoW returns the orderbook wire
form: raw verifier bytes, the orderbook re-prepends `from`
before settlement), and feed everything through
`OrderCreation::from_signed_order_data`. The body is then
serde-encoded and pushed to `cow_api::submit_order(chain_id,
body)`.

On success, persist `submitted:{uid}` in local-store as an
empty marker — presence of the key is the receipt; BLEU-830
may later attach metadata but the bare flag is enough to
suppress double submits.

Scope notes (deliberately deferred):

- `app_data` is hard-coded to `EMPTY_APP_DATA_JSON`.
  Conditional orders that pin a real document on IPFS get
  rejected by `from_signed_order_data` (digest mismatch) and
  skipped with a Warn log instead of submitting a corrupt body.
  Resolving the document is its own concern.
- Submission errors are logged. BLEU-829 wires
  `OrderPostError::retry_hint` into this site so the backoff /
  drop decision is data-driven.
- `from` is set to the watch owner (the address that emitted
  `ConditionalOrderCreated`). The orderbook prepends this to
  the EIP-1271 blob during settlement.

Tests: 7 new (gpv2_to_order_data marker mapping incl. zero-
receiver normalisation, unknown kind / balance marker
rejection; build_order_creation happy path with serde round-
trip; rejection of non-empty app_data and `from = ZERO`).
Total 24 host tests. `.wasm` 273 KB (was 215 KB; serde for
OrderCreation, the OrderData/Signature/SigningScheme modules,
and serde_with's runtime ride along).

Linear: BLEU-828. Ref ADR-0006 (modules build orders
themselves).

brunota20 added 28 commits June 16, 2026 19:35

New library crate at crates/shepherd-sdk/ added as a workspace
member. The crate is a regular library (no cdylib), built against
both the host target and wasm32-wasip2 so helpers in BLEU-840 can
stay host-free unit-testable.

Deps follow the same default-features-off / 1.x-pinned pattern as
the modules:
- cowprotocol = "1.0.0-alpha.3", default-features = false
- alloy-primitives 1.6, alloy-sol-types 1.5
- serde 1, serde_json 1 (no_std-compat, alloc-only)

src/lib.rs lays out three placeholder modules (cow, chain, store)
that BLEU-840 will populate with the helpers currently duplicated
between twap-monitor and ethflow-watcher.

src/prelude.rs is the BLEU-835 deliverable proper: a single
`use shepherd_sdk::prelude::*` covers alloy primitives (Address,
B256, Bytes, U256, keccak256) and cowprotocol's order / signing /
orderbook-error surface (OrderCreation, OrderData, OrderUid,
OrderKind, Signature, Chain, GPv2OrderData,
EMPTY_APP_DATA_{HASH,JSON}, ApiError, OrderPostErrorKind).

The wit-bindgen-generated host types (Guest, HostError, Event,
Block, Log, …) deliberately stay *out* of the prelude — those
live in each module's own crate via the per-module
`wit_bindgen::generate!` invocation. Helpers added in BLEU-840
take primitive arguments (`&[u8]`, `Option<&str>`) so the SDK
remains world-neutral. Trade-off documented inline in lib.rs.

Builds + tests + clippy clean on host and wasm32-wasip2 (1 host
test locks the prelude surface).

Lifts the helpers currently duplicated between twap-monitor and
ethflow-watcher into shepherd-sdk so BLEU-843 can collapse the
duplication, and so future strategy modules consume them straight
from the SDK.

Layout:

  crates/shepherd-sdk/src/
  ├── cow/
  │   ├── order.rs        gpv2_to_order_data
  │   ├── composable.rs   sol! IConditionalOrder + PollOutcome
  │   │                   + decode_revert
  │   └── error.rs        RetryAction + classify_api_error
  │                       + try_decode_api_error
  └── chain/
      └── eth_call.rs     eth_call_params
                          + parse_eth_call_result
                          + decode_revert_hex

Every helper takes primitive arguments (`&[u8]`, `&str`,
`Option<&str>`, slices) so the SDK stays world-neutral — modules
unpack their wit-bindgen `HostError` / `Log` into primitives on
the way in. That keeps the SDK testable without a wasm toolchain
and re-usable across worlds (M3 examples, future strategies).

Notable shape:

- `cow::composable::PollOutcome::Ready` boxes `GPv2OrderData`
  (~300 B) so the enum stays cache-friendly when the lifecycle
  handler in BLEU-830 routes outcomes around.
- `cow::error::RetryAction::Backoff { seconds }` is parked
  (`#[allow(dead_code)]`) for the future server-supplied hint;
  cowprotocol's `retry_hint()` is bool-only today.
- `cow::error::classify_api_error(None) -> TryNextBlock` is the
  safe default — a flaky orderbook should not be treated as a
  permanent rejection.

Tests: 26 host tests covering every helper (6 gpv2 marker
mapping, 7 revert decode, 6 retry classification, 5 eth-call
plumbing, 1 SolError selector). Clippy clean on host and
wasm32-wasip2.

The modules in `modules/twap-monitor` and `modules/ethflow-watcher`
still carry their own copies; BLEU-843 deletes them.

Drops the duplicated helpers in `modules/twap-monitor` and
`modules/ethflow-watcher` in favour of `shepherd_sdk::cow` /
`shepherd_sdk::chain`:

- `gpv2_to_order_data` (was duplicated verbatim)
- `RetryAction` + `classify_api_error` + `try_decode_api_error`
- `PollOutcome` enum (was duplicated verbatim — Ready boxed,
  TryAtEpoch / TryOnBlock / TryNextBlock / DontTryAgain)
- `IConditionalOrder` sol! errors + `decode_revert`
- `eth_call_params` + `parse_eth_call_result` + `decode_revert_hex`

Kept module-side:
- `abi::Params` + `getTradeableOrderWithSignatureCall` in
  twap-monitor (TWAP-specific selector source — EthFlow does
  not poll).
- `decode_conditional_order_created` / `decode_order_placement`
  in their respective modules (each is bound to a specific
  event signature on a specific contract).
- `watch:` / `next_block:` / `next_epoch:` key conventions in
  twap-monitor and `submitted:` / `backoff:` / `dropped:` in
  ethflow-watcher (per-module persistence policies, not shared).
- `to_signature` (OnchainSignature → Signature) in ethflow-
  watcher (single consumer; will move to SDK if a second emerges).
- `BuildError` / `WatchUpdate` / lifecycle plumbing in their
  modules (strategy-specific).

LOC: -387 in twap-monitor (1058 → 671), -101 in ethflow-watcher
(537 → 436). Tests: 13 host tests stay in twap-monitor (was 34
— the 21 that moved live in shepherd-sdk now), 7 stay in
ethflow-watcher (was 10).

.wasm size delta:
- twap-monitor: 300 K → 305 K (+5 K — SDK re-exports + slight
  link-table growth; alloy + cowprotocol deduped).
- ethflow-watcher: 272 K → 275 K (+3 K).

Same wire behaviour — the SDK migration is a pure refactor; no
new dispatch paths, no new key conventions.

…841)

Two-part deliverable:

1. New `shepherd_sdk::host` module exposing the trait seam between
   strategy logic and the wit-bindgen shims a module generates per-
   cdylib:

   - `ChainHost`     — request(chain_id, method, params)
   - `LocalStoreHost`— get / set / delete / list_keys
   - `CowApiHost`    — submit_order(chain_id, body)
   - `LoggingHost`   — log(level, message)
   - `Host`          — supertrait bundling all four (blanket impl
                       so callers only need the supertrait bound)

   The traits ride on a host-neutral `HostError` (same field shape
   as wit-bindgen's), with `HostErrorKind` and `LogLevel` mirroring
   the WIT enums verbatim. Modules bridge their own wit-bindgen
   `HostError` to the SDK's with a one-liner `From` impl on each
   side; the M3 tutorial (BLEU-848) documents the adapter pattern.

2. New `shepherd-sdk-test` crate (dev-only, host-only) supplying
   in-memory implementations for every trait + assertion helpers:

   - `MockHost { chain, store, cow_api, logging }`
   - `MockChain`: programmable `(method, params)` -> result map;
     records every call with `chain_id`, `method`, `params`.
   - `MockLocalStore`: HashMap-backed; `list_keys` does a prefix
     scan (sorted output for stable assertions).
   - `MockCowApi`: single programmable response shared across
     calls; records each submission's `chain_id` + body bytes;
     `last_body_as_json` helper for inline assertions.
   - `MockLogging`: buffers all lines with their level; `contains`
     / `count_at` helpers.

   Unconfigured calls return `HostErrorKind::Unsupported` so an
   unprogrammed test fails fast instead of silently passing on a
   default value.
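
To make the "programmable map, recorded calls, `Unsupported` by default" behaviour concrete, here is a std-only sketch of a chain mock. It is written from the description above, not copied from shepherd-sdk-test: the `program` and `calls` method names, the string-typed params, and the reduced `HostErrorKind` are assumptions.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Reduced error kind (the real enum mirrors the WIT definition).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum HostErrorKind {
    Unsupported,
    Internal,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct HostError {
    pub kind: HostErrorKind,
    pub message: String,
}

/// One recorded `request` invocation.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RecordedCall {
    pub chain_id: u64,
    pub method: String,
    pub params: String,
}

/// Sketch of a programmable chain mock.
#[derive(Default)]
pub struct MockChain {
    responses: HashMap<(String, String), Result<String, HostError>>,
    calls: RefCell<Vec<RecordedCall>>,
}

impl MockChain {
    /// Program the response for one `(method, params)` pair.
    pub fn program(&mut self, method: &str, params: &str, result: Result<String, HostError>) {
        self.responses
            .insert((method.to_string(), params.to_string()), result);
    }

    /// Record the call, then return the programmed response, or
    /// `Unsupported` so an unprogrammed test fails fast.
    pub fn request(&self, chain_id: u64, method: &str, params: &str) -> Result<String, HostError> {
        self.calls.borrow_mut().push(RecordedCall {
            chain_id,
            method: method.to_string(),
            params: params.to_string(),
        });
        match self.responses.get(&(method.to_string(), params.to_string())) {
            Some(result) => result.clone(),
            None => Err(HostError {
                kind: HostErrorKind::Unsupported,
                message: format!("no response programmed for {method}"),
            }),
        }
    }

    /// Snapshot of every call seen so far, in order.
    pub fn calls(&self) -> Vec<RecordedCall> {
        self.calls.borrow().clone()
    }
}
```

Recording happens before the lookup, so a test can assert on the exact call that hit the `Unsupported` default.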

Tests: 8 host tests on `shepherd-sdk-test` + 1 module-level doctest
locking the recommended usage pattern. Workspace + wasm32-wasip2
check still clean.

Adoption is opt-in: existing M2 modules keep their pure-function
tests for now. BLEU-848 (tutorial) will demonstrate the new
strategy-takes-Host pattern with `MockHost` end-to-end.

- Tightened the crate-root rustdoc on `shepherd-sdk/src/lib.rs`:
  switched the inline `[Type](path)` link form to top-of-file
  reference-style link definitions so the rustdoc target is
  unambiguous and the source stays readable.
- Removed the placeholder `pub mod store {}` (out-of-scope until a
  second strategy module needs the same key conventions).
- New `crates/shepherd-sdk/README.md` covering: quick tour table,
  host-free testing recipe with `shepherd-sdk-test`, the
  no-wit-bindgen-in-SDK rationale, layout map, and how to generate
  docs with the strict flags.
- New `docs/sdk.md` repo-level landing page that lists the four
  host capabilities the SDK mirrors and links into the rustdoc per
  module.

Gate: `cargo doc -p shepherd-sdk -p shepherd-sdk-test --no-deps`
runs clean under `RUSTDOCFLAGS="-D warnings -D missing-docs"`.
Every public item carries a doc comment; intra-doc links resolve.
Tests + clippy unchanged.

New `modules/examples/price-alert/`, the first canonical SDK example.
A Shepherd module that polls a Chainlink AggregatorV3 price oracle
on every block (throttled by `every_n_blocks`) and emits a Warn-
level log when the answer crosses a config-supplied threshold.

Demonstrates the three load-bearing patterns of a Shepherd module:

  - `chain::request` + ABI decode via `alloy_sol_types` (sol!
    interface AggregatorV3 declares `latestRoundData`, decode via
    `abi_decode_returns`).
  - shepherd-sdk helpers (`chain::eth_call_params` +
    `chain::parse_eth_call_result`; the SDK's prelude is *not*
    used here because the module needs none of the CoW types).
  - `[config]` driven behaviour parsed once in `init` and stored
    in `OnceLock<Settings>` for read-only access on every event.

Module-internal:

  - `Settings` (renamed from `Config` to avoid clashing with the
    wit-bindgen-generated `Config` type alias for the `init` arg).
  - `Direction { Above, Below }` deciding which side of the
    threshold fires.
  - `scale_threshold(decimal, decimals)` hand-rolled because alloy
    does not ship a `Decimal::parse_units`-style helper; handles
    optional sign, missing decimal point, short / long fractional,
    rejects non-digit garbage. Locked by 5 unit tests.
  - `classify(answer, threshold, direction)` pure 1-liner with 2
    edge tests (at-or-above vs. at-or-below behaviour at the
    boundary).
  - `parse_config(entries)` returns `Result<Settings, String>` with
    human-readable errors; 4 unit tests cover happy path, defaults,
    unknown direction, missing key.
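
The two pure helpers can be sketched as follows. This is an independent illustration, not the module's code: `i128` stands in for the on-chain integer type, fractional digits beyond `decimals` are truncated, and the boundary is treated as inclusive in both directions. All three choices are assumptions of this example.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Direction {
    Above,
    Below,
}

/// Scale a decimal string such as "2500.00" to an integer with
/// `decimals` fractional digits. Accepts an optional sign and a
/// missing decimal point; rejects non-digit characters.
pub fn scale_threshold(decimal: &str, decimals: u32) -> Result<i128, String> {
    let trimmed = decimal.trim();
    let (negative, unsigned) = match trimmed.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, trimmed.strip_prefix('+').unwrap_or(trimmed)),
    };
    let (whole, frac) = unsigned.split_once('.').unwrap_or((unsigned, ""));
    if whole.is_empty() && frac.is_empty() {
        return Err(format!("no digits in {decimal:?}"));
    }
    if !whole.bytes().chain(frac.bytes()).all(|b| b.is_ascii_digit()) {
        return Err(format!("non-digit character in {decimal:?}"));
    }
    // Truncate a long fraction, zero-pad a short one.
    let width = decimals as usize;
    let mut digits = String::from(whole);
    digits.extend(frac.chars().take(width));
    for _ in frac.len().min(width)..width {
        digits.push('0');
    }
    let magnitude: i128 = if digits.is_empty() {
        0
    } else {
        digits
            .parse()
            .map_err(|_| format!("{decimal:?} is out of range"))?
    };
    Ok(if negative { -magnitude } else { magnitude })
}

/// True when the oracle answer is on the firing side of the threshold.
pub fn classify(answer: i128, threshold: i128, direction: Direction) -> bool {
    match direction {
        Direction::Above => answer >= threshold,
        Direction::Below => answer <= threshold,
    }
}
```

For a feed with 8 decimals, `scale_threshold("2500.00", 8)` yields `250_000_000_000`, which can be compared directly against the raw oracle answer.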

module.toml:

  - `capabilities = ["logging", "chain"]` (no local-store; no
    cow-api).
  - `[[subscription]]` block on Sepolia (chain_id 11155111).
  - `[config]` ships defaults pointing at the canonical Sepolia
    ETH/USD feed with a 2500.00 USD threshold + "below" direction.

11 host tests; clippy clean on host + wasm32-wasip2. .wasm is
206 KB optimised — comparable to the M2 modules (twap 305 KB,
ethflow 275 KB) and dominated by alloy-sol-types + wit-bindgen
runtime.

New `modules/examples/balance-tracker/`, the second canonical SDK
example. Subscribes to blocks, reads `eth_getBalance(addr)` for a
configured address list, persists each reading under
`balance:{addr}` in local-store, and emits a Warn-level log when
the delta against the prior reading exceeds `change_threshold`
wei.

Demonstrates:

- `chain::request` with a non-`eth_call` method (raw JSON-RPC
  with hand-built params), to balance the price-alert example's
  sol! / `eth_call` flow.
- `local-store` `get` / `set` per-key persistence with U256 LE
  serialisation as the wire format.
- The "diff against last seen" pattern reusable across indexer
  modules (transfer monitors, allowance trackers, …).

Module-internal:

- `Settings { addresses: Vec<Address>, change_threshold: U256 }`
  parsed from `[config]` once at `init` and stored in
  `OnceLock<Settings>`.
- `parse_balance_hex(json)` — strips JSON quotes and the `0x`
  prefix, decodes the remaining hex into a U256. Handles `"0x"`
  (zero balance), rejects unquoted / non-hex bodies.
- `parse_addresses(raw)` — comma-separated list with whitespace
  tolerance and empty-segment skipping; rejects empty lists.
- `abs_diff` + `parse_u256_le` + `u256_to_le_bytes` — pure utilities
  with edge-case coverage.
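
A std-only sketch of the "diff against last seen" helpers. To stay dependency-free it uses `u128` where the module uses `U256`, so the function names (`parse_u128_le`, `u128_to_le_bytes`, `exceeds_threshold`) and the 16-byte width are stand-ins chosen for this example.

```rust
/// Parse a JSON-quoted hex quantity such as "\"0x1bc16d\"".
/// `"0x"` (empty digits) is read as zero; unquoted or non-hex input is rejected.
pub fn parse_balance_hex(json: &str) -> Result<u128, String> {
    let inner = json
        .trim()
        .strip_prefix('"')
        .and_then(|s| s.strip_suffix('"'))
        .ok_or("expected a JSON string")?;
    let hex = inner.strip_prefix("0x").ok_or("missing 0x prefix")?;
    if hex.is_empty() {
        return Ok(0);
    }
    if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err(format!("non-hex character in {inner:?}"));
    }
    u128::from_str_radix(hex, 16).map_err(|e| format!("bad hex quantity: {e}"))
}

/// Absolute difference between two unsigned readings.
pub fn abs_diff(a: u128, b: u128) -> u128 {
    if a >= b { a - b } else { b - a }
}

/// Little-endian wire format for the persisted reading.
pub fn u128_to_le_bytes(value: u128) -> [u8; 16] {
    value.to_le_bytes()
}

/// Inverse of `u128_to_le_bytes`; `None` if the stored value has the wrong length.
pub fn parse_u128_le(raw: &[u8]) -> Option<u128> {
    if raw.len() != 16 {
        return None;
    }
    let mut buf = [0u8; 16];
    buf.copy_from_slice(raw);
    Some(u128::from_le_bytes(buf))
}

/// True when the change against the prior reading exceeds the threshold.
pub fn exceeds_threshold(previous: u128, current: u128, threshold: u128) -> bool {
    abs_diff(previous, current) > threshold
}
```

On each block the flow would be: read the stored bytes under `balance:{addr}`, decode them, compare against the fresh `eth_getBalance` reading with `exceeds_threshold`, then write the new reading back.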

module.toml:

- `capabilities = ["logging", "chain", "local-store"]` (the
  superset that distinguishes this example from price-alert,
  which only needs chain + logging).
- `[[subscription]]` block on Sepolia (chain_id 11155111).
- `[config]` ships defaults pointing at two anvil-style EOAs and
  a 0.1 ETH change threshold.

13 host tests; clippy clean on host + wasm32-wasip2. `.wasm` is
99 KB optimised — about half of price-alert's 206 KB because it
does not pull `alloy-sol-types` into the link tree (no ABI work;
all decoding is hex/U256).

End-to-end cold-start guide that takes an external developer from
"I cloned the repo" to "I see my module's first event in the
engine log" in under four hours.

Scenario: stop-loss order — combines every load-bearing pattern in
the SDK (block subscription, chain::request + ABI decode, local-
store dedup, cow_api::submit_order, host-free tests via MockHost).
The tutorial walks through each pattern via the four worked
examples already in the repo (price-alert, balance-tracker,
twap-monitor, shepherd-sdk-test) and stitches them into the stop-
loss module.

Sections + rough budgets:

  0. Prerequisites (15m)        — toolchain check; verify the
                                  example module runs.
  1. Scaffold workspace (15m)   — Cargo.toml template + workspace
                                  members entry.
  2. Manifest (10m)             — module.toml with the four
                                  capabilities + Sepolia
                                  [[subscription]] + [config]
                                  schema.
  3. Strategy (60m)
     3a. Pure logic             — on_block<H: Host>(...) using
                                  shepherd-sdk's chain helpers
                                  and AggregatorV3 sol! interface.
     3b. Guest adapter          — wit_bindgen::generate! + the
                                  WitBindgenHost struct that
                                  bridges to shepherd_sdk::host
                                  (one-time boilerplate per
                                  module).
     3c. Unit tests             — two MockHost tests: idle-above-
                                  trigger + triggers-and-dedups.
  4. Build (5m)                 — cargo build --target
                                  wasm32-wasip2 --release +
                                  size sanity.
  5. Run (10m)                  — engine.toml WS RPC for Sepolia
                                  + cargo run -p nexum-engine.
  6. Where to go (10m)          — production hardening + real
                                  order assembly (twap-monitor
                                  cross-ref) + multi-chain.

Pure docs change — no module added (the stop-loss in §3 is the
reader's exercise; build_order_body deliberately ends in a `todo!`
with a cross-reference to twap-monitor's canonical assembly path).
Worked artefacts referenced in the tutorial are the existing
examples landed in #18 / #19 plus shepherd-sdk + shepherd-sdk-test.

Cross-links: docs/sdk.md (BLEU-844), docs/deployment.md
(BLEU-836), ADR-0001 / 0006 / 0007.

Acceptance per the issue: the tutorial is reviewer-validatable.
A time-budget callout at the end asks reviewers to tag
`docs/tutorial` if a section drags, so we can tighten it on
feedback.
QA pass against the team's rust-idiomatic skill ahead of M4. All
mandatory rules now hold; the cleanup is mostly mechanical with a
handful of small typing improvements where the rule asked for one
thiserror enum per error type.

## Em-dash purge (rule: "no em-dashes anywhere")

Replaced every U+2014 with " - " across .rs / .toml / .md:
  - 51 source-file occurrences
  - 5 Cargo.toml comments
  - 366 occurrences across docs/*.md (most in ADRs and the
    deployment / tutorial / sdk landings)

Grep gate: `grep -rn '—' crates/ modules/ docs/` returns 0.

## `#![cfg_attr(not(test), warn(unused_crate_dependencies))]`

Added to every crate root that previously lacked it:
  - crates/shepherd-sdk/src/lib.rs
  - crates/shepherd-sdk-test/src/lib.rs
  - modules/{example,twap-monitor,ethflow-watcher}/src/lib.rs
  - modules/examples/{price-alert,balance-tracker}/src/lib.rs

`crates/nexum-engine/src/main.rs` already had it.

## Unused-dep cleanup driven by the lint

  - shepherd-sdk dropped `serde` (only `serde_json` is actually
    imported; cowprotocol re-exports carry their own serde derive
    transitively).
  - balance-tracker dropped its direct `alloy-primitives` dep —
    now goes through `shepherd_sdk::prelude::{Address, U256,
    address}`. Tests adapt.

## thiserror conversions (rule: "one flat thiserror enum per
## module/backend; no String-wrapping of upstream errors")

  - `shepherd_sdk::host::HostError` gains `#[derive(thiserror::
    Error)]` + `#[error("{domain}: {message} (code={code},
    kind={kind:?})")]`. Was a plain struct without Display.
    Added `thiserror = "2"` as a dep.
  - `modules/twap-monitor::BuildError`: hand-rolled Display impl
    replaced with `#[derive(thiserror::Error)]` + per-variant
    `#[error(...)]` + `#[from] cowprotocol::Error`. The map_err
    at the call site collapses to `?`.
  - `modules/ethflow-watcher::BuildError`: same conversion (4
    variants, one of them `#[from]`).

Both modules add `thiserror = "2"` as a direct dep.

## Verification

  - `cargo clippy --all-targets --workspace -- -D warnings` clean.
  - `cargo test --workspace`: 121 tests pass.
    - nexum-engine 41, shepherd-sdk 27, shepherd-sdk-test 8 + 1
      doctest, twap-monitor 13, ethflow-watcher 7, price-alert
      11, balance-tracker 13.

## Architecture notes (no code changes)

  - `#[non_exhaustive]` is *not* applied to public enums
    (`HostErrorKind`, `LogLevel`, `RetryAction`, `PollOutcome`).
    The first two mirror the WIT 0.2 enums (locked at the WIT
    contract layer); the last two are intentional 3- and 5-arm
    contracts with no expected growth. If a future kind shows
    up, the rule applies then.
  - `parse_config` / `parse_settings` in the example modules
    return `Result<T, String>` rather than a typed enum. The
    rule's "no string-wrapping" applies to error variants that
    *wrap* an upstream `std::error::Error`; one-shot config
    parsers with bespoke per-field messages are pragmatic. The
    error surface is internal to the module's `init` and not
    part of the orderbook retry contract.
Validates the host-trait pattern from the M3 tutorial end-to-end on
a real module. The price-alert example now matches the recipe the
tutorial recommends:

  modules/examples/price-alert/
  ├── Cargo.toml          adds shepherd-sdk-test as dev-dep
  └── src/
      ├── lib.rs          wit_bindgen::generate! + WitBindgenHost
      │                   adapter + From conversions + Guest impl
      └── strategy.rs     pure logic against `&impl Host`
                          + parse_config + scale_threshold + tests

Strategy logic now takes `&impl shepherd_sdk::host::Host` and never
calls `nexum::host::*` free functions directly. The wit-bindgen
boilerplate (WitBindgenHost struct, ChainHost / LocalStoreHost /
CowApiHost / LoggingHost impls, convert_err / sdk_err_into_wit /
convert_level helpers) lives in lib.rs. It is mechanical and identical
across modules; a future declarative macro in shepherd-sdk will
elide it.
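The shape of that split can be sketched without wasm or the SDK. Everything below is a hypothetical reduction: the trait signatures, the two-trait `Host`, and this `MockHost` are assumptions for illustration (the real surface has four traits and the mock lives in shepherd-sdk-test), and the oracle reply is modelled as a bare hex quantity rather than an ABI-encoded `latestRoundData` tuple.

```rust
use std::cell::RefCell;

#[derive(Debug, Clone, PartialEq)]
struct HostError {
    message: String,
}

trait ChainHost {
    fn request(&self, method: &str, params: &str) -> Result<String, HostError>;
}

trait LoggingHost {
    fn warn(&self, message: &str);
}

/// Supertrait with a blanket impl: any type with both capabilities is a Host.
trait Host: ChainHost + LoggingHost {}
impl<T: ChainHost + LoggingHost> Host for T {}

/// Pure strategy logic: returns true when the oracle answer is below
/// `threshold`. Errors are logged and swallowed, never propagated.
fn on_block(host: &impl Host, threshold: u128) -> bool {
    let raw = match host.request("eth_call", "[]") {
        Ok(raw) => raw,
        Err(err) => {
            host.warn(&format!("oracle read failed: {}", err.message));
            return false;
        }
    };
    let hex = raw.strip_prefix("0x").unwrap_or(&raw);
    match u128::from_str_radix(hex, 16) {
        Ok(answer) => answer < threshold,
        Err(_) => {
            host.warn("undecodable oracle response");
            false
        }
    }
}

/// Native test double: canned chain response plus captured warnings.
struct MockHost {
    response: Result<String, HostError>,
    warnings: RefCell<Vec<String>>,
}

impl ChainHost for MockHost {
    fn request(&self, _method: &str, _params: &str) -> Result<String, HostError> {
        self.response.clone()
    }
}

impl LoggingHost for MockHost {
    fn warn(&self, message: &str) {
        self.warnings.borrow_mut().push(message.to_owned());
    }
}
```

Because `on_block` only sees `&impl Host`, the same function body runs against the mock in native tests and against the wit-bindgen adapter in the component.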

parse_config now returns `Result<Settings, shepherd_sdk::host::
HostError>` instead of `Result<T, String>`. Carrying the SDK error
through the strategy / adapter / Guest seam means the same domain /
kind / code / message / data fields surface to the operator
verbatim.

Tests: 16 (was 11) - all strategy tests now run against
shepherd_sdk_test::MockHost rather than calling wit-bindgen
directly. The new ones lock the on_block behaviour end-to-end:

  - idle when price is on the safe side of the trigger
  - triggers below threshold (Direction::Below)
  - triggers above threshold (Direction::Above)
  - warns + continues on RPC timeout (no propagation into the
    supervisor)
  - warns on undecodable oracle response
  - respects `every_n_blocks` throttle

cargo clippy --all-targets --workspace -- -D warnings clean. .wasm
210 KB (was 206 KB; +4 KB for the adapter boilerplate, which
deduplicates against shepherd-sdk so future modules add no extra
cost).
Closes the loop opened by BLEU-848 (tutorial). The tutorial used to
walk through a stop-loss scenario but left `build_order_body` as a
`todo!()` cross-referencing twap-monitor. Now:

1. `modules/examples/stop-loss/` ships as a real workspace member,
   shaped the same way as the price-alert refactor (BLEU-851 / PR
   #22): pure logic in `strategy.rs` against `&impl Host`,
   wit-bindgen adapter + Guest impl in `lib.rs`.

2. The strategy is complete - reads a Chainlink oracle, builds an
   `OrderCreation` with `Signature::PreSign` (owner pre-signs
   via setPreSignature on-chain ahead of the trigger; module
   ships zero ECDSA), dedups via `submitted:{uid}`, persists
   `dropped:{uid}` on permanent submit errors.

3. Tests (7 total) cover the dispatch matrix end-to-end against
   `shepherd_sdk_test::MockHost`:

     - idle_when_price_above_trigger
     - triggers_and_submits_once_then_dedups
     - permanent_submit_error_marks_dropped (+ dedup on the next
       block)
     - transient_submit_error_leaves_state_unchanged
     - oracle_rpc_error_is_warn_and_continue
     - parse_config_round_trips_settings
     - parse_config_rejects_missing_owner

4. `docs/tutorial-first-module.md` rewritten as a guided tour
   instead of inlined snippets. The tutorial now reads the real
   `modules/examples/stop-loss/` source top-to-bottom and explains
   *why* each piece is shaped the way it is - sections on the
   wit-bindgen adapter, the `OrderCreation` assembly with
   PreSign, the dedup matrix, and the test recipe against MockHost.
   No more `todo!()`.
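The dedup matrix from points 2 and 3 can be sketched as a small state machine. This is an illustrative reduction under stated assumptions: a `HashSet` of marker keys stands in for the local store, and `SubmitResult` / `Step` are hypothetical names, not SDK types.

```rust
use std::collections::HashSet;

/// Simplified view of what the orderbook did with one submission
/// (hypothetical; the real code classifies typed API errors).
enum SubmitResult {
    Accepted,
    TransientError,
    PermanentError,
}

#[derive(Debug, PartialEq, Eq)]
enum Step {
    Skipped,
    Submitted,
    RetryNextBlock,
    Dropped,
}

/// Submits at most once per uid. `submitted:{uid}` and `dropped:{uid}`
/// markers short-circuit later blocks; transient errors leave state
/// untouched so the next block retries.
fn dispatch(
    markers: &mut HashSet<String>,
    uid: &str,
    submit: impl FnOnce() -> SubmitResult,
) -> Step {
    let submitted = format!("submitted:{uid}");
    let dropped = format!("dropped:{uid}");
    if markers.contains(&submitted) || markers.contains(&dropped) {
        return Step::Skipped;
    }
    match submit() {
        SubmitResult::Accepted => {
            markers.insert(submitted);
            Step::Submitted
        }
        SubmitResult::TransientError => Step::RetryNextBlock,
        SubmitResult::PermanentError => {
            markers.insert(dropped);
            Step::Dropped
        }
    }
}
```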

Numbers:

- `.wasm` 304 KB optimised (release build).
- 7 host tests passing; clippy clean on host + wasm32-wasip2.
- Tutorial is 449 lines (was 580 with the duplicated inline code);
  shorter because it points at real files instead of transcribing.

Stacks on PR #22 (price-alert host-trait refactor) so both modules
land alongside the wit-bindgen adapter recipe the tutorial
documents.
Mirrors what BLEU-851 (price-alert) and BLEU-852 (stop-loss) did for
the M3 example modules. Closes the parallel M2 gap.

Before: the entire dispatch pipeline (indexer / poll / submit /
retry / lifecycle) lived in `lib.rs` alongside the wit-bindgen
glue, calling `chain::request`, `local_store::*`, `cow_api::submit_order`,
and `logging::log` directly. The 13 existing tests covered only
parsers and encoders; the state machine itself had no unit-test
coverage.

After:

1. `strategy.rs` (new) - pure logic against `shepherd_sdk::host::Host`.
   Defines `LogView<'a>` and `BlockInfo` so the strategy stays
   wit-independent; exposes `on_logs` / `on_block` entry points.

2. `lib.rs` (rewritten, 665 -> 165 lines) - wit-bindgen `generate!`,
   `WitBindgenHost` adapter implementing all four SDK host traits,
   `Guest` impl that destructures `types::Event` and delegates to
   `strategy`.

3. Tests against `shepherd_sdk_test::MockHost` (7 new) cover the
   dispatch matrix that was previously hand-verified only:

     - `index_records_new_watch_on_conditional_order_created`
     - `index_overwrites_in_place_on_redelivered_log` (re-org
       replay guard, BLEU-826 invariant)
     - `poll_skips_when_next_block_gate_is_in_future`
     - `poll_ready_submits_order_and_persists_submitted_uid`
     - `submit_transient_error_leaves_state_unchanged_for_next_block`
     - `submit_permanent_error_drops_watch`
     - `poll_dont_try_again_drops_watch_and_gates` (uses a real
       `OrderNotValid` selector via the SDK-exported sol! interface)

4. All 13 original pure tests preserved unchanged. Total: 20 tests
   (was 13).

Numbers:

- `.wasm` 313,926 bytes (release wasm32-wasip2).
- 20 tests passing; clippy clean on host + wasm32-wasip2.
- 0 em-dashes in the module tree.

Stacks on PR #23 (BLEU-852) so reviewers can compare strategy /
lib.rs split side-by-side with the price-alert and stop-loss
references.
…855)

Same shape as BLEU-854 (twap-monitor / PR #24). Closes the M2-side
gap on ethflow-watcher.

Before: `submit_placement`, `prior_outcome`, `apply_submit_retry`,
and the `submitted:` / `dropped:` / `backoff:` bookkeeping called
`local_store::*` and `cow_api::submit_order` directly, so none of
the state machine had unit-test coverage (only 7 decoder / encoder
tests).

After:

1. `strategy.rs` (new) - pure logic against
   `shepherd_sdk::host::Host`. `LogView<'a>` keeps the strategy wit-
   independent; `on_logs` is the entry point.

2. `lib.rs` (rewritten, 427 -> 157 lines) - wit-bindgen `generate!`,
   `WitBindgenHost` adapter, `Guest` impl that destructures
   `types::Event::Logs` into `LogView`s and delegates to
   `strategy::on_logs`.

3. Tests against `shepherd_sdk_test::MockHost` (5 new) cover the
   dispatch + idempotency matrix:

     - `placement_log_submits_order_and_persists_submitted_uid`
     - `redelivered_placement_is_skipped_via_submitted_uid_dedup`
       (PR #10 / commit c5e4d7d regression guard)
     - `submit_transient_error_writes_backoff_marker_and_returns`
     - `submit_permanent_error_persists_dropped_uid_and_clears_backoff`
     - `eip1271_signature_shape_round_trips_through_submit_body`
       (decodes the JSON body MockCowApi received and asserts
       `signingScheme=eip1271`, signature blob verbatim, `from` =
       EthFlow contract)

4. All 7 original pure tests preserved unchanged. Total: 12 tests
   (was 7).

Numbers:

- `.wasm` 281,518 bytes (release wasm32-wasip2).
- 12 tests passing; clippy clean on host + wasm32-wasip2.
- 0 em-dashes in the module tree.

Stacks on PR #24 (BLEU-854) so reviewers can compare both M2
strategy / lib.rs splits in one stack with the M3 examples.
Pre-upstream QA pass against the M2 + M3 + M2-host-trait stacks.
Two findings applied here as a single tip-level commit instead of
rewriting each stacked PR (mfw78 prefers history preservation over
amended PRs):

1. `cargo fmt --all` across the workspace. Bulk of the churn is in
   M1 `crates/nexum-engine/src/supervisor/tests.rs` (386 line diff,
   pre-existing drift); the rest is M2/M3 leaf modules my own
   recent PRs introduced. No semantic changes.

2. One em-dash slipped past the rust-idiomatic sweep in
   `modules/examples/price-alert/src/strategy.rs:4` (a module-level
   doc comment). Replaced with ASCII ` - `.

Three em-dashes remain in `wit/**.wit` files, all in mfw78's M1
prose. Intentionally left alone - the rust-idiomatic skill is a
Bleu-internal preference and should not rewrite his upstream
authoring style. Tracked as a separate question for him in the QA
sign-off report.

QA matrix on this commit:

- `cargo fmt --all --check`: clean
- `cargo clippy --all-targets --workspace -- -D warnings`: clean
- `cargo test --workspace`: 145 host tests + 1 doctest passing
  (twap 20, ethflow 12, balance 13, price 16, stop-loss 7,
  shepherd-sdk 27, shepherd-sdk-test 8, nexum-engine 41, doctest 1)
- `cargo build --target wasm32-wasip2 --release -p <module>`:
  clean for all 5 modules. Sizes:
    twap-monitor    313,926 B
    ethflow-watcher 281,518 B
    stop-loss       311,290 B
    price-alert     215,080 B
    balance-tracker 101,518 B
- Em-dashes in `crates/` + `modules/` + `docs/`: 0
- `warn(unused_crate_dependencies)` on every crate root: present
  (sdk, sdk-test, nexum-engine, twap, ethflow, price-alert,
  balance-tracker, stop-loss)

Outstanding (deferred):

- BLEU-853 / COW-1029: `#[non_exhaustive]` batch on SDK public
  enums (HostErrorKind, LogLevel, PollOutcome, RetryAction). Held
  until just before upstream cut so wit-bindgen stays bridge-able.
- WIT-file em-dashes in upstream prose - ask mfw78.
Captures the result of the pre-upstream QA pass. Two non-blocking
follow-ups surfaced for mfw78's call before the consolidated PR:

1. `docs/05-sdk-design.md` describes a 2-layer SDK with
   `nexum-sdk` + proc macros + alloy Provider + Signer that M3
   did not ship. M3 actually delivered the thinner Host-trait +
   helpers + MockHost surface. Doc and code need to agree
   (either trim doc to M3 scope or expand M4/M5 to match doc).

2. No ADR captures the M3 Host trait + strategy/lib split
   decision. ADR-0009 candidate.

Everything else is green: 145 tests + 1 doctest, clippy clean,
0 em-dashes in our code, all 5 modules build for wasm32-wasip2,
warn(unused_crate_dependencies) on every crate root.

The 3 WIT-file em-dashes are mfw78's M1 prose - left alone.

Optional follow-ups (none gating):
- balance-tracker host-trait refactor for shape consistency.
- mfw78 PR description template adoption on existing PR bodies.
Addresses the two non-blocking architectural items surfaced in
COW-1063's sign-off matrix before the consolidated upstream PR:

(a) `docs/05-sdk-design.md` -> add a "Current implementation
    status (M3, 2026-06-17)" callout at the top with a per-feature
    table mapping every section to its actual state. The doc
    itself stays as the M5+ north-star (it's mfw78's design
    document); the callout tells readers what is shipped vs
    deferred so they don't read the proc-macro / Provider /
    Signer sections as API reference for code that exists today.

    Status table covers:
      ✅ shipped: shepherd-sdk, shepherd-sdk-test, 4-trait host
         surface + supertrait Host, HostError mirror, chain +
         cow helpers, MockHost, strategy/lib split recipe,
         block.timestamp in ms.
      ❌ deferred (M5): nexum-sdk crate split, #[nexum::module]
         / #[shepherd::module] proc macros, named event handlers,
         async fn dispatch, full alloy Provider via HostTransport,
         TypedState (postcard), Signer (identity), Cow typed
         client, MockIdentity / MockProvider / WasmTestHarness,
         cargo nexum CLI.

(b) `docs/adr/0009-host-trait-surface.md` (new) -> captures the
    three coupled M3 architectural decisions:
      1. Four per-capability traits (ChainHost, LocalStoreHost,
         CowApiHost, LoggingHost) + supertrait Host with a
         blanket impl.
      2. SDK-side HostError mirroring the wit struct
         field-for-field, bridged via per-module one-liner
         From impls. World-neutral so shepherd-sdk-test compiles
         without wasm.
      3. Per-module strategy.rs (pure, &impl Host) + lib.rs
         (wit-bindgen adapter) split, applied uniformly across
         price-alert, stop-loss, twap-monitor, ethflow-watcher.

    Considered alternatives section explicitly rejects: single
    fat Host trait, #[nexum::module] proc macro now (M5 work),
    re-exporting wit-bindgen HostError, strategy colocated with
    wit-bindgen adapter.

    Marks the COW-1029 / BLEU-853 #[non_exhaustive] batch as the
    follow-up that protects the field-equivalence assumption.
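Decision 2 (the field-for-field mirror bridged by `From` impls) might look roughly like this. The two modules stand in for the wit-bindgen output and the SDK. The field names follow the domain / kind / code / message / data list used elsewhere in this PR, but the field types and the `Timeout` variant are assumptions.

```rust
/// Stand-in for the wit-bindgen generated types (assumed shapes).
mod wit {
    #[derive(Debug, Clone, PartialEq)]
    pub enum HostErrorKind {
        InvalidInput,
        Timeout,
    }

    #[derive(Debug, Clone, PartialEq)]
    pub struct HostError {
        pub domain: String,
        pub kind: HostErrorKind,
        pub code: i32,
        pub message: String,
        pub data: Option<String>,
    }
}

/// Stand-in for the world-neutral SDK mirror (same fields, no wasm).
mod sdk {
    #[derive(Debug, Clone, PartialEq)]
    pub enum HostErrorKind {
        InvalidInput,
        Timeout,
    }

    #[derive(Debug, Clone, PartialEq)]
    pub struct HostError {
        pub domain: String,
        pub kind: HostErrorKind,
        pub code: i32,
        pub message: String,
        pub data: Option<String>,
    }
}

impl From<wit::HostErrorKind> for sdk::HostErrorKind {
    fn from(kind: wit::HostErrorKind) -> Self {
        // Exhaustive on purpose: a new variant on the wit side is a
        // compile error here rather than a silent fallback.
        match kind {
            wit::HostErrorKind::InvalidInput => Self::InvalidInput,
            wit::HostErrorKind::Timeout => Self::Timeout,
        }
    }
}

impl From<wit::HostError> for sdk::HostError {
    fn from(e: wit::HostError) -> Self {
        Self {
            domain: e.domain,
            kind: e.kind.into(),
            code: e.code,
            message: e.message,
            data: e.data,
        }
    }
}
```

The reverse direction (SDK error back into the wit struct at the `Guest` boundary) would be the same mapping with the types swapped.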

Doc 05 and ADR-0009 cross-reference each other, so readers landing
on either find the other. Both files are em-dash clean.
Previously `build-module` only compiled `-p example`, leaving the 5
production modules (twap-monitor, ethflow-watcher, price-alert,
balance-tracker, stop-loss) without CI coverage on the wasm side.
A wasm-build regression (broken cowprotocol feature flag, alloy
version drift, no_std assumption broken) would ship to upstream
review without CI catching it.

This converts the job to a `matrix.module` strategy listing all 6
modules (example kept for parity) and adds a tiny "report wasm
size" step so reviewers can spot size regressions in the Actions
log. `fail-fast: false` so one broken module does not mask others.

Verified locally:

- example          builds clean
- twap-monitor     builds clean
- ethflow-watcher  builds clean
- price-alert      builds clean
- balance-tracker  builds clean
- stop-loss        builds clean

Linear: COW-1066.
…nks (COW-1069)

Locks the rustdoc discipline BLEU-844 (COW-1045) introduced.

CI changes (.github/workflows/ci.yml):

- New `docs:` job runs `cargo doc --workspace --no-deps` with
  `RUSTDOCFLAGS="-D warnings"`. Any rustdoc warning (missing docs,
  broken intra-doc link, unresolved code reference) fails CI.

Source fixes surfaced by the new gate:

- `crates/nexum-engine/src/bindings.rs:8`: drop the
  `[crate::host::impls]` intra-doc link; `impls` is a private `mod`,
  so rustdoc cannot resolve it. Keep the prose reference unquoted.
- `crates/nexum-engine/src/manifest/mod.rs:24`: `[load]` is
  ambiguous (sibling `fn load` + `mod load`). Disambiguate with
  `[mod@load]`.
- `crates/nexum-engine/src/manifest/types.rs:4`: same fix for
  `[super::load]` -> `[mod@super::load]`.
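For reference, the ambiguity arises because a module and a function can share a name (they live in different namespaces), and rustdoc needs a `mod@` or `fn@` prefix to pick one. A self-contained illustration follows; the item bodies are hypothetical, and only the link syntax is the point.

```rust
/// Loader submodule (type namespace).
pub mod load {
    /// Hypothetical helper so the module is not empty.
    pub fn default_path() -> &'static str {
        "module.toml"
    }
}

/// Loads a manifest (value namespace); shares its name with the
/// module above, which is legal Rust.
pub fn load() -> &'static str {
    load::default_path()
}

/// Manifest types. A bare `[load]` link here would be ambiguous, so
/// the links name the namespace explicitly: [`mod@load`] for the
/// module and [`fn@load`] for the function.
pub struct Manifest;
```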

`#![warn(missing_docs)]` is already on `crates/shepherd-sdk/src/lib.rs`
(line 80) and `crates/shepherd-sdk-test/src/lib.rs` (line 59), so the
new CI step locks the existing baseline rather than introducing fresh
churn.

Verified locally:
  RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps  -> clean

Linear: COW-1069. Stacks on COW-1066 (CI matrix).
…COW-1067)

shepherd-sdk had 27 public items and 0 doctests, so renames or
signature changes on the SDK surface broke silently. Adds runnable
usage examples on the load-bearing public items.

Doctests landed:

  chain::eth_call_params         (encode JSON-RPC params)
  chain::parse_eth_call_result   (decode hex result)
  chain::decode_revert_hex       (OrderNotValid -> DontTryAgain)
  cow::classify_api_error        (InsufficientFee -> TryNextBlock;
                                  InvalidSignature -> Drop;
                                  None -> TryNextBlock default)
  cow::gpv2_to_order_data        (zero-receiver normalised to None)
  host::Host                     (strategy fn generic over &impl Host;
                                  hidden hand-rolled stub impl in the
                                  example so the doctest is self-
                                  contained and avoids the
                                  shepherd-sdk-test dev-dep cycle)
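The classification contract those doctests pin down can be sketched as follows. This is a simplified stand-in, not the SDK function: the real `RetryAction` has a third arm not named here, the real input is a typed API error rather than `Option<&str>`, and treating unknown error types like the `None` default is an assumption.

```rust
/// Two of the retry actions named in this PR; the SDK enum has one
/// more arm that this sketch omits.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RetryAction {
    TryNextBlock,
    Drop,
}

/// Maps an orderbook error type to a retry decision. `None` models a
/// response without a typed error.
fn classify_api_error(error_type: Option<&str>) -> RetryAction {
    match error_type {
        // Permanent: resubmitting the same body cannot succeed.
        Some("InvalidSignature") => RetryAction::Drop,
        // Retriable (e.g. InsufficientFee) and the untyped default.
        // Unknown types falling through here is an assumption.
        _ => RetryAction::TryNextBlock,
    }
}
```

Defaulting to a retry keeps an unrecognised orderbook response from permanently dropping an order, at the cost of one extra submit per block until it is classified.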

`#![warn(missing_docs)]` already on the crate root; the new gate from
COW-1069 (PR #28) enforces the rustdoc warning surface in CI.

Verified locally:

  cargo test --doc -p shepherd-sdk            -> 6 passed
  cargo test --workspace                       -> 145 host tests + 7
                                                  doctests passing
  cargo clippy --all-targets --workspace      -> clean
  cargo fmt --all --check                      -> clean
  grep -rn '—' crates/shepherd-sdk/src/        -> 0

Linear: COW-1067. Stacks on COW-1066 + COW-1069.
…ules (COW-1068)

Closes the M3 gap surfaced by the COW-1063 QA pass: every production
module had strong MockHost coverage on its strategy logic, but none
exercised the real wit-bindgen + WitBindgenHost adapter + supervisor
dispatch path. Wit-bindgen / wasmtime / linker regressions could
ship without any test catching them.

Adds 5 integration tests in `crates/nexum-engine/src/supervisor/
tests.rs`, one per production module, modelled on the existing
`e2e_supervisor_boots_example_module` shape:

  e2e_twap_monitor_block_dispatch
  e2e_ethflow_watcher_log_dispatch
  e2e_price_alert_block_dispatch
  e2e_balance_tracker_block_dispatch
  e2e_stop_loss_block_dispatch

Each test:

* Uses `module_wasm_or_skip(name)` so local runs without a fresh
  `cargo build --target wasm32-wasip2 --release -p <module>` are
  skipped rather than failing.
* Boots the supervisor with the module's real `module.toml` (not
  a synthesised manifest), so capability declarations + subscription
  shapes are honest.
* Dispatches a synthetic Block (block-subscribed modules) or Log
  (ethflow-watcher) on Sepolia chain id 11155111.
* Asserts the supervisor delivered the event and the module stayed
  alive.

Shared helpers added next to the existing `example_wasm()`
ones:

  module_wasm(name) / module_wasm_or_skip(name)
  production_module_toml(rel_path)
  boot_production_module(...)
  synthetic_sepolia_block()
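The skip-instead-of-fail behaviour could be implemented along these lines. This is a sketch, not the test file's code: the real helpers take only `name`, so the explicit `target_dir` parameter is an assumption made to keep the example self-contained. The `wasm32-wasip2/release` layout and the hyphen-to-underscore artifact name follow Cargo's conventions.

```rust
use std::path::{Path, PathBuf};

/// Expected artifact path for a module crate, e.g. `stop-loss` ->
/// `<target>/wasm32-wasip2/release/stop_loss.wasm`.
fn module_wasm(target_dir: &Path, name: &str) -> PathBuf {
    target_dir
        .join("wasm32-wasip2")
        .join("release")
        .join(format!("{}.wasm", name.replace('-', "_")))
}

/// Returns the artifact path only if it has been built; otherwise
/// logs a note so the calling test can return early instead of failing.
fn module_wasm_or_skip(target_dir: &Path, name: &str) -> Option<PathBuf> {
    let path = module_wasm(target_dir, name);
    if path.is_file() {
        Some(path)
    } else {
        eprintln!("skipping {name}: {} not built", path.display());
        None
    }
}
```

A test would start with `let Some(wasm) = module_wasm_or_skip(dir, "stop-loss") else { return; };` so a missing build shows up as a pass with a note rather than a failure.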

Asserts are intentionally minimal at this layer (dispatched ==
1 / alive_count == 1). Stronger module-specific assertions
(local-store keys for `submitted:{uid}`, etc.) require either
hand-crafted ABI payloads or a real chain/orderbook stub - that
work lives in COW-1064 (testnet integration). The MockHost
coverage already exercises those state transitions per BLEU-851
/ -852 / -854 / -855.

Verified locally:

  cargo test -p nexum-engine                    -> 46 passed (was 41)
  cargo test --workspace                         -> 149 host tests +
                                                    6 doctests passing
  cargo clippy --all-targets --workspace        -> clean
  cargo fmt --all --check                        -> clean
  grep -rn '—' crates/nexum-engine/src/supervisor/tests.rs -> 0

Linear: COW-1068. Stacks on COW-1066 + COW-1069 + COW-1067.
… boot)

Wires up the M2 milestone for actual testnet exercise on Sepolia.
Closes the gap "M2 is fully tested in unit + integration but has
never been run against a real chain".

## New files

- `engine.m2.toml` - workspace-root engine config that boots
  `twap-monitor` + `ethflow-watcher` against Sepolia public WS.
  Separate `state_dir = "./data/m2"` so it never collides with
  the M1 example runbook.
- `docs/operations/m2-testnet-runbook.md` - 200-line runbook with
  6 sections:
    0. Prerequisites (rustup target, just, Sepolia RPC, faucet)
    1. Smoke run (passive, observe traffic on Sepolia)
    2. Round-trip run (author a TWAP via Safe + Compose + an
       EthFlow swap via cow.fi, watch end-to-end submission)
    3. Inspecting state after a run
    4. What this run does NOT prove (and which issues cover that)
    5. Troubleshooting matrix
    6. References (engine_config schema, ADRs, PR range)
- `justfile` recipes:
    build-m2: cargo build both M2 wasm modules
    run-m2:   build-m2 + build-engine + cargo run engine

## Validated locally

Booted `cargo run -p nexum-engine -- --engine-config engine.m2.toml`
against Sepolia public WS. Observed in ~1s wall clock:

  - WS provider opened against ethereum-sepolia-rpc.publicnode.com
  - Both manifests parsed; both capability sets resolved
    (logging + local-store + chain + cow-api)
  - Both wasm components compiled
  - Both `init` succeeded
  - `supervisor up count=2`, `supervisor ready modules=2 chains=1`
  - All 3 subscriptions opened cleanly:
      block subscription chain_id=11155111
      log subscription module=twap-monitor chain_id=11155111
      log subscription module=ethflow-watcher chain_id=11155111
  - Clean SIGTERM shutdown

The actual observed log output is captured verbatim in the runbook
section 1 so future operators know what "healthy" looks like.

## Scope

- The smoke half (section 1) is passive: it validates boot +
  subscription health without producing traffic. Useful before
  every round-trip.
- The round-trip half (section 2) requires a Sepolia Safe + test
  ETH + interaction with the Compose Safe app / cow.fi UI. Cannot
  be automated from CI (chain-side actions need a wallet). Operator
  works through the steps.
- What this does NOT prove is explicit in section 4: throughput
  / soak (COW-1031), cross-module isolation under load (COW-1064),
  adversarial resource exhaustion (COW-1036), security review
  (COW-1065).

## Not addressed

- Env-var substitution in engine.toml (e.g. `${SEPOLIA_RPC}`) is
  not wired in the engine today; runbook documents the workaround
  (edit URL inline). Filing as a follow-up is out of scope here -
  if needed, add as an M4 nice-to-have.
- `ls-dump` CLI binary referenced in section 3 does not exist yet;
  section explicitly says "no ls-dump bin in 0.2; proper inspector
  is M4 scope" and falls back to re-booting the engine on the same
  state_dir to inspect rows via the dispatch logs.

Linear: stacks on COW-1068. No new issue created - this is
documentation work supporting the existing M2 milestone, not a new
deliverable.
Surfaced while wiring up `engine.m3.toml` for the M3 testnet runbook:
all 3 M3 example modules (price-alert, balance-tracker, stop-loss)
only declare `[[subscription]] kind = "block"`, leaving `log_streams`
empty. `select_all` over an empty Vec yields `None` immediately, so
the `tokio::select!` arm fired and the loop hit the
"log stream ended - shutting down for restart" bail before any block
flowed. The engine bailed within ~50 ms of `supervisor ready`.

Fix: replace each empty side with `futures::stream::pending()` so
the corresponding select arm is never selected. The bail-on-None
semantic still fires when a *non-empty* stream actually closes
(real WebSocket drop), which is the original intent.

The bug was symmetric (log-only configs would also bail) but only
the block-only path is exercised by an existing module config. M2
was unaffected because both modules subscribe to at least one log.

Regression test in `supervisor::tests::
run_does_not_bail_when_both_stream_kinds_are_empty`: invokes `run`
with two empty `Vec`s plus a 50 ms shutdown timer; asserts `run`
blocks the full 50 ms instead of returning at 0 ms. The pre-fix
binary returns in <5 ms.

Verified locally:
  cargo test -p nexum-engine                    -> 47 passed (was 46)
  just run-m3                                    -> 3 modules boot;
                                                    first block dispatch
                                                    fires all 3 strategy
                                                    paths against live
                                                    Sepolia (oracle read,
                                                    balance polls, cow-api
                                                    submit + retry
                                                    classification)
… 3-module E2E)

Sister doc to `docs/operations/m2-testnet-runbook.md`. Same shape,
different modules. Closes the gap "M3 is unit + integration tested
but has never been exercised against a real chain", same as the M2
runbook closed for M2.

## New files

- `engine.m3.toml` - workspace-root engine config that boots the 3
  M3 example modules (price-alert + balance-tracker + stop-loss)
  against Sepolia public WS. Separate `state_dir = "./data/m3"` so
  it never collides with M1 / M2 runbook state.
- `docs/operations/m3-testnet-runbook.md` - operator runbook
  mirroring the M2 one: prerequisites, smoke+active run (M3 is
  active by default since the example modules trigger on every
  block), optional pre-signature setup for real stop-loss
  settlement, state inspection, scope boundaries, troubleshooting,
  references.
- `justfile` recipes: `build-m3` + `run-m3`.

## Validated locally

A single Sepolia block dispatch (~10 s wall clock) drove all 3 M3
strategy paths through the live testnet:

  - **price-alert**: `chain::request eth_call` -> Chainlink
    AggregatorV3Interface -> ABI decode -> `TRIGGERED answer=
    174553978080 threshold=250000000000 (Below)` (Sepolia ETH/USD
    feed reports $1745.54, below the $2500 default threshold).
  - **balance-tracker**: 2 `chain::request eth_getBalance` calls
    (one per configured address) - SDK chain helper + multi-key
    local-store path.
  - **stop-loss**: `eth_call` oracle -> `from_signed_order_data`
    `OrderCreation` with `Signature::PreSign` -> `cow-api::submit-
    order` bytes=561 -> orderbook returns typed
    `TransferSimulationFailed` -> `classify_api_error` tags as
    retriable -> `retry on next block`. Full submit path
    confirmed; the orderbook rejection is the typed-retry
    contract working as designed (the default config's
    `owner = 0x70997970...` does not hold the sell token on
    Sepolia, so simulation correctly fails).
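The price-alert numbers above can be checked by hand: Chainlink USD feeds report 8 decimals, so a $2500 threshold scales to 2500 * 10^8 = 250000000000 and the answer 174553978080 is about $1745.54. The sketch below reproduces that arithmetic. The function names echo the module's `scale_threshold` and `Direction`, but the signatures and the strict comparison are assumptions.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Direction {
    Below,
    Above,
}

/// Scales a whole-unit threshold into the feed's fixed-point
/// representation; returns None on overflow.
fn scale_threshold(whole_units: u64, decimals: u32) -> Option<i128> {
    10i128
        .checked_pow(decimals)?
        .checked_mul(i128::from(whole_units))
}

/// Strict comparison in the configured direction (strictness is an
/// assumption of this sketch).
fn is_triggered(answer: i128, threshold: i128, direction: Direction) -> bool {
    match direction {
        Direction::Below => answer < threshold,
        Direction::Above => answer > threshold,
    }
}
```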

This validates everything the SDK BLEU-840 / BLEU-841 / BLEU-851 /
-852 / -854 / -855 PR series builds: Host trait surface, chain
helpers, cow helpers, MockHost recipe, strategy/lib split. The
same code paths that pass 145 unit tests + 6 doctests + 5
supervisor integration tests now also work against live Sepolia.

## What this validates that the M2 runbook does not

M2 only exercises the orderbook submit path indirectly (through
the EthFlow watcher reacting to swap.cow.fi traffic, and only when
app_data is empty - documented limitation). M3 stop-loss submits
proactively on every poll, so the orderbook always sees a real
`OrderCreation` body even if it rejects. The typed-retry SDK
contract (`classify_api_error` mapping `TransferSimulationFailed`
-> `RetryAction::TryNextBlock`) is exercised end-to-end with a
real orderbook response, not a fixture.

## Stacks on

- `fix(event_loop)` commit immediately preceding this one - the
  bug surfaced while wiring up `engine.m3.toml` (block-only
  subscriptions bailed the engine pre-fix).
- PR #31 (M2 runbook) - same operator-doc shape, same conventions.
…pass

Closes the gap "M3 happy path is validated on testnet but error
paths are not". Five mutations of `engine.m3.toml` /
`module.toml::[config]` were run against the live `just run-m3`
boot; the observed output and verdict for each are captured below.

## Scenarios run

| # | Mutation | Observed | Verdict |
|---|---|---|---|
| 1.1 | engine.m3.toml: rpc_url = "wss://nonexistent.example.com" | `Error: connect chain 11155111: IO error: failed to lookup address information` + clean exit | ✅ structured + fail-fast |
| 1.2 | price-alert: oracle_address = 0x...01 (EOA, no code) | `WARN price-alert: latestRoundData decode failed: ABI decoding failed: buffer overrun while deserializing` + module alive | ✅ graceful + clear error |
| 1.3 | stop-loss: required = ["logging"] (dropped chain/local-store/cow-api) | `Error: load module ... capability violation in stop_loss.wasm component imports cow-api but it is not listed in [capabilities].required` | ✅ security boundary enforced |
| 1.4 | price-alert: threshold = "not-a-number" | `WARN init failed module=price-alert kind=HostErrorKind::InvalidInput "threshold: non-digit character in 'not-a-number'"` + other modules unaffected | ✅ with 1 minor observation (see below) |
| 1.5 | boot 1 (rm -rf data/m3) -> boot 2 (no rm) | both boots clean, redb file preserved | ✅ cross-restart persistence |
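
Scenario 1.3 relies on the loader comparing a component's imports against the manifest's `[capabilities].required` list. A minimal sketch of that kind of check follows; `check_capabilities` and its parameters are hypothetical names for illustration, and the engine's actual check may be structured differently.

```rust
use std::collections::BTreeSet;

/// Hypothetical loader-side check: every capability the component
/// imports must be declared in `[capabilities].required`.
pub fn check_capabilities(
    component: &str,
    imports: &[&str],
    required: &[&str],
) -> Result<(), String> {
    let declared: BTreeSet<&str> = required.iter().copied().collect();
    for import in imports {
        if !declared.contains(import) {
            // Fail on the first undeclared import; the module is never loaded.
            return Err(format!(
                "capability violation in {component}: component imports \
                 {import} but it is not listed in [capabilities].required"
            ));
        }
    }
    Ok(())
}
```

Rejecting at load time, rather than at first use, means a module never runs with a capability its manifest did not declare.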

## Surfaced finding

Scenario 1.4 caught a minor supervisor behaviour: init-failed modules
stay `alive=true` and continue to receive dispatches. This is safe in
practice because all M3 example modules guard with
`SETTINGS.get().is_none() -> return Ok(())`, but it wastes fuel + RPC
requests per block on a no-op. Filed as a follow-up issue
recommending `Supervisor::load` set `alive=false` (or skip the push
into `self.modules`) when `Guest::init` returns `Err(HostError)`.
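
The guard mentioned above can be sketched with `std::sync::OnceLock`. This is an assumption-laden illustration, not the modules' actual code: the `Settings` fields, the `init` / `on_event` signatures, and the `>=` comparison are hypothetical.

```rust
use std::sync::OnceLock;

/// Hypothetical parsed config; the real modules carry more fields.
#[derive(Debug)]
pub struct Settings {
    pub threshold: u64,
}

/// Populated only when `init` succeeds.
static SETTINGS: OnceLock<Settings> = OnceLock::new();

/// Parse the raw config value; on failure SETTINGS stays empty.
pub fn init(raw_threshold: &str) -> Result<(), String> {
    let threshold = raw_threshold
        .parse::<u64>()
        .map_err(|e| format!("threshold: {e}"))?;
    // Ignore the error from a second `set`; the first value wins.
    let _ = SETTINGS.set(Settings { threshold });
    Ok(())
}

/// Returns Ok(true) when the alert would fire.
pub fn on_event(price: u64) -> Result<bool, String> {
    // The guard: if init failed, this dispatch is a no-op.
    let Some(settings) = SETTINGS.get() else {
        return Ok(false);
    };
    Ok(price >= settings.threshold)
}
```

The early return is cheap, but the host still pays to route the event into the module, which is the cost the follow-up issue targets.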

## Validates

- Engine error reporting: 5 distinct error paths each surface a
  typed error with clear domain + message. No silent failures, no
  panics, no infinite retry loops.
- M3 SDK contract: BLEU-814 (32-byte namespace), COW-1025 (capability
  enforcement), BLEU-851 / -852 / -854 / -855 (typed Settings parsing
  via HostError) all verified on live Sepolia, not just MockHost.
- Operator UX: every misconfiguration scenario produces output an
  operator can act on without reading source.

## Reproduce

Each mutation is one line. `git checkout` to restore between runs.
The full diff per scenario is inline in the doc.

## Not in scope (M4 territory)

- Fuel exhaustion (COW-1036)
- Module trap during on_event + supervisor restart (COW-1033 /
  COW-1032)
- WS reconnect with backoff (current is bail + external restart;
  flagged in event_loop.rs as "0.3 fix")
- State-dump CLI for redb inspection (M4 nice-to-have)

## Follow-up issue

To be filed separately: "Supervisor::load should mark module alive=false
when init returns Err(HostError)". Linear MCP was unavailable at
commit time, so the issue will be filed manually in the COW project
under the M3 milestone.
…070)

Pre-fix behaviour: `Supervisor::load` pushed every module into
`self.modules` with `alive = true`, even when `Guest::init` returned
`Err(HostError)`. The supervisor logged `WARN init failed` but the
dispatcher still routed every block / log to the dead module, where
the M3 example strategies short-circuited via
`SETTINGS.get().is_none() -> return Ok(())`. Safe but wasteful, and
the `supervisor up count=N` log was misleading (counted the dead
module as up).

Surfaced live on Sepolia by scenario 1.4 of
`docs/operations/m3-edge-case-validation.md`: set
`[config] threshold = "not-a-number"` in price-alert, observe init
return InvalidInput, then watch the dispatcher hammer the dead
module every block for 14s.

## Fix

`Supervisor::load` now captures the init result into
`init_succeeded: bool` and sets `LoadedModule.alive = init_succeeded`.
The boot log changes from `supervisor up count=N` to
`supervisor up loaded=N alive=M` so the discrepancy is loud.

## Regression test

`supervisor::tests::init_failure_marks_module_dead_and_excludes_from_dispatch`:

- Synthesises a manifest matching real price-alert shape but with
  `threshold = "not-a-number"`.
- Boots the supervisor; asserts `module_count() == 1` (loaded) and
  `alive_count() == 0` (dead).
- Dispatches a synthetic Sepolia block; asserts `dispatched == 0`
  (the only "subscribed" module is dead, so the dispatch fast-path
  skips it).
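
The fix and the assertions listed above can be sketched as a simplified model. It is not the real supervisor: `load_one` here takes the init result directly instead of instantiating a wasm component, and `dispatch_block` and `boot_log` are hypothetical helpers. `LoadedModule.alive`, `init_succeeded`, `module_count()`, and `alive_count()` follow the names used in this commit.

```rust
pub struct LoadedModule {
    pub name: String,
    pub alive: bool,
}

#[derive(Default)]
pub struct Supervisor {
    modules: Vec<LoadedModule>,
}

impl Supervisor {
    /// Simplified: the real loader would call `Guest::init` itself.
    pub fn load_one(&mut self, name: &str, init_result: Result<(), String>) {
        let init_succeeded = init_result.is_ok();
        // The module is still recorded, but marked dead on init failure.
        self.modules.push(LoadedModule {
            name: name.to_string(),
            alive: init_succeeded,
        });
    }

    pub fn module_count(&self) -> usize {
        self.modules.len()
    }

    pub fn alive_count(&self) -> usize {
        self.modules.iter().filter(|m| m.alive).count()
    }

    /// Returns how many modules received the event.
    pub fn dispatch_block(&self) -> usize {
        let mut dispatched = 0;
        for module in &self.modules {
            if !module.alive {
                continue; // fast-path: dead modules are skipped
            }
            dispatched += 1;
        }
        dispatched
    }

    pub fn boot_log(&self) -> String {
        format!(
            "supervisor up loaded={} alive={}",
            self.module_count(),
            self.alive_count()
        )
    }
}
```

Loading one module with a failed init leaves one module loaded, zero alive, and zero dispatches, which mirrors the three assertions in the regression test.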

## Live validation on Sepolia (rerun of scenario 1.4 with fix)

Before fix:
```
INFO supervisor up count=3    <-- includes dead module
```

After fix:
```
WARN init failed - module loaded but marked dead; dispatcher will skip it
     module=price-alert kind=HostErrorKind::InvalidInput
INFO supervisor up loaded=3 alive=2
```

## Docs update

`docs/operations/m3-edge-case-validation.md` scenario 1.4 verdict
updated from "✅ with minor observation" to
"✅; resolved in this PR series". The original observation block
is replaced with a note pointing at the regression test + the new
log line.

## Workspace state

- `cargo test --workspace` -> 151 host tests + 6 doctests passing
  (was 150 + 6; +1 from the new regression test).
- `cargo clippy --all-targets --workspace -- -D warnings` clean.
- `cargo fmt --all --check` clean.
- 0 em-dashes in changed files.

Linear: COW-1070. Closes the only finding from PR #33.

## Considered alternatives

**Option B** (skip pushing the init-failed module into
`self.modules` entirely) would have been cleaner but requires
callers of `Supervisor::load_one` to handle the "module not added"
case. Option A (this PR - flip alive=false) preserves the existing
API surface; the dispatch fast-path already gates on
`if !alive { continue; }` so the dispatched-event count drops to 0
without any caller-side change.

**Option C** (visibility only - rename the boot log) was rejected;
it surfaces the discrepancy but does nothing about the per-block
no-op fuel cost on the dead module.