fix(cow-api): forward orderbook ApiError to HostError.data (COW-1075)#48
Open
brunota20 wants to merge 1 commit into
Open
fix(cow-api): forward orderbook ApiError to HostError.data (COW-1075)#48brunota20 wants to merge 1 commit into
brunota20 wants to merge 1 commit into
Conversation
…COW-1075)
`OrderBookPool::submit_order_json` returns `CowApiError::Orderbook(cowprotocol::Error::OrderbookApi { status, api })` for any 4xx with a typed `{"errorType": "...", ...}` body (see `cowprotocol::transport::HttpResponse::into_status_error`). The WIT adapter was dropping `api` on the floor (`data: None`), so the guest's `shepherd_sdk::cow::classify_api_error` always saw `None` and fell back to its safe-default `TryNextBlock`. Permanent rejections like `DuplicatedOrder`, `InvalidSignature`, or `ExcessiveValidTo` therefore looped forever, masquerading as transient failures.
Root cause of the stop-loss infinite-retry behaviour observed in the 2026-06-18 COW-1064 dry run (e2e-report-2026-06-18.md §6.3): 76 retries of an already-submitted order in 170 blocks because the host never let the guest see what the orderbook actually said.
Fix is in the WIT adapter (`crates/nexum-engine/src/host/impls/cow_api.rs`), not the SDK classifier. The classifier already handles `Unknown(_)` -> `Drop` correctly via its `Some(_) => Drop` branch; it just needed the envelope to dispatch on. Extracted the projection into a testable `orderbook_to_host_error` helper that:
- serialises `ApiError` into `HostError.data` as JSON when the variant is `OrderbookApi { status, api }` (the only variant carrying a structured payload),
- sets `code` to the HTTP status so guests can disambiguate 4xx vs 5xx,
- leaves `data: None` for other `cowprotocol::Error` variants (transport, serde, unexpected-status) since they have no envelope and `TryNextBlock` is the correct safe default for them.
Tests:
- `orderbook_to_host_error` unit tests cover the envelope-forward, the optional inner `data` round-trip, and the non-envelope `UnexpectedStatus` branch (3 cases).
- New wiremock integration test `submit_order_propagates_orderbook_envelope` confirms a 400 with `errorType: "DuplicatedOrder"` surfaces the `OrderbookApi` variant end-to-end through `OrderBookPool::submit_order_json`.
All 13 cow-api-adjacent tests pass; workspace tests untouched.
This was referenced Jun 19, 2026
brunota20
added a commit
that referenced
this pull request
Jun 23, 2026
…redesign, COW-1076)
The original BLEU-833 design lifted the on-chain GPv2OrderData into an
OrderCreation body and POSTed it to /api/v1/orders with the EthFlow
contract as the EIP-1271 owner. Empirical evidence collected during a
2026-06-22 Sepolia soak shows that path cannot succeed:
- The 3 OrderPlacement events the watcher observed during the run
(creationDate 18:43, 18:53, 21:19 UTC, UIDs ending in
ba3cb449...EAdec|ffffffff) are *fulfilled* on the orderbook with
validTo = u32::MAX — the same shape the orderbook rejects when
shepherd POSTs it.
- The orderbook entries carry onchainUser, onchainOrderData and
ethflowData.userValidTo (a real unix timestamp), and these fields
do not exist on the OrderCreation request body in
cowprotocol-orderbook. They are populated by the orderbook backend
itself, which indexes the OrderPlacement event server-side and
creates the order with dual-validTo bookkeeping.
So /api/v1/orders is structurally the wrong endpoint for on-chain
EthFlow placements: the orderbook's native indexer is the one that
creates the order entry, and the public POST endpoint applies the
generic validTo cap and rejects with ExcessiveValidTo every time.
Continuing to submit produces only garbage outcomes per real EthFlow
trade (misleading dropped/backoff markers and a misleading
shepherd_cow_api_submit_total{outcome="err"} increment).
This commit replaces the submit path with the contract the orderbook
actually offers: shepherd decodes the OrderPlacement event, computes
the canonical UID, GETs /api/v1/orders/{uid}, and records the outcome:
- 200 → write observed:{uid}, log Info ("orderbook indexed").
- 404 → log Info ("not yet indexed; will recheck on re-delivery")
and do not write the marker, so a future log re-delivery or a
block-tick poll (out of scope here) can re-verify.
- other err → log Warn with the host's code and message.
Idempotency stays event-driven: once observed:{uid} is written, log
re-delivery is a no-op. Block-tick-driven retries for stuck indexer
lag are intentionally out of scope and tracked separately.
Test surface: the existing BLEU-832 decode tests are preserved verbatim
and the GPv2OrderData fixture is reused with the canonical EthFlow
markers (OrderKind::SELL, SellTokenSource::ERC20, BuyTokenDestination::ERC20).
Two new tests cover UID computation invariants: the owner suffix pins
to the EthFlow contract and the validTo suffix pins to the on-chain
order's validTo. A third asserts compute_uid returns None on unsupported
chains so callers fall back to a Warn log rather than panic.
This invalidates the design that PRs #10 (BLEU-833), #47 (COW-1074),
#48 (COW-1075) and #49 (COW-1076) were all patching. After review,
those PRs should be closed or substantially rewritten — none of the
appData resolution / error-envelope forwarding / log-downgrade work is
needed once the broken endpoint is no longer called.
AI assistance disclosure: drafted by Claude (Opus 4.7, 1M context)
following the soak inspection that surfaced the divergence.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Engine-side fix for COW-1075. The WIT
cow-api::submit-orderadapter was constructingHostErrorwithdata: Noneon theCowApiError::Orderbook(cowprotocol::Error::OrderbookApi { status, api })branch, dropping the orderbook's structuredApiErrorenvelope on the floor. The guest'sshepherd_sdk::cow::classify_api_errortherefore always sawNone, fell back to its safe-defaultTryNextBlock, and looped forever on permanent rejections likeDuplicatedOrder,InvalidSignature, orExcessiveValidTo.Root cause vs. the COW-1075 issue title
The issue title ("classify_api_error should map DuplicatedOrder to Drop") points at the SDK classifier, but the classifier is already correct - it handles
OrderPostErrorKind::Unknown(_)via theSome(_) => Droparm. The data never reached it. The fix lives in the host adapter, not the SDK.Evidence (COW-1064 dry-run report §6.3): the stop-loss strategy logs
retry on next block (0): orderbook error (DuplicatedOrder): ...- the(0)isHostError.code, and(DuplicatedOrder)comes fromcowprotocol::Error::OrderbookApi's Display impl on themessagefield. So the parsedApiErrorreachedmessagebut notdata.Change
orderbook_to_host_error(err: cowprotocol::Error) -> HostErrorincrates/nexum-engine/src/host/impls/cow_api.rs.cowprotocol::Error::OrderbookApi { status, api }: serialiseapias JSON intoHostError.data, setcode = status as i32so guests can disambiguate 4xx vs 5xx.cowprotocol::Errorvariants (transport, serde, unexpected-status): leavedata: Noneand let the guest classifier apply itsTryNextBlocksafe default - that's the intended contract for non-envelope failures.Tests
host::impls::cow_api::tests)orderbook_api_error_is_forwarded_in_dataOrderbookApi { status: 400, api: { error_type: "DuplicatedOrder", ... } }producesHostError { code: 400, data: Some(json), kind: Denied }anddataround-trips throughserde_json::from_str::<ApiError>with the original fields.orderbook_api_error_preserves_optional_data_fieldApiError.datapayload survives the projection (min_feein this case).non_envelope_cowprotocol_error_leaves_data_noneUnexpectedStatus(and by extension the other non-envelope variants) leavesHostError.data = None,code = 0.host::cow_orderbook::tests)submit_order_propagates_orderbook_envelope{"errorType":"DuplicatedOrder","description":"order already exists"};OrderBookPool::submit_order_jsonsurfacesCowApiError::Orderbook(cowprotocol::Error::OrderbookApi { status: 400, api: ... }). Pins the contract this PR depends on.Workspace test count: 187 host tests passing, no regressions.
cargo clippy --all-targets -- -D warningsandcargo fmt --checkboth clean.Why this matters for COW-1031 (7-day soak)
Before this PR: any non-retriable orderbook rejection causes infinite block-by-block re-submission until the watch is manually cleared. The soak would have produced a flat
shepherd_cow_api_submit_total{outcome="err"}curve climbing forever on the stop-loss / ethflow modules, which is the opposite of "zero manual intervention".After this PR:
Dropis reached as designed, thedropped:{uid}marker is written, and the curve plateaus.Stack
Stacked on
feat/resolve-app-data-cow-1074(PR #47).Linear
Closes COW-1075. The issue description will be updated post-merge to reflect the root cause shift (host-side, not SDK-side).
This PR was drafted with AI assistance (Claude Code, Opus 4.7).