Skip to content

feat(ethflow-watcher): downgrade ExcessiveValidTo drops to Info (COW-1076)#49

Closed
brunota20 wants to merge 1 commit into
feat/forward-orderbook-error-cow-1075from
feat/ethflow-expected-excessive-validto-cow-1076
Closed

feat(ethflow-watcher): downgrade ExcessiveValidTo drops to Info (COW-1076)#49
brunota20 wants to merge 1 commit into
feat/forward-orderbook-error-cow-1075from
feat/ethflow-expected-excessive-validto-cow-1076

Conversation

@brunota20

Copy link
Copy Markdown
Collaborator

Summary

EthFlow on-chain orders carry validTo = u32::MAX by design (cowprotocol::eth_flow documents this as the canonical CoW-side shape on every chain - cancellation is operator-controlled via the EthFlow contract, not orderbook-time-bounded). The Sepolia orderbook's max-validTo cap rejects this shape with errorType = "ExcessiveValidTo". After PR #48 the strategy already classifies it as RetryAction::Drop (no retry storm); the remaining gap was operator ergonomics: every EthFlow placement on Sepolia produced a Warn-level "ethflow dropped" line, which would dominate a 7-day soak dashboard with non-anomalous traffic.

Change

In apply_submit_retry's Drop arm, peek at the decoded ApiError. If the orderbook's errorType == "ExcessiveValidTo", log at Info instead of Warn. Every other Drop reason (InvalidSignature, WrongOwner, InvalidAppData, ...) keeps Warn so real anomalies still page the operator. Dispatch (dropped:{uid} marker write, stale backoff: clear) is unchanged.

Scope kept narrow on purpose:

  • Strategy already filters logs to EthFlow contract addresses; ExcessiveValidTo here is unambiguously the documented constraint.
  • We do NOT inspect the order's validTo field directly - that would be defence-in-depth duplicating the orderbook's own check, and it would make the gate brittle if the orderbook ever loosens its cap.

Tests

File Test Asserts
modules/ethflow-watcher/src/strategy.rs submit_excessive_valid_to_logs_at_info_not_warn End-to-end through on_logs: exactly one drop line at Info level, zero Warn drops, dropped:{uid} written, no backoff:{uid} left over.
same submit_other_permanent_error_still_logs_at_warn Regression guard: InvalidSignature keeps Warn.
same submit_drop_without_envelope_keeps_warn_level is_expected_excessive_valid_to returns false when HostError.data is None (e.g. transport failure).

All 17 ethflow-watcher tests pass (+3 new); workspace tests untouched. cargo clippy --all-targets -- -D warnings clean. cargo fmt --check clean for the touched file (one pre-existing em-dash on line 181 is from a previous PR; not in scope here).

Docs

docs/operations/e2e-testnet-runbook.md gains a "5.5 Known upstream constraints on Sepolia" section documenting:

  • The protocol design (validTo = u32::MAX).
  • The orderbook's rejection (ExcessiveValidTo).
  • The post-fix operator-visible behaviour (Info log, single dropped: marker per UID).
  • The Prometheus signal (shepherd_cow_api_submit_total{outcome="err"} grows by the EthFlow placement count then plateaus).
  • The pending upstream confirmation in COW-1076.

Why this matters for COW-1031 (7-day soak)

Before this PR: every EthFlow placement on Sepolia produced a Warn-level ethflow dropped line - on a soak window with active EthFlow traffic, the operator dashboard would surface dozens of "drops" per day that all map to the same documented gap.

After this PR: those become Info-level lines, the Warn channel stays reserved for real anomalies (InvalidSignature, AppData mismatches, transport failures), and the soak's "0 unexpected errors" acceptance bar (COW-1064 §7) reads cleanly.

Stack

Stacked on feat/forward-orderbook-error-cow-1075 (PR #48) → stacked on feat/resolve-app-data-cow-1074 (PR #47).

Upstream follow-up (out of scope for this PR)

Open question: does mainnet's orderbook also reject validTo = u32::MAX for EthFlow orders? If yes, this is a protocol-wide constraint and the EthFlow contract documentation needs revisiting at the cowprotocol level. If no, Sepolia services config drift, file a services issue. Tracked in COW-1076.

Linear

Moves COW-1076 toward closure on the strategy + docs side. Upstream confirmation stays as the open thread on that issue.

AI assistance disclosure: drafted by Claude (Opus 4.7, 1M context).

…1076)

EthFlow on-chain orders use `validTo = u32::MAX` by design (see `cowprotocol::eth_flow`). The Sepolia orderbook's max-validTo cap rejects this shape with `errorType = "ExcessiveValidTo"`, and after the COW-1075 host fix the strategy already classifies it correctly as Drop. The remaining gap was operator ergonomics: every EthFlow placement on Sepolia produced a Warn-level "ethflow dropped" line, which would dominate a 7-day soak dashboard with non-anomalous traffic.

Change: in `apply_submit_retry`'s Drop arm, peek at the decoded ApiError. If the orderbook's `errorType == "ExcessiveValidTo"`, log at Info instead of Warn. All other Drop reasons (InvalidSignature, WrongOwner, etc.) keep Warn so real anomalies still page the operator. Dispatch (write `dropped:{uid}`, clear stale `backoff:{uid}`) is unchanged.

Why not gate on more (e.g. inspect the order's validTo field): the strategy already filters logs to EthFlow contract addresses; ExcessiveValidTo from the orderbook for an EthFlow placement is unambiguously the documented constraint. Keeping the gate narrow avoids accidentally suppressing other-cause Warns.

Tests (3 new in `modules/ethflow-watcher/src/strategy.rs`):
- `submit_excessive_valid_to_logs_at_info_not_warn`: end-to-end through `on_logs`; confirms exactly one drop line at Info level and zero Warn drops for this case.
- `submit_other_permanent_error_still_logs_at_warn`: regression guard - InvalidSignature stays at Warn.
- `submit_drop_without_envelope_keeps_warn_level`: predicate-level unit test confirming `is_expected_excessive_valid_to` returns false when `HostError.data` is None (e.g. transport failure).

Docs: added "Known upstream constraints on Sepolia" section to `docs/operations/e2e-testnet-runbook.md` documenting this gap, the post-fix operator-visible behaviour, the Prometheus signal (`shepherd_cow_api_submit_total{outcome=\"err\"}` grows by the EthFlow placement count then stops), and a pointer to COW-1076 for the upstream-confirmation status.

Soak impact: the COW-1031 7-day run on Sepolia will now show ExcessiveValidTo drops as Info-level traffic. The soak's "0 unexpected errors" acceptance bar is preserved because Warn-level drops only fire on real anomalies.

All 17 ethflow-watcher tests pass (+3 new); workspace tests untouched. clippy + fmt clean. AI assistance disclosure: drafted by Claude (Opus 4.7, 1M context).
@linear-code

linear-code Bot commented Jun 19, 2026

Copy link
Copy Markdown

COW-1076

brunota20 added a commit that referenced this pull request Jun 23, 2026
…redesign, COW-1076)

The original BLEU-833 design lifted the on-chain GPv2OrderData into an
OrderCreation body and POSTed it to /api/v1/orders with the EthFlow
contract as the EIP-1271 owner. Empirical evidence collected during a
2026-06-22 Sepolia soak shows that path cannot succeed:

  - The 3 OrderPlacement events the watcher observed during the run
    (creationDate 18:43, 18:53, 21:19 UTC, UIDs ending in
    ba3cb449...EAdec|ffffffff) are *fulfilled* on the orderbook with
    validTo = u32::MAX — the same shape the orderbook rejects when
    shepherd POSTs it.
  - The orderbook entries carry onchainUser, onchainOrderData and
    ethflowData.userValidTo (a real unix timestamp), and these fields
    do not exist on the OrderCreation request body in
    cowprotocol-orderbook. They are populated by the orderbook backend
    itself, which indexes the OrderPlacement event server-side and
    creates the order with dual-validTo bookkeeping.

So /api/v1/orders is structurally the wrong endpoint for on-chain
EthFlow placements: the orderbook's native indexer is the one that
creates the order entry, and the public POST endpoint applies the
generic validTo cap and rejects with ExcessiveValidTo every time.
Continuing to submit produces only garbage outcomes per real EthFlow
trade (misleading dropped/backoff markers and a misleading
shepherd_cow_api_submit_total{outcome="err"} increment).

This commit replaces the submit path with the contract the orderbook
actually offers: shepherd decodes the OrderPlacement event, computes
the canonical UID, GETs /api/v1/orders/{uid}, and records the outcome:

  - 200 → write observed:{uid}, log Info ("orderbook indexed").
  - 404 → log Info ("not yet indexed; will recheck on re-delivery")
    and do not write the marker, so a future log re-delivery or a
    block-tick poll (out of scope here) can re-verify.
  - other err → log Warn with the host's code and message.

Idempotency stays event-driven: once observed:{uid} is written, log
re-delivery is a no-op. Block-tick-driven retries for stuck indexer
lag are intentionally out of scope and tracked separately.

Test surface: the existing BLEU-832 decode tests are preserved verbatim
and the GPv2OrderData fixture is reused with the canonical EthFlow
markers (OrderKind::SELL, SellTokenSource::ERC20, BuyTokenDestination::ERC20).
Two new tests cover UID computation invariants: the owner suffix pins
to the EthFlow contract and the validTo suffix pins to the on-chain
order's validTo. A third asserts compute_uid returns None on unsupported
chains so callers fall back to a Warn log rather than panic.

This invalidates the design that PRs #10 (BLEU-833), #47 (COW-1074),
#48 (COW-1075) and #49 (COW-1076) were all patching. After review,
those PRs should be closed or substantially rewritten — none of the
appData resolution / error-envelope forwarding / log-downgrade work is
needed once the broken endpoint is no longer called.

AI assistance disclosure: drafted by Claude (Opus 4.7, 1M context)
following the soak inspection that surfaced the divergence.
@brunota20

Copy link
Copy Markdown
Collaborator Author

Closing as superseded by the observe-not-submit redesign in PR #62 (M2 layer) + PR #63 (BLEU-855 / M3-M4 layer).

Background: a 2026-06-22 Sepolia soak observation showed that the 3 EthFlow placements logged here as ExcessiveValidTo drops were fulfilled on the orderbook at the same UIDs. The orderbook backend indexes EthFlow OrderPlacement events natively with its own dual-validTo bookkeeping (onchainUser, onchainOrderData, ethflowData.userValidTo) — fields the public OrderCreation body cannot carry. So POST /api/v1/orders is structurally the wrong endpoint for on-chain EthFlow placements and there is no Drop log to downgrade once the watcher stops submitting.

The COW-1076 reframing comment captures the full evidence: https://linear.app/bleu-builders/issue/COW-1076

No code from this PR is retained — the entire is_expected_excessive_valid_to classifier and the Drop-arm level branching disappear when the submit path is removed.

@brunota20 brunota20 closed this Jun 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant