Skip to content

Shadow-mode order simulation at creation#4366

Merged
squadgazzz merged 183 commits into
mainfrom
prototype-eip1271-on-new-api
May 19, 2026
Merged

Shadow-mode order simulation at creation#4366
squadgazzz merged 183 commits into
mainfrom
prototype-eip1271-on-new-api

Conversation

@squadgazzz
Copy link
Copy Markdown
Contributor

@squadgazzz squadgazzz commented Apr 29, 2026

Problem

Orders can pass isValidSignature (EIP-1271) or be accepted as PreSign and still fail to settle because of broken post-hooks or wrappers. We catch these in the auction, wasting solver cycles. Same problem as #4355.

Fix

Successor to #4355, rebuilt on top of @MartinquaXD's SettlementSimulator (#4372).

Run a full-order simulation at creation time for every order type (Eip712, EthSign, EIP-1271, PreSign), alongside the existing signature path. For EIP-1271, the cheap isValidSignature check still runs and still decides acceptance. For everything else only the simulation runs. In all cases the simulation is shadow mode: logged, never rejects.

Infra errors (RPC, timeout, Tenderly) are logged and ignored. Once we trust the simulator against real traffic, an enforce-mode follow-up will turn reverts into rejections.

Changes

  • shared::order_creation_simulation: new module with the OrderSimulating trait and OrderCreationSimulator recipe. Lives next to its consumer (order_validation), not inside the simulator crate.
  • OrderValidator invokes the simulator for every order type via calculate_verification_gas_limit.
  • SettlementSimulator is built once in PriceEstimatorFactory and shared with the orderbook. Old duplicate construction in orderbook::run is gone.
  • OrderSimulationConfig collapses to a single order_simulation_timeout: Duration on Configuration. Gas limit and Tenderly live in price-estimation.
  • Recipe calls .presign_orders() so PreSign orders simulate with a synthetic on-chain pre-signature.
  • Wrapper path routes through the deployed FlashLoanRouter so flashloan-style orders simulate end-to-end.

How to test

  • Unit tests in shared::order_validation cover the signature × simulation outcome matrix in shadow mode, fail-open on infra errors, the eip1271_skip_creation_validation path, the no-simulator-configured path, and the non-EIP-1271 simulator path.
  • Shadow simulation is now enabled in Configuration::test_default(), so every e2e test that calls Services::start_protocol exercises the simulator path automatically.
  • Forked-mainnet replays in crates/simulator/tests/aave_replay.rs (gated on MAINNET_RPC_URL): a real Aave v3 debt-swap order succeeds, the same order with flashloan.amount rewritten past Aave's WETH liquidity reverts, and a real Aave v3 collateral-swap order that the mainnet driver dropped 295 times on 2026-05-05 reproduces the EVM revert.

Runs the OrderSimulator concurrently with the cheap isValidSignature
check. In shadow mode (default), logs disagreements via metrics and
structured logs but returns the cheap check's result. In enforce mode,
(cheap Pass, sim Fail) is upgraded to ValidationError::SimulationFailed;
other combinations stay unchanged. Infra errors never reject.

Covers scope from plan Tasks 3, 4 and 5: shadow-mode quadrants,
enforce-mode cases, infra/skip-flag/no-sim paths.
The Shadow/Enforce distinction is a mode variant, not a property of the
capability — the same simulator infrastructure is active in both modes.
Keeping "shadow" only where it names a mode variant, and renaming the
trait, error, mode enum, metrics, constant, config fields, and module
accordingly.
Doc comments and local variable names still described the capability
as 'shadow' — rewritten to reference the mode variant only where it
genuinely applies (config docs, test names that exercise Shadow mode,
the Shadow enum variant).
The simulator, mode, and timeout are only meaningful together. Collapsing
them into a single Option<Eip1271SimConfig> field lets the call site in
run.rs return None cleanly when order_simulation isn't configured,
instead of passing placeholder mode/timeout values that aren't read.
… to Simulator

Matches the project's convention of -ing suffix for traits
(OrderValidating/OrderValidator, SignatureValidating, BalanceFetching).
The former Eip1271SimConfig bundle becomes the concrete Eip1271Simulator
struct, and the trait it depends on becomes Eip1271Simulating.
The helper now accepts the full simulator bundle instead of three
separate (simulator, mode, timeout) args. Introduces shadow_mode_sim /
enforce_mode_sim helpers to keep test call sites tight.
The constant is only used by tests; keeping it at module scope (and
public) implied an API contract with the configs crate that doesn't
exist. configs::orderbook::default_eip1271_sim_timeout is authoritative
at runtime.
The existing EIP-1271 check is an isValidSignature call; 'cheap' was
editorializing on cost rather than describing what it does. Renaming
to 'signature' for the metric axis, outcome enum, and supporting names.

Also collapsing sim_only_total into total by adding a 'skipped' value
to the signature axis — one counter with a signature × sim matrix
covers every case the two used to cover.
…configs, logs

The Eip1271Simulator struct keeps its -Simulator suffix (matching
OrderValidator/-Validating), but everywhere sim was used as a modifier
or qualifier (enum variants, error type, metric subsystem/labels,
config fields, log messages, test names) it now reads as simulation.
Operators can now opt out of the simulation at order creation on a
per-chain basis without giving up the /debug/simulation endpoint. The
shared mode enum stays binary (Shadow/Enforce); Disabled translates to
None for OrderValidator at the wiring layer. The OrderSimulator is
still constructed for the debug endpoint.

Also removed the redundant impls_trait compile-check test in
eip1271_simulation.rs — the impl block above already enforces that at
compile time.
Mock both the signature validator and the simulator with times(0) and
submit an Eip712 (EOA) order. Catches a regression where the sim is
accidentally wired to run for non-1271 orders.
The seven near-identical tests covering every (signature, simulation,
mode) combination collapsed into one table-driven test with a single
mock-driven helper. Failure messages include a label per row so any
regression still pinpoints the failing cell.

Also inlined the now-unused enforce_mode_simulator helper and replaced
shadow_mode_simulator with a general simulator_with_mode.
- Move simulation_time histogram timer inside simulation_fut so it no
  longer captures the max of sig-check + simulation latency.
- Warn-level (not info) for simulation failures in the
  eip1271_skip_creation_validation path, matching the normal path.
- Drop Default derive on the shared Eip1271SimulationMode since the
  operational default lives in configs (Disabled) and no code reaches
  for it. Updated the doc to clarify the split.
- New configs test asserting "shadow" deserializes to the Shadow
  variant.
squadgazzz added 3 commits May 7, 2026 18:40
Drops the disabled / shadow / enforce mode enum and the per-outcome
metric matrix. Order simulation now runs alongside the signature check
purely for observability. Disagreements are logged. The signature check
alone decides whether the order is accepted.

Enforce mode (rejecting orders whose simulation reverts) will land in a
follow-up PR. The negative e2e test that asserted HTTP 400
OrderSimulationFailed comes back with it.
The enum was a near-1:1 copy of the underlying Result with the timeout
case folded in. Returning the Result directly drops the enum and the
single-caller run_order_simulation_only helper.
Base automatically changed from new-api-simulator-crate to main May 11, 2026 13:35
…new-api

# Conflicts:
#	crates/e2e/tests/e2e/quote_verification.rs
#	crates/orderbook/src/run.rs
#	crates/price-estimation/src/factory.rs
#	crates/price-estimation/src/trade_verifier.rs
#	crates/simulator/src/encoding.rs
#	crates/simulator/src/lib.rs
#	crates/simulator/src/simulation_builder.rs
@squadgazzz squadgazzz requested a review from MartinquaXD May 18, 2026 11:59
The post-merge fixes from the conflict resolution were on disk but
never staged. The merge commit itself compiled in isolation but missed
these four touchups needed because main renamed BalanceOverrides ->
StateOverrides (#4395) and dropped the code_fetcher parameter:

- simulator/src/lib.rs: re-add `pub mod order_simulation;` (this
  branch's module that didn't exist on main).
- simulator/src/simulation_builder.rs: re-add SettlementSimulator
  ::tenderly() getter used by order_simulation.rs.
- simulator/tests/aave_replay.rs: BalanceOverrides -> StateOverrides.
- shared/src/order_validation.rs: drop the stale MockCodeFetching arg
  from the OrderValidator test builder.
Copy link
Copy Markdown
Contributor

@jmg-duarte jmg-duarte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, nits only

Comment thread crates/simulator/src/order_simulation.rs Outdated
Comment thread crates/simulator/src/order_simulation.rs Outdated
let buy_token_usdt = address!("dac17f958d2ee523a2206206994597c13d831ec7");
let sell_amount = U256::from_str("6133645").unwrap();
let buy_amount = U256::from_str("3388434912").unwrap();
let valid_to = 1_801_428_079u32; // 2027-01-31 ish
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

next year we need to update this 😆

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No future update needed. The test deliberately bypasses OrderValidator (the only thing that bounds valid_to by SystemTime::now()) and drives SettlementSimulator directly. The eth_call is pinned to fork block 25028258 (mined 2026-05-05), so on-chain block.timestamp is the fork block's, not wall-clock time. valid_to = 2027-01-31 is in the future relative to the fork block, and the fork block's timestamp doesn't advance with wall-clock.

Comment thread crates/simulator/tests/aave_replay.rs Outdated
Comment thread crates/shared/src/order_validation.rs Outdated
Comment thread crates/configs/src/orderbook/mod.rs Outdated
solver: e2e::setup::onchain_components::TestAccount,
) -> Services<'a> {
let orderbook_config = Configuration {
order_simulation: Some(OrderSimulationConfig::test_default()),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than introducing a single new test that sets this parameter wouldn't it be better to just enable the new shadow mode on all tests and not add a new dedicated test?
Ultimately the enforce variant should also be enabled universally so that seems fitting.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense! Updated.

Comment thread crates/e2e/tests/e2e/malformed_requests.rs
Comment thread crates/orderbook/src/run.rs Outdated
Comment thread crates/shared/src/order_validation.rs Outdated
Comment thread crates/shared/src/order_validation.rs Outdated
Comment thread crates/shared/src/order_validation.rs Outdated
Comment on lines +2986 to +2987
let mut sim = MockOrderSimulating::new();
sim.expect_simulate().times(0);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plan is to use the new simulation logic for all order types going forward.
Even unsigned pre-sign orders (that are currently allowed) can be validated with the new logic by preparing the necessary state overrides.

Copy link
Copy Markdown
Contributor Author

@squadgazzz squadgazzz May 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok, makes sense. Updated the PR description and title as well. I thought we discussed that this would be used only for the 1271 orders, which is the minority.

Comment thread crates/shared/src/order_validation.rs Outdated
Comment thread crates/shared/src/order_creation_simulation.rs
Comment thread crates/shared/src/order_creation_simulation.rs
- Extract `Order::new(...)` builder chain into a `let order` binding in
  both `order_simulation.rs` and `aave_replay.rs` for readability.
- Replace the `match inputs.simulate().await` with a `let Err(err) = ...
  else { return Ok(()) }` early-return.
- Rename `timed_simulation` to `simulation_with_timeout` so the name
  describes the bound rather than instrumentation.
- Take `&str` instead of `String` for `full_app_data` in
  `OrderSimulating::simulate` and `simulation_with_timeout`.
- Replace the `match tokio::time::timeout` block with `.map_err()?`.
- Drop the redundant `try_from` for the test gas limit (`U256::from`).
- Rename the two confusing tests: `simulation_infra_error_is_fail_open`
  -> `simulation_infra_error_does_not_reject_order`, and
  `no_simulator_configured_returns_invalid_eip1271_signature_on_invalid_signature`
  -> `invalid_signature_rejected_when_simulator_disabled`.
Two related improvements requested in review:

1. The disagreement / infra-error warning now carries the signed order
   data, full appdata, signature, and owner. Re-simulating a shadow-mode
   warn line locally no longer requires going back to the database.
2. The malformed-appdata e2e test asserts again that the bad value
   appears in the response body. The orderbook already echoes the bad
   appdata in the error description, so the assertion was unnecessarily
   dropped when fields were added to satisfy the new required-field
   validation.
Two related refactors flagged in review.

The orderbook used to build its own `SettlementSimulator` for EIP-1271
shadow simulation while the price-estimation factory built another for
trade verification. Both ran the same on-chain queries
(`authenticator`, `vaultRelayer`, `domainSeparator`, `chain_id`) and
initialised their own balance-override detector against the same web3.

The factory now builds the simulator at construction time, stores it,
and exposes it via `settlement_simulator()`. The orderbook borrows that
instance, so the construction cost is paid once. `OrderSimulationConfig`
loses its `gas_limit` and `tenderly` fields, which were redundant with
the price-estimation values now in effect.

The free `simulation_with_timeout` helper becomes a method on
`OrderSimulator` so the call site reads
`config.simulate_with_timeout(...)` instead of dragging the struct
around as a `config:` argument.
The `simulator` crate now offers a flexible builder for arbitrary
simulation recipes. The orderbook's order-creation recipe (full fill,
fake solver, sufficient buy-token override, Tenderly only on revert)
is one specific use of that builder. Other consumers (e.g. the driver)
will want different recipes - notably without Tenderly, since the
driver does not need debuggability.

The trait, error, and adapter move from `simulator::order_simulation`
into a new `shared::order_creation_simulation` module that sits next
to its only consumer, `shared::order_validation`. The adapter is
renamed from `OrderSimulatorAdapter` to `OrderCreationSimulator` so
its purpose is obvious at the call site.
Shadow simulation is already wired into `Configuration::test_default()`,
so every e2e test that calls `Services::start_protocol` exercises the
simulator path. Existing Safe / EIP-1271 coverage in
`smart_contract_orders`, `eth_safe`, `ethflow`, `hooks`, and
`app_data_signer` already runs the same code path as the deleted test,
just with more realistic scenarios than a single happy-path Safe order.

Once enforce mode lands, flipping the same flag in `test_default()`
will broaden coverage uniformly without retrofitting any test.
`presign_orders()` on the builder writes the pre-signature flag for any
order whose signing scheme is `PreSign`, so the simulation passes the
on-chain check even when the user hasn't actually signed yet. Harmless
for the EIP-1271 orders that reach the recipe today (the builder no-ops
for non-presign schemes), and lets a future PR widen the simulation
trigger to PreSign orders without revisiting the recipe.
`OrderSimulationConfig` had only `timeout` left after the gas-limit and
Tenderly fields moved to price-estimation. Collapse the struct into a
direct `order_simulation_timeout: Option<Duration>` on `Configuration`.
`Some(timeout)` enables shadow simulation, `None` disables it.

Trim several over-explanatory doc / inline comments added during the
review cycle. Names already convey what; the rest was either describing
WHAT instead of WHY, referencing the future enforce-mode PR, or
narrating the simulator-instance-sharing rationale that's now obvious
from the call to `settlement_simulator()`.
The `order_simulation_timeout: Option<Duration>` toggle was redundant
with the price-estimation gate (`simulation_node_url`): when that node
URL is unset the factory has no `SettlementSimulator`, so the orderbook
cannot run a shadow simulation regardless. When the node URL is set,
operators almost always want both trade verification *and* order-
creation shadow simulation on. Replace the optional with a `Duration`
defaulted to 2s. Shadow simulation now runs whenever the factory
exposes a simulator.
The simulator was previously gated to EIP-1271 orders only, with
non-EIP-1271 orders short-circuiting in `calculate_verification_gas_limit`.
That made the recipe's `.presign_orders()` call dead code: PreSign
orders never reached the simulator. EOA orders (Eip712, EthSign) also
bypassed observability.

`calculate_verification_gas_limit` now always invokes the simulator
(when one is configured) in shadow mode. EIP-1271 orders additionally
run the on-chain `isValidSignature` check as before. The
`run_shadow_simulation` helper is now shared between the
`eip1271_skip_creation_validation` branch and the non-EIP-1271 branch.
@squadgazzz squadgazzz changed the title Simulate EIP-1271 orders at creation Shadow-mode order simulation at creation May 19, 2026
Shadow mode is the only mode the simulator runs in today. The suffix
adds nothing and is inconsistent with the other tests in the file.
Shadow mode is the only mode the simulator runs in. Rename
`run_shadow_simulation` to `observe_simulation`, and drop "shadow"
from doc comments where it didn't add information. The PR description
still uses the term to explain the feature to reviewers.
Two regressions surfaced by CI:

1. `configs::orderbook::tests::deserialize_full_configuration`:
   `order-simulation-timeout` was placed after a `[table]` header in
   the test TOML, so it ended up inside that table instead of at the
   root. Move it above the table headers.

2. `e2e::malformed_requests::local_node_simulation_not_enabled`:
   the test was asserting `405 Method Not Allowed` from the debug
   simulation endpoint when the operator toggle was off. The toggle
   no longer exists, and the e2e harness always sets
   `simulation_node_url`, so the simulator is always wired up in
   e2e runs. The 405 code path lives only in the `simulation_node_url
   == None` branch which the harness can't produce. Drop the test.
@squadgazzz squadgazzz requested a review from MartinquaXD May 19, 2026 12:07
Adds .times(1) to the simulator mock in signature_and_simulation_outcome_matrix.
The matrix test exercised the EIP-1271 path but didn't pin the simulator
invocation count, so a future regression that bypassed the simulator
would have passed the test silently. The non-1271 test already has the
explicit .times(1).
Copy link
Copy Markdown
Contributor

@MartinquaXD MartinquaXD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the comments. I think this is okay for now. Hopefully there we can eliminate any edge cases we might find such that we can completely remove the old verification logic. 🤞

@squadgazzz squadgazzz added this pull request to the merge queue May 19, 2026
Merged via the queue into main with commit dcbc69d May 19, 2026
20 checks passed
@squadgazzz squadgazzz deleted the prototype-eip1271-on-new-api branch May 19, 2026 17:36
@github-actions github-actions Bot locked and limited conversation to collaborators May 19, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants