Solve-allowed verification: one round-trip check (solve_mismatch) that subsumes all violation types

## Solve-allowed verification: one round-trip check that subsumes all three violation types

Complement to #15. Where #15 keeps certificates solver-free for large instances, this issue adds the opposite (and much simpler) mode: **on small instances, allow `pred solve`, and reduction correctness collapses to a single equality.** Builds on #14.

### The observation

`pred solve <bundle>` already performs the whole round-trip — it brute-force-solves the *target*, then extracts back to *source space*. The outer `evaluation`/`solution` is the source-space answer; `intermediate` holds the target solution. Example (MIS `0-1,1-2` → MaximumClique):

```
$ pred solve source.json  --solver brute-force --json   →  {"evaluation":"Max(2)", "solution":[1,0,1], ...}
$ pred solve bundle.json   --solver brute-force --json   →  {"evaluation":"Max(2)", "solution":[1,0,1],
                                                              "reduced_to":"MaximumClique",
                                                              "intermediate":{"evaluation":"Max(2)","solution":[1,0,1]}}
```

### The rule

> **A reduction is correct on instance A iff `value(solve(A)) == value(solve(reduce(A)))`** — the true optimum of A equals the optimum recovered by solving the target and extracting. A **bug** is any small A where they disagree (with `None`/`false` ⇒ infeasible, numeric compared within `FLOAT_TOLERANCE`).

This single equality catches all three existing failure modes at once:
- **incomplete_reduction:** A feasible, `solve(bundle)` infeasible → `Max(k)` vs `Max(None)`.
- **unsound_extraction:** extracted optimum infeasible for A → mismatch.
- **suboptimal_extraction:** extracted value worse than true optimum → `Max(1)` vs `Max(2)`. **And it fixes the current soundness hole** (`verify.py:280`) because `solve(bundle)` returns the *verified* target optimum, instead of trusting an arbitrary `target_config`.

### Why the check is much easier

- **Minimal certificate.** The AI only supplies a small source instance + target type: `{rule, violation: "solve_mismatch", source, bundle}`. No `target_config`, no `claimed_source_solution`, no `brute_force_solution`, no declared cost constants, no forward map.
- **Zero trust in AI configs.** The verifier solves both sides itself; nothing the AI claims about solutions can spoof it.
- **One comparison.** `solve(source).evaluation` vs `solve(bundle).evaluation`.

### Cost / scope (why this is *not* a replacement for #15)

- Bounded to **brute-force-solvable instances**. Gate: `pred solve` runs under the existing 30s subprocess timeout; on timeout the cert is rejected ("instance too large for solve-based verification"). Reduction bugs almost always reproduce on tiny instances (gadget bugs show on small graphs), so this is pragmatic.
- Uses the **one** optimal target solution the solver returns; a reduction buggy only on a *different* optimum can be missed. #15's explicit-config certificates target specific bad solutions. The two modes are complementary.

### Proposed work (this repo)

- [ ] Add `solve_mismatch` to the `violation` enum in `results.schema.json` (no new required fields — `rule, violation, source, bundle` suffices).
- [ ] `_run_pred`: catch `subprocess.TimeoutExpired` → return a timeout rc so oversized instances are rejected cleanly (also hardens existing `incomplete`/`suboptimal` solve paths).
- [ ] Implement `_check_solve_mismatch(cert, source_file, bundle_file)`: two solves + evaluation comparison; accept on disagreement, reject on agreement, reject on timeout.
- [ ] Route `"solve_mismatch"` in `_verify_in`.
- [ ] Tests: a correct reduction (MIS→MaximumClique) under `solve_mismatch` must be **rejected** ("recovers the true optimum"); add a calibration fixture for it. (A true-positive needs an actually-buggy rule, same limitation noted for `incomplete_reduction` tests.)

### Acceptance criteria

- `verify` accepts a `solve_mismatch` cert iff `solve(source)` and `solve(bundle)` evaluations disagree.
- MIS→MaximumClique (a correct reduction) under `solve_mismatch` is **rejected** with a reason mentioning the recovered optimum equals the true optimum.
- An oversized source instance (solve exceeds the timeout) is **rejected**, not crashed.
- `--calibrate` passes including the new fixture.

Relates to #14, complements #15.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Solve-allowed verification: one round-trip check (solve_mismatch) that subsumes all violation types #16

Solve-allowed verification: one round-trip check that subsumes all three violation types

The observation

The rule

Why the check is much easier

Cost / scope (why this is not a replacement for #15)

Proposed work (this repo)

Acceptance criteria

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Solve-allowed verification: one round-trip check (solve_mismatch) that subsumes all violation types #16

Description

Solve-allowed verification: one round-trip check that subsumes all three violation types

The observation

The rule

Why the check is much easier

Cost / scope (why this is not a replacement for #15)

Proposed work (this repo)

Acceptance criteria

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions