Skip to content

Solve-allowed verification: one round-trip check (solve_mismatch) that subsumes all violation types #16

Description

@GiggleLiu

Solve-allowed verification: one round-trip check that subsumes all three violation types

Complement to #15. Where #15 keeps certificates solver-free for large instances, this issue adds the opposite (and much simpler) mode: on small instances, allow pred solve, and reduction correctness collapses to a single equality. Builds on #14.

The observation

pred solve <bundle> already performs the whole round-trip — it brute-force-solves the target, then extracts back to source space. The outer evaluation/solution is the source-space answer; intermediate holds the target solution. Example (MIS 0-1,1-2 → MaximumClique):

$ pred solve source.json  --solver brute-force --json   →  {"evaluation":"Max(2)", "solution":[1,0,1], ...}
$ pred solve bundle.json   --solver brute-force --json   →  {"evaluation":"Max(2)", "solution":[1,0,1],
                                                              "reduced_to":"MaximumClique",
                                                              "intermediate":{"evaluation":"Max(2)","solution":[1,0,1]}}

The rule

A reduction is correct on instance A iff value(solve(A)) == value(solve(reduce(A))) — the true optimum of A equals the optimum recovered by solving the target and extracting. A bug is any small A where they disagree (with None/false ⇒ infeasible, numeric compared within FLOAT_TOLERANCE).

This single equality catches all three existing failure modes at once:

  • incomplete_reduction: A feasible, solve(bundle) infeasible → Max(k) vs Max(None).
  • unsound_extraction: extracted optimum infeasible for A → mismatch.
  • suboptimal_extraction: extracted value worse than true optimum → Max(1) vs Max(2). And it fixes the current soundness hole (verify.py:280) because solve(bundle) returns the verified target optimum, instead of trusting an arbitrary target_config.

Why the check is much easier

  • Minimal certificate. The AI only supplies a small source instance + target type: {rule, violation: "solve_mismatch", source, bundle}. No target_config, no claimed_source_solution, no brute_force_solution, no declared cost constants, no forward map.
  • Zero trust in AI configs. The verifier solves both sides itself; nothing the AI claims about solutions can spoof it.
  • One comparison. solve(source).evaluation vs solve(bundle).evaluation.

Cost / scope (why this is not a replacement for #15)

  • Bounded to brute-force-solvable instances. Gate: pred solve runs under the existing 30s subprocess timeout; on timeout the cert is rejected ("instance too large for solve-based verification"). Reduction bugs almost always reproduce on tiny instances (gadget bugs show on small graphs), so this is pragmatic.
  • Uses the one optimal target solution the solver returns; a reduction buggy only on a different optimum can be missed. Cheaply-verifiable rule for a buggy reduction, generalized to optimization (no solving) #15's explicit-config certificates target specific bad solutions. The two modes are complementary.

Proposed work (this repo)

  • Add solve_mismatch to the violation enum in results.schema.json (no new required fields — rule, violation, source, bundle suffices).
  • _run_pred: catch subprocess.TimeoutExpired → return a timeout rc so oversized instances are rejected cleanly (also hardens existing incomplete/suboptimal solve paths).
  • Implement _check_solve_mismatch(cert, source_file, bundle_file): two solves + evaluation comparison; accept on disagreement, reject on agreement, reject on timeout.
  • Route "solve_mismatch" in _verify_in.
  • Tests: a correct reduction (MIS→MaximumClique) under solve_mismatch must be rejected ("recovers the true optimum"); add a calibration fixture for it. (A true-positive needs an actually-buggy rule, same limitation noted for incomplete_reduction tests.)

Acceptance criteria

  • verify accepts a solve_mismatch cert iff solve(source) and solve(bundle) evaluations disagree.
  • MIS→MaximumClique (a correct reduction) under solve_mismatch is rejected with a reason mentioning the recovered optimum equals the true optimum.
  • An oversized source instance (solve exceeds the timeout) is rejected, not crashed.
  • --calibrate passes including the new fixture.

Relates to #14, complements #15.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions