A cheaply-verifiable rule for "buggy reduction", generalized to optimization
Builds on #14 ("identify a buggy rule without solving the problem exactly"). This proposes a concrete rule and the verifier changes it implies.
Design invariant (the requirement from #14)
A bug certificate must be checkable with a constant number of reduce/extract/embed/evaluate calls on the given configurations. No pred solve.
A bug is an existential object, so it should have a short certificate. We only fall back to solving when no short certificate can exist (see "irreducible case" below).
Where the current verifier stands (benchmark/verify.py)
| Violation |
Verdict |
Problem |
unsound_extraction (verify.py:156) |
cheap, ~sound |
Evaluates the AI's claimed_source_solution but never runs pred extract to confirm extraction actually returns it (lines 195–199). Contract is extract(B,c_B)==c_A; verifier should recompute, not trust. |
incomplete_reduction (verify.py:224) |
solves the hard problem |
Brute-force-solves both source (line 229) and target (line 247). Direct violation of the invariant. |
suboptimal_extraction (verify.py:280) |
cheap but unsound |
Declares a bug when any source config beats extract(target_config) (line 332), but never checks target_config is target-optimal. A correct reduction maps optimal→optimal and is free to map suboptimal→suboptimal. False positives. Making it sound in this framing requires proving target_config optimal ⇒ solving. |
Root cause (same as the original SAT rule's "c_B unsat ∧ extract(c_B) sat" disjunct): the reduction exposes only the backward map extract, so completeness/optimality — which are statements about the forward direction — get certified with data extract has no contract over.
Proposed rule
Add a forward solution map alongside the existing backward one:
reduce : A ↦ B (instance map — exists: `pred reduce`)
extract: (B, c_B) ↦ c_A (backward solution map — exists: `pred extract`)
embed : (A, c_A) ↦ c_B (forward solution map — `pred embed`, if not already exposed)
Decision problems. Correct iff (B) every c_B satisfying B has extract(c_B) satisfying A, and (F) every c_A satisfying A has embed(c_A) satisfying B. Together ⇒ YES_A ⟺ YES_B. A bug is one witness violating either. (F) is the cheap, sound completeness certificate that replaces the broken disjunct.
Optimization problems. Each problem has an objective (pred evaluate already returns Max(v)/Min(v)/…(None)); the reduction declares an affine cost relation (a, b), a ≠ 0 (sign flips min↔max). Contract, as two inequalities (WLOG min):
forward : obj_B(embed(c_A)) ≤ a·obj_A(c_A) + b
backward: obj_A(extract(c_B)) ≤ (obj_B(c_B) − b) / a
These two inequalities pin OPT_B = a·OPT_A + b and force optima↦optima without knowing OPT:
take optimal c_A* ⇒ embed(c_A*) feasible at cost ≤ a·OPT_A+b ⇒ OPT_B ≤ a·OPT_A+b;
take optimal c_B* ⇒ extract(c_B*) feasible at cost ≤ (OPT_B−b)/a ⇒ a·OPT_A+b ≤ OPT_B. Equality, and extract(c_B*) attains OPT_A. ∎
Four cheap, sound bug certificates (no solver)
| Certificate |
Witness |
Catches |
Replaces |
| B-infeasible |
c_B feasible, extract(c_B) infeasible |
garbage extraction |
today's unsound_extraction |
| B-cost |
c_B feasible, obj_A(extract(c_B)) > (obj_B(c_B)−b)/a |
extraction loses objective value |
fixes suboptimal_extraction |
| F-infeasible |
c_A feasible, embed(c_A) infeasible |
reduction drops solutions |
cheap incomplete_reduction |
| F-cost |
c_A feasible, obj_B(embed(c_A)) > a·obj_A(c_A)+b |
reduction's optimum unreachable |
— |
Decision is the special case where "feasible" is the objective: the two *-infeasible rows survive, the cost rows vanish. One unified rule.
Key fix for suboptimality: B-cost is per-solution, not per-optimum — compare extract(c_B) against the value predicted from c_B's own score via (a,b), so c_B need not be optimal and nothing is solved. Contract-free variant if we don't want to trust a declared (a,b): an order-reversal pair — feasible c_B1, c_B2 with obj_B(c_B1) < obj_B(c_B2) but obj_A(extract(c_B1)) > obj_A(extract(c_B2)) — proves "solve B then extract" doesn't solve A, no (a,b) and no solver.
The one irreducible case
"A is feasible but B is not" (pure completeness) has no short certificate without embed — that's why incomplete_reduction solves today. embed(c_A) sidesteps it by exhibiting the target solution that should exist instead of proving its existence by search. Hence the fork:
- Add
pred embed upstream (CodingThrust/problem-reductions) → all four certificates are cheap + sound; the framework becomes a clean optimum-preserving-reduction checker. (Needs an upstream issue.)
- Don't add it → keep
incomplete_reduction as an explicit "expensive, last-resort" check gated behind an instance-size cap; still replace suboptimal_extraction with the cheap+sound B-cost / order-reversal check (no forward map needed). Removes one of the two solves and fixes the unsoundness.
Is a forward embed generally available?
Yes for essentially every reduction this library targets — embed is the constructive content of the completeness proof. "YES_A ⟹ YES_B" (and its optimization analog OPT_B = a·OPT_A+b) is proven by building a target solution from a source one; that construction is embed. A correct reduction cannot lack one in principle. Caveats are about implementation/cost, not existence:
- Usually unimplemented, not unavailable. Using a reduction needs only
extract, so embed is left in the proof, not the code. The upstream ask is "expose the forward map the proof already defines."
- Gadget / spin-glass / QUBO reductions stay cheap. Given the source assignment, each gadget's auxiliary variables are set by a local, per-gadget choice/ground state — polynomial, not a global solve. (Confirm per rule that no step hides a global optimization.)
- Non-parsimonious reductions (one source ↔ many target solutions) need only a canonical choice:
embed returns one valid target config, not all.
embed need not invert extract. The certificates only require each map to land feasible at the right cost; extract(embed(c_A)) = c_A is a bonus parsimony check, not a requirement.
- Composes along
path: for A→C→B, embed_total = embed_{C→B} ∘ embed_{A→C}, so a chain has embed iff each atomic step does.
Consequence: the rule degrades gracefully — even where a rule can't expose a cheap embed, only the completeness certificate falls back to the gated solve; the backward certificates (B-infeasible, B-cost, order-reversal) stay solver-free.
Proposed work (this repo)
Acceptance criteria
- Every accepted certificate type is verifiable with O(1)
pred calls and zero pred solve (except the explicitly-gated completeness fallback, if embed is not added).
- A calibration fixture demonstrating the current
suboptimal_extraction false positive (suboptimal target config extracting to a suboptimal source solution) is rejected by the new check.
--calibrate passes with the new fixtures.
cc @isPANN (#14)
A cheaply-verifiable rule for "buggy reduction", generalized to optimization
Builds on #14 ("identify a buggy rule without solving the problem exactly"). This proposes a concrete rule and the verifier changes it implies.
Design invariant (the requirement from #14)
A bug is an existential object, so it should have a short certificate. We only fall back to solving when no short certificate can exist (see "irreducible case" below).
Where the current verifier stands (
benchmark/verify.py)unsound_extraction(verify.py:156)claimed_source_solutionbut never runspred extractto confirm extraction actually returns it (lines 195–199). Contract isextract(B,c_B)==c_A; verifier should recompute, not trust.incomplete_reduction(verify.py:224)suboptimal_extraction(verify.py:280)extract(target_config)(line 332), but never checkstarget_configis target-optimal. A correct reduction maps optimal→optimal and is free to map suboptimal→suboptimal. False positives. Making it sound in this framing requires provingtarget_configoptimal ⇒ solving.Root cause (same as the original SAT rule's "
c_Bunsat ∧extract(c_B)sat" disjunct): the reduction exposes only the backward mapextract, so completeness/optimality — which are statements about the forward direction — get certified with dataextracthas no contract over.Proposed rule
Add a forward solution map alongside the existing backward one:
Decision problems. Correct iff (B) every
c_BsatisfyingBhasextract(c_B)satisfyingA, and (F) everyc_AsatisfyingAhasembed(c_A)satisfyingB. Together ⇒YES_A ⟺ YES_B. A bug is one witness violating either. (F) is the cheap, sound completeness certificate that replaces the broken disjunct.Optimization problems. Each problem has an objective (
pred evaluatealready returnsMax(v)/Min(v)/…(None)); the reduction declares an affine cost relation(a, b),a ≠ 0(sign flips min↔max). Contract, as two inequalities (WLOG min):These two inequalities pin
OPT_B = a·OPT_A + band force optima↦optima without knowing OPT:take optimal
c_A*⇒embed(c_A*)feasible at cost ≤a·OPT_A+b⇒OPT_B ≤ a·OPT_A+b;take optimal
c_B*⇒extract(c_B*)feasible at cost ≤(OPT_B−b)/a⇒a·OPT_A+b ≤ OPT_B. Equality, andextract(c_B*)attainsOPT_A. ∎Four cheap, sound bug certificates (no solver)
c_Bfeasible,extract(c_B)infeasibleunsound_extractionc_Bfeasible,obj_A(extract(c_B)) > (obj_B(c_B)−b)/asuboptimal_extractionc_Afeasible,embed(c_A)infeasibleincomplete_reductionc_Afeasible,obj_B(embed(c_A)) > a·obj_A(c_A)+bDecision is the special case where "feasible" is the objective: the two
*-infeasiblerows survive, the cost rows vanish. One unified rule.Key fix for suboptimality: B-cost is per-solution, not per-optimum — compare
extract(c_B)against the value predicted fromc_B's own score via(a,b), soc_Bneed not be optimal and nothing is solved. Contract-free variant if we don't want to trust a declared(a,b): an order-reversal pair — feasiblec_B1, c_B2withobj_B(c_B1) < obj_B(c_B2)butobj_A(extract(c_B1)) > obj_A(extract(c_B2))— proves "solve B then extract" doesn't solve A, no(a,b)and no solver.The one irreducible case
"A is feasible but B is not" (pure completeness) has no short certificate without
embed— that's whyincomplete_reductionsolves today.embed(c_A)sidesteps it by exhibiting the target solution that should exist instead of proving its existence by search. Hence the fork:pred embedupstream (CodingThrust/problem-reductions) → all four certificates are cheap + sound; the framework becomes a clean optimum-preserving-reduction checker. (Needs an upstream issue.)incomplete_reductionas an explicit "expensive, last-resort" check gated behind an instance-size cap; still replacesuboptimal_extractionwith the cheap+sound B-cost / order-reversal check (no forward map needed). Removes one of the two solves and fixes the unsoundness.Is a forward
embedgenerally available?Yes for essentially every reduction this library targets —
embedis the constructive content of the completeness proof. "YES_A ⟹ YES_B" (and its optimization analogOPT_B = a·OPT_A+b) is proven by building a target solution from a source one; that construction isembed. A correct reduction cannot lack one in principle. Caveats are about implementation/cost, not existence:extract, soembedis left in the proof, not the code. The upstream ask is "expose the forward map the proof already defines."embedreturns one valid target config, not all.embedneed not invertextract. The certificates only require each map to land feasible at the right cost;extract(embed(c_A)) = c_Ais a bonus parsimony check, not a requirement.path: for A→C→B,embed_total = embed_{C→B} ∘ embed_{A→C}, so a chain hasembediff each atomic step does.Consequence: the rule degrades gracefully — even where a rule can't expose a cheap
embed, only the completeness certificate falls back to the gated solve; the backward certificates (B-infeasible, B-cost, order-reversal) stay solver-free.Proposed work (this repo)
unsound_extraction: recomputepred extractontarget_configand verify its output is infeasible (don't trustclaimed_source_solution).suboptimal_extractionwith B-cost (declared(a,b)) and/or order-reversal (two target configs) — both solver-free.pred embedexists; until then, gateincomplete_reductionbehind a size cap and label it expensive.results.schema.jsonviolation enum + required fields per certificate; add fixtures + calibration cases (one true-positive and one false-positive per type).Acceptance criteria
predcalls and zeropred solve(except the explicitly-gated completeness fallback, ifembedis not added).suboptimal_extractionfalse positive (suboptimal target config extracting to a suboptimal source solution) is rejected by the new check.--calibratepasses with the new fixtures.cc @isPANN (#14)