fix(inline): terminate fixpoint when no inline survives verification — fixes #147 i64 hang (v1.1.3)#149
Merged
Merged
Conversation
…loses #147) gale repro (#147): `loom optimize --passes inline` hung indefinitely on a ~5 KB i64 module (z_impl_k_sem_give u64-unpack + caller) on v1.1.2 — 0 panics, no completion. Root cause: NOT a Z3 hang. The inline pass loops "until no inline candidate remains", but a candidate whose inline the Z3 verifier reverts (an i64 function whose inline-equivalence can't be proven) stays a candidate every iteration — its call site is restored on revert — so the pass livelocks: inline → verify reverts → inline → … forever. Verify itself returns fast (confirmed: the run spews "counterexample / reverting" rapidly, not a single slow solve, and the pass completes instantly under LOOM_Z3_MAX_INSTRUCTIONS=0). It only surfaced now because the #145 fix removed the SortDiffers panic that previously aborted the pass; with the panic gone the underlying livelock was exposed. Fix (loom-core/src/lib.rs, inline_functions): - Track whether each iteration KEEPS any inline (func body changed AND not reverted by verify_or_revert). An iteration that reverts everything made no progress -> break instead of re-spinning. - Hard cap of 64 iterations as a backstop. Legitimate inlining is unaffected (a kept inline sets kept_any -> the loop continues exactly as before). Also reverts the verify() watchdog-thread experiment from the investigation (the hang was never in verify); the Z3 solve-time bound in z3_config_with_timeout stays. Verified locally: the gale repro now completes (gtimeout exit 0, "Optimization complete", output passes wasm-tools validate) — the unprovable i64 function is reverted once, then the pass terminates. Closes #147 Trace: REQ-6
Bumps version 1.1.2 -> 1.1.3 and re-enables the four no-panic i64 inline regression tests that the #147 livelock fix unblocks (they now run in ms; the one asserting i64 IS inlined stays #[ignore] — the verifier soundly reverts the unprovable i64 inline, a #147 follow-up). The earlier "i64 Z3 verification is slow" diagnosis was wrong: the 56-min / 4h CI matrices were the inline-pass livelock spinning, not slow solves. With the livelock fixed the i64 inline tests complete in milliseconds and the Test matrix should run normally again. Closes #147. Trace: REQ-6
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes #147: on v1.1.2,
loom optimize --passes inlinehung indefinitely on a small i64 module (gale-ffiz_impl_k_sem_giveu64-unpack + caller) — 0 panics, no completion. This is the last loom-side gate for gale's verified cross-language-LTO route.Root cause — a livelock, not a slow solve
inline_functionsloops "until no inline candidate remains." A candidate whose inline the Z3 verifier reverts (an i64 function whose inline-equivalence can't be proven) stays a candidate every iteration — its call site is restored on revert — so the pass live-locks: inline → verify reverts → inline → … forever. Verify itself returns in microseconds (the run spewscounterexample / revertingrapidly, and the pass completes instantly underLOOM_Z3_MAX_INSTRUCTIONS=0). It surfaced only because v1.1.2's #145 fix removed theSortDifferspanic that used to abort the pass.This corrects the earlier diagnosis — the 56-min / 4h CI matrices were this livelock spinning, not slow Z3.
Fix
inline_functionsterminates when an iteration keeps no inline (made no net progress), tracked via a per-iterationkept_anyflag (body changed AND not reverted). 64-iteration hard cap as a backstop. Legitimate inlining is unchanged (a kept inline → loop continues exactly as before).#[ignore]'d i64 inline tests (they now run in ms). The 5th asserts the i64 helper is inlined, which the verifier soundly reverts as unprovable — real verified i64 inlining stays a Add a Z3 per-query timeout to the translation validator (bounds i64 verification; re-enable ignored i64 inline tests) #147 follow-up.Verified locally
Inline test suite: 4 i64 tests pass in 0.02s; no regression on non-i64 inline tests.
Falsification
Wrong if
loom optimize --passes inlinestill fails to terminate on an i64 module, or if thekept_anyguard breaks a legitimate multi-pass inline (a kept inline must still let the loop continue). The gale reporter offered to re-test on landing.Closes #147