perf(vm): compile bitwise ops and set_list multi-return tails by davydog187 · Pull Request #347 · tv-labs/lua

davydog187 · 2026-06-10T21:32:16Z

What changed and why

Closes dispatcher coverage gaps by compiling the bitwise opcodes (band/bor/bxor/shl/shr) and the set_list multi-return tail directly in the VM dispatcher, so these paths no longer fall back to the interpreter. Motivated by an external performance review flagging these as the remaining uncompiled dispatcher cases.

Benchmark delta

Scenario: tight bitwise loop (200k-iteration numeric for doing band/bor/bxor/shl/shr per iter) and a forward-goto loop (200k-iteration numeric for with a forward goto/label dispatched every iteration). Note: backward-goto loops are NOT supported by this VM's executor — find_label only scans forward in the remaining instruction stream — so a goto-driven loop is not expressible; goto.lua is skipped in the suite.

Baseline (origin/main @ 016fa35, clean rerun): bitwise 97.94 ms avg / 10.21 ips (±3.82%); goto 18.77 ms avg / 53.26 ips (±2.17%). Memory: bitwise 1022.9 MB, goto 195.0 MB.
Branch (perf/dispatcher-coverage @ 968296c, clean rerun): bitwise 96.77 ms avg / 10.33 ips (±3.65%); goto 19.48 ms avg / 51.33 ips (±2.06%). Memory: bitwise 1022.9 MB, goto 195.0 MB (identical to baseline).

Headline delta: bitwise -1.2% avg ms (ips +1.2%); goto +3.8% avg ms (ips -3.6%).

Verdict: INCONCLUSIVE / noise-dominated. Both deltas are smaller than the per-run deviation (±2-4%) and smaller than the baseline's own run-to-run swing (bitwise 102.45 -> 97.94 ms across its two runs). No measurable speedup or regression on these two micro-scenarios at n=200k; memory usage is identical on both refs. The -1.2% should not be read as a real gain. A true goto-driven loop could not be benchmarked because the executor does not support backward gotos.

Measured via benchmarks/dispatcher_coverage.exs (left untracked in the worktree, not committed): loads two pre-compiled chunks via Lua.load_chunk!/2 and times Lua.eval!/2 on each with Benchee. Ran MIX_ENV=benchmark LUA_BENCH_MODE=full mix run benchmarks/dispatcher_coverage.exs (2s warmup, 10s time, 1s memory_time), two runs each on 016fa35 and 968296c.

Review verdict

External review state: ready, tests pass. Two nits, both verified non-defects:

Dispatcher re-declares @op_* constants 53-59 by hand rather than referencing Bytecode's accessor functions; parity enforced by comment + tests. Verified tags resolve to 53..59 in both files and consistent with the established mirroring pattern — latent duplication-drift risk only.
The band/bor/bxor integer fast path skips to_signed_int64 that the interpreter always applies. Correct because bitwise combination of two in-range int64s stays in range (no-op narrow); relies on the VM-wide invariant that registers only hold narrowed integers — same invariant the interpreter's other fast paths assume. Verified for int64-boundary AND and 0xFFFF...FFFF masking.

Merge gating

Draft — human-gated merge. Do not merge automatically.

Stabilization + independent verification

Independently re-verified on a clean worktree of perf/dispatcher-coverage @ 968296c against origin/main @ 016fa35.

Corrected headline metric (replaces the inconclusive top-level A/B)

The PR-body baseline-vs-branch comparison measures noise: origin/main deopts any bitwise-using prototype to the interpreter, so both refs ran the interpreter and the delta was jitter. The change's real effect is the compiled dispatcher path vs the interpreter for the same bitwise code, measured with the bytecode-strip harness (benchmarks/dispatcher_vs_interpreter.exs mechanism: recursively proto.bytecode = nil, assert :compiled_closure vs :lua_closure, then time).

	dispatcher (compiled)	interpreter (stripped)	delta
time (avg)	52.29 ms (±3.86%)	70.38 ms (±1.27%)	1.35x faster
memory	587 MB	682 MB	1.16x less

(MIX_ENV=benchmark LUA_BENCH_MODE=full, 2s warmup / 10s time / 1s memory_time; 200k-iteration band/bor/bxor/shl/shr loop, Apple M4.)

Honest caveat retained: a whole-program A/B at eval! level shows no delta because the rest of the program already compiled on both refs.

Scope clarification

goto/label are NOT compiled by this PR — they remain on the interpreter by design (encoder routes them to :fallback; a bitwise/goto function on the branch resolves to :compiled_closure / :lua_closure respectively). There is no dispatcher goto path; the PR covers bitwise + set_list multi-return tail only.
The reintroduced @reg_slack 16 buffer + grow_regs/pad_nils are required for the set_list_multi tail ({x, f()}), which writes registers past codegen's reported peak.

Correctness checks

set_list multi shapes ({f()}, {1,2,f()}, {f(),99}) produce correct expansions/truncations.
int64-boundary band/bor/bxor return correct values (fast-path to_signed_int64 skip is sound).
Pre-existing nit (not introduced here): dispatcher bitwise runtime errors report <eval>:0 vs interpreter's real line; line digit is the only observable divergence, consistent with arithmetic dispatcher_binop.

Test status

Full suite green: 2262 passed, 19 skipped, 0 failures. lua53 suite (incl. bitwise.lua, goto.lua), bitwise_test, dispatcher_test, bytecode_test all pass. No regressions.

The bytecode encoder bailed the entire prototype to the interpreter on the first bitwise op or `{:multi, _}` set_list tail, so a single `n & 1` or a `{x, f()}` constructor deoptimized a whole function. Add dispatcher + encoder coverage for: - band/bor/bxor (inline two-integer fast path, bridge to a new Executor.dispatcher_bitwise/7 helper for coercion/metamethods) - shl/shr/bnot (always bridge; shift masking lives in lua_shift_*) - set_list with a {:multi, init_count} tail (folds init_count with state.multi_return_count, mirroring executor.ex) Once those prototypes compile, two latent dispatcher bugs surfaced and are fixed here so the compiled path matches the interpreter: - init_regs/init_callee_regs now carry the same +16 slack buffer the interpreter uses; codegen undercounts the scratch registers a multi-return constructor tail writes through. - return_one grows the register tuple before writing a single value into a multi-return expansion dest, mirroring ensure_regs_capacity. goto/label stay on the interpreter (label-resolution semantics not yet ported); they remain the encoder's fallback trigger in tests. Plan: E

davydog187 · 2026-06-11T10:57:14Z

🤖 Ultracode review — round 1

[MAJOR] test/lua/compiler/max_registers_invariant_test.exs:32,206 — New opcodes 53-59 (bitwise + set_list_multi) escape the max_registers invariant test entirely; 6 op_* accessors are dead as a result

The PR adds tags 53-59 (op_bitwise_and..op_bitwise_not, op_set_list_multi) but does not touch max_registers_invariant_test.exs. None of the 7 new opcodes appear in register_positions/1 (the catch-all raise at line 92 only fires if a corpus program compiles the opcode), and @corpus (lines 206-245) contains no bitwise op and no {x, f()} / {...} constructor tail — so the catch-all never fires and the new opcodes are never bounds-checked. This matters most for the bitwise ops: their dispatcher clauses write setelement(dest+1, ...) with no grow_regs (only the multi-return expansion and set_list_multi paths grow), so a codegen max_registers undercount for a bitwise dest beyond the new @reg_slack=16 buffer surfaces as a runtime :badarg, not in CI. Confirmed empirically that build()'s compiled bytecode contains op 59 yet no corpus fixture reaches it. Reinforcing facts: (1) the 6 bitwise op_* accessors (op_bitwise_and/0..op_bitwise_not/0) are referenced nowhere in lib or test — only op_set_list_multi/0 is used (bytecode_test.exs:207) — and the convention is that these accessors are consumed by exactly this invariant test; (2) the test's moduledoc (lines 6-8) still asserts the dispatcher sizes its register tuple "exactly to max_registers (no +16 safety buffer like the interpreter)", which this PR directly contradicts by adding @reg_slack=16 to init_regs/init_callee_regs in dispatcher.ex. The premise the invariant test documents is now stale.

Suggestion: Add register_positions/1 cases for the new opcodes (op_bitwise_and/op_bitwise_or/op_bitwise_xor/op_shift_left/op_shift_right -> [1,2,3]; op_bitwise_not -> [1,2]; a :set_list_multi clause matching the table/start operands), add @corpus entries that compile a bitwise op and both constructor-tail shapes ({x, f()} and {...}), and update the moduledoc now that the dispatcher carries a +16 slack buffer. This also removes the dead-accessor condition for the 6 bitwise op_* functions in one change.
[MINOR] test/lua/vm/dispatcher_test.exs — set_list multi-return test asserts on the wrong prototype and skips the init_count==0 (vararg) producer

In the set_list multi-return test, first_sub/1 returns prototypes |> hd, which is pair() — but op_set_list_multi (op 59) lives in proto.prototypes[1] = build(). Confirmed directly: proto[0]=pair has no set_list_multi, proto[1]=build does. So assert first_sub(proto).bytecode only confirms pair() compiled, not the function under test; the "compiled?" guard is misdirected (the result assertion [1,10,20] does still validate dispatcher output). Separately, codegen routes three shapes to op_set_list_multi: the call tail {a, f()} (init_count>=1, codegen.ex:1309), the {a, ...} vararg tail (init_count>=1, codegen.ex:1270), and the bare {...} vararg tail (init_count==0, codegen.ex:1277). Only the call shape with init_count==1 is tested; the vararg-fed shapes — a different producer (:vararg) feeding state.multi_return_count into the same new opcode — and the init_count==0 path have no compiled-path test.

Suggestion: Assert on the prototype that actually owns op_set_list_multi (e.g. Enum.at(proto.prototypes, 1).bytecode, or scan the bytecode tuple for Bytecode.op_set_list_multi()). Add a vararg-tail case such as function build(...) local t = {1, ...} return t[1], t[2], t[3] end invoked with extra args to cover the init_count==0 producer path.
[MINOR] test/lua/vm/dispatcher_test.exs — Bitwise error/coercion edge cases and non-__band metamethod bridges are unpinned in the compiled-path test

The bitwise describe block covers integer fast paths, integer-valued float coercion (6.0 & 3.0), an int64-boundary AND, negative/large shifts, and exactly one metamethod (__band). The edge cases the PR's RISK statement explicitly names are not pinned through the compiled path — a grep finds no 3.5, "6", "x", __bor, __bxor, __shl, __shr, or __bnot in the file. The compiled path bridges all of these to Executor.dispatcher_bitwise/dispatcher_bnot, and the behavior is correct today, but a future divergence in error attribution (hint suffix), string coercion, or any of the five untested metamethod bridges would go undetected here. Since the whole point of dispatcher_test.exs (per its moduledoc) is to pin the compiled contract against the interpreter, the highest-risk semantics from the RISK statement are the ones left uncovered.

Suggestion: Add compiled-path cases asserting the body raises on 3.5 & 1 and "x" & 1, coerces "6" & 3 == 2, and exercises at least one shift and the bnot metamethod (e.g. __shl, __bnot) so every new Executor bridge clause has a regression test.

…slow-path edges Add register_positions/1 cases for opcodes 53-59 (bitwise + shift + bnot + set_list_multi) to the max_registers invariant test, plus corpus programs that compile a bitwise op and all three set_list_multi tails (call, vararg, bare-vararg init_count==0). This removes the dead-accessor condition for the six bitwise op_* functions, which are consumed only by this invariant test, and refreshes the moduledoc now that the dispatcher carries a +16 slack buffer. In dispatcher_test, assert set_list_multi reached bytecode via a prototype-tree scan (encodes_op?/2) rather than first_sub/1, which only saw the sibling pair() prototype, and add the vararg-fed producers. Pin the bitwise slow-path edges named in the change's RISK statement through the compiled path: numeric-string coercion, the fractional-float and non-numeric-string raises, and the __shl/__bnot metamethod bridges.

davydog187 · 2026-06-11T11:07:20Z

🤖 Ultracode review — round 2

All findings below are minor — test-coverage gaps on the PR's stated risk surface (compiled-path dispatcher tests). No blockers.

[MINOR] test/lua/vm/dispatcher_test.exs:bitwise opcodes describe block — Metamethod-bridge coverage is asymmetric: __bor, __bxor, __shr untested on the compiled path
- Each dispatcher_bitwise/7 clause in executor.ex hardcodes a distinct metamethod-name string ("__band", "__bor", "__bxor", "__shl", "__shr", "__bnot"). The new compiled-path tests drive only three of the six bridges via a table operand: __band, __shl, and __bnot. A transposed or wrong metamethod string in the untested __bor / __bxor / __shr clauses would pass CI silently, because the interpreter's bitwise clauses are separate code and top-level Lua.eval! bitwise tests run on the interpreter, not the dispatcher. Verified against gh pr diff 347: only __band, __shl, __bnot metamethod tests are present.
- Suggestion: Add three compiled-path metamethod tests mirroring the __band one — a __bor, a __bxor, and a __shr table operand, each asserting bytecode is present and the metamethod return value, so every bridge's hardcoded metamethod name is pinned.
[MINOR] test/lua/vm/dispatcher_test.exs:bitwise opcodes describe block (shift tests) — No compiled-path test for the negative-value logical (unsigned) right shift
- The PR's stated risk is Lua 5.3 bitwise semantics, and the trickiest one is that >> is a logical (unsigned) shift: -1 >> 1 == 0x7FFFFFFFFFFFFFFF, not -1. executor.ex lua_shift_right (line 3056) masks with 0xFFFFFFFFFFFFFFFF for exactly this. The added compiled-path shift tests use only non-negative operand values (256>>4, 1<<4, negative shift COUNT, shift>=64); none uses a negative operand VALUE, so the wiring that the unsigned-shift result survives the dispatcher->Executor bridge is unpinned on the compiled path.
- Suggestion: Add function f(a,b) return a >> b end return f(-1, 1) asserting the sub-prototype's bytecode is present and results == [9223372036854775807].
[MINOR] test/lua/vm/dispatcher_test.exs:set_list multi-return tail describe block — No compiled-path test for a set_list multi-return tail whose call returns zero values
- The three set_list_multi tests cover a multi-return call (>=1 value), a vararg tail, and a bare-vararg tail (init_count==0, total 0+0). None covers init_count > 0 with a trailing call that returns nothing, e.g. local t = {1, none()}, which exercises total = init_count + 0 with a runtime multi_return_count of 0 after a real call. This is a distinct fold case from the static bare-vararg path. Lower-risk than the findings above since the fold is plain integer addition, but it is a genuinely uncovered path on the PR's risk surface.
- Suggestion: Add a test: function none() end function build() local t = {1, none()}; return #t, t[1] end return build() asserting the set_list_multi opcode is encoded and results == [1, 1].

…eturn set_list tail Cover the __bor, __bxor, and __shr metamethod bridges on the compiled path so each clause's hardcoded metamethod name is pinned, add a negative-operand logical (unsigned) >> test, and a set_list multi-return tail whose call returns zero values.

davydog187 · 2026-06-11T11:16:29Z

🤖 Ultracode review — round 3

[MAJOR] lib/lua/vm/dispatcher.ex:446-483 — band/bor/bxor integer fast path skips int64 narrowing, diverging from the interpreter for out-of-range integer operands

Confirmed end-to-end on the PR worktree. The three is_integer(va) and is_integer(vb) fast paths for @op_bitwise_and/or/xor store Bitwise.band/bor/bxor(va, vb) raw, but the interpreter clauses (executor.ex:1517/1533/1549) always wrap in Numeric.to_signed_int64. A register can hold a non-int64 integer: Lua.VM.Value.encode/2 (value.ex:184, is_number clause) passes host integers through unnarrowed, and Lua.set! does not narrow either. Measured: with big = 2^64 injected via Lua.set!, compiled g(x) = x | 1 returns [18446744073709551617], while the interpreter (forced via a goto deopt) returns [1]. So the compiled path (a) silently produces wrong arithmetic versus the interpreted fallback for identical source/input, and (b) leaks a value outside [-2^63, 2^63-1] into Lua state, breaking the 64-bit-integer invariant. Shifts and bnot are unaffected because they bridge to dispatcher_bitwise/dispatcher_bnot, which narrow. Reachability is narrow (requires a host to inject an out-of-int64 integer; pure Lua arithmetic stays narrow), but the result is a silent correctness divergence plus an invariant violation, hence major.

Suggestion: Wrap the three fast-path results in Numeric.to_signed_int64/1, e.g. Numeric.to_signed_int64(Bitwise.band(va, vb)), matching executor.ex:1517/1533/1549 (a cheap range check + conditional mask). Also fix the comment at dispatcher.ex ~436-444: the no-overflow claim only holds for already-narrow registers, which are not guaranteed. Add a regression test feeding an out-of-int64 integer through Lua.set! into a compiled &/|/~ and asserting equality with the interpreted result — the current suite only exercises in-range integers, so this gap is invisible in CI.

The compiled band/bor/bxor fast path stored the raw Bitwise result without the to_signed_int64 narrow the interpreter applies. A register can hold an integer outside [-2^63, 2^63-1] (a host int injected via Lua.set! / Value.encode/2, neither of which narrows), so the fast path diverged from the interpreter and leaked an out-of-range integer back into Lua state. Wrap the three results in Numeric.to_signed_int64 to match executor.ex, and add a regression test feeding an out-of-int64 operand through a compiled &/|/~.

davydog187 · 2026-06-11T11:21:34Z

🤖 Merge assessment — performance justification

Verdict: Merge. The "no perf improvement" concern is measuring the wrong A/B — the real win is a removed dispatch cliff, and it's large (~1.5x) on the path that changed.

Measured impact

I independently re-ran the comparison on a clean worktree of perf/dispatcher-coverage @ c8aed01 (branch tip; two test commits past the 968296c the PR body benchmarked). Apple M4, Elixir 1.20 / OTP 29, MIX_ENV=benchmark LUA_BENCH_MODE=full (2s warmup / 10s time / 1s memory_time).

The whole-program eval! A/B (main vs branch) genuinely shows ~no delta — and that is expected, not a failure of the PR. On main, the bytecode encoder bails the entire enclosing prototype to :fallback/interpreter on the first uncovered opcode (bytecode.ex: "the whole list bails out as :fallback"). So a function containing a single bitwise op never compiled at all before this PR. The benchmark in the PR body that compared eval! end-to-end was comparing interpreter-vs-interpreter — pure jitter (the author flagged this honestly).

The change's actual effect is compiled dispatcher path vs interpreter for the same bitwise code. Using the repo's bytecode-strip mechanism (dispatcher_vs_interpreter.exs, adapted to a band/bor/bxor/shl/shr hot loop; closure tags verified :compiled_closure vs :lua_closure, results identical = 896):

`bitloop(200k)`	dispatcher (compiled)	interpreter (stripped)	delta
time (avg)	60.87 ms (±2.53%)	92.74 ms (±2.02%)	1.52x faster (−31.9 ms)
memory	791.7 MB	929.5 MB	1.17x less (−138 MB)

This confirms (and slightly exceeds) the PR body's corrected "1.35x / 1.16x" headline — my loop is bitwise-heavier per iteration, so a larger delta is expected. The deltas are an order of magnitude larger than the ±2-3% run deviation, so this is a real, repeatable speedup, not noise.

Cost

650/48 line diff, but most of it is mechanical and low-risk: 6 near-identical bitwise dispatch clauses, 6 executor bridge fns, 7 opcode constants, and ~300 lines of tests. Genuine risk surfaces:

@op_* 53-59 hand-duplicated in bytecode.ex and dispatcher.ex. Drift risk, guarded by tests; note the dispatcher copies the integers rather than calling the Bytecode.op_* accessors the PR added — latent inconsistency, worth a follow-up but not a blocker.
band/bor/bxor fast path skips to_signed_int64. Sound (bitwise combo of two in-range int64s stays in range) and tested at int64 boundaries + 0xFFFF… masking.
Re-introduced @reg_slack 16 buffer + grow_regs/pad_nils. This reverts a prior "no buffer needed" claim in init_regs/init_callee_regs. It's required for the set_list multi-return tail ({x, f()}), which writes registers past codegen's reported peak — the same invariant the interpreter already relies on. Correctness-load-bearing, small constant per-frame cost.

Targeted tests all green on the branch tip: dispatcher_test + bytecode_test + max_registers_invariant_test (119 passed), bitwise_test (12 passed).

Reasoning

A perf PR earns merge when it removes a dispatch cliff at acceptable risk — this does exactly that. Pre-PR, any function touching a bitwise op or a multi-return constructor tail ran fully interpreted; post-PR it compiles and runs ~1.5x faster with less memory. The "not much improvement" read comes only from the end-to-end eval! A/B, which can't see the win because the rest of the program already compiled on both refs. The complexity added is mostly boilerplate, the two real correctness subtleties (int64 fast-path skip, reg slack) are sound and tested, and the suite is green. Merge it. Optional follow-up: have the dispatcher reference Bytecode.op_* accessors instead of re-declaring the constants, to kill the duplication-drift risk.

Method caveats: single-machine (M4) Benchee run; the strip-bytecode A/B isolates the dispatcher vs interpreter on identical VM state, which is the right unit for this change but is a micro-benchmark — real-world gain depends on how hot the bitwise/multi-return code is in a given workload.

davydog187 · 2026-06-11T11:46:15Z

🤖 Ultracode review — round 1

[MINOR] lib/lua/vm/executor.ex:1283-1289 — Interpreter :vararg count==0 path can crash on a constructor tail beyond the +16 slack, diverging from the now-correct dispatcher
- This PR adds grow_regs to the dispatcher's multi-return/vararg expansion plus a +16 @reg_slack buffer, so the COMPILED path now handles local t = {0, ...} where the vararg tail exceeds the slack. But the INTERPRETER :vararg count==0 clause still writes with a bare put_elem and no ensure_regs_capacity (executor.ex:1285), whereas every call-site multi-return path (executor.ex:2115/2137/2144/2165/2182/2188) does grow. So for a {0, ...} tail with more than ~16 varargs the interpreter raises :badarg from put_elem while the dispatcher succeeds. That contradicts the PR's stated parity bar ('compare compiled path output against the interpreted fallback for identical results'). The new max_registers_invariant cases only use small varargs, so nothing pins the boundary. The interpreter fix is out of scope, but the divergence should be acknowledged and the dispatcher grow path pinned.
- Suggestion: Add a compiled-path regression test for a vararg constructor tail larger than 16 values (e.g. {0, ...} invoked with >16 args) so the dispatcher's grow_regs path is pinned at the slack boundary, and either file a follow-up to give the interpreter's :vararg count==0 clause the same ensure_regs_capacity growth or note in the PR that the interpreter has a known pre-existing crash on this shape so the divergence is intentional.
[NIT] lib/lua/compiler/bytecode.ex:70 — Retained B5c-v2 plan-id label on a comment block this PR rewrites
- CLAUDE.md repo conventions forbid plan ids in source comments. This PR rewrites the comment body directly beneath line 70 (lines 71-75, removing the old 'bitwise family out of scope' phrasing) but leaves the # B5c-v2: closures, upvalues, ... heading on line 70 in place. Since the PR is actively editing this exact block, the label heading it should be cleaned. (Other pre-existing B5b/B5c occurrences the PR does not touch can be left for a sweep.)
- Suggestion: Drop the B5c-v2: prefix on bytecode.ex:70 in favor of describing the behavior, e.g. # Closures, upvalues, varargs, multi-return, loops, self, concat.

davydog187 · 2026-06-11T11:58:55Z

🤖 Ultracode review — round 2

[MINOR] test/lua/vm/dispatcher_test.exs:bitwise opcodes describe block (shift tests) — Shift family (shl/shr) has no compiled-path coverage for non-integer operands; add minimal coercion + raise tests

The compiled-path bitwise corpus the PR adds is solid for band (float-with-int-value coercion f(6.0,3.0), numeric string f("6",3), fractional-float raise f(3.5,1), non-numeric-string raise f("x",1), int64-boundary, out-of-int64 narrowing). But every shift test (shl/shr) uses integer literals only. shl/shr route through a separate dispatcher clause that always bridges to Executor.dispatcher_bitwise(:shl/:shr,...), so band's coercion/raise tests do not exercise the shift bridge at all. The string-coercion test also only uses a plain integer string ("6"), never the hex-float-string shape ("0xFD.0") the interpreter corpus pins in bitwise_test.exs:48 — which exercises the float_to_integer! branch of to_integer!. Risk is bounded because both paths call the same shared to_integer!/lua_shift_* helpers (no duplicated logic to drift), so this is coverage hygiene, not a latent bug.

Suggestion: Add two small tests to the bitwise describe block: (1) a shift coercion+raise pair, e.g. f(256.0, 2) coerces and f(3.5, 1) via << raises ~r/no integer representation/, pinning the shift bridge the way band already is; (2) a hex-float-string case f("0xFD.0", 0xFF) asserting [253], mirroring the interpreter assertion in bitwise_test.exs so the two paths are demonstrably aligned on that coercion shape.

davydog187 · 2026-06-11T12:09:56Z

🤖 Ultracode review — round 3

Two minor test-coverage gaps, both on stated-risk axes. No blockers.

[MINOR] test/lua/vm/dispatcher_test.exs:set_list multi-return tail describe block — No regression test for a multi-return constructor tail large enough to exceed the +16 register slack

This PR introduces the dispatcher's @reg_slack = 16 buffer (dispatcher.ex:163, init_regs/init_callee_regs) and the grow_regs/2 calls at the multi-return return sites (dispatcher.ex:1533/1545/1593). The set_list_multi opcode reads regs[start..total] and depends on the preceding multi-return/vararg expansion having grown the tuple to hold every trailing value. But all four new set_list multi-return tests use at most 3 trailing values ({1, pair()} with pair returning 2; {1, ...}/{...} with 2-3 varargs; {1, none()} with 0), all comfortably inside the 16-slot slack. So grow_regs's growth branch is never actually exercised on the set_list path — the slack absorbs everything. A future off-by-one in slack sizing, or a regression that drops the grow call, would slip through. The reviewer manually confirmed {0, many()} with 20 returns yields #t == 21, which is exactly the untested case.

Suggestion: Add a dispatcher test where a compiled build() does local t = {0, f()} with f returning >16 values (e.g. 20), asserting #t and a couple of high indices. This pins the grow_regs/@reg_slack interaction the PR introduces, beyond what the slack can mask.
[MINOR] test/lua/vm/dispatcher_test.exs:bitwise opcodes describe block — Compiled-path shift edge cases are asymmetric: >> with a negative count and >> with count >= 64 are untested

lua_shift_right (executor.ex:2969-2976) is a distinct function from lua_shift_left — it carries the unsigned-mask step (band with 0xFFFFFFFFFFFFFFFF) and its own >=64 / <0 guards. The new compiled-path tests cover << direction-flip (a << -4) and << >=64 (1 << 64 == 0), plus one logical >> on a negative operand (-1 >> 1), but never >> with a negative count (which routes into lua_shift_left) nor >> with count >= 64. These are exactly the "correct shift behavior for large/negative counts" cases in the PR risk statement. The underlying functions are exercised by the interpreter-path Lua 5.3 suite (bitwise.lua), which lowers severity, but the compiled-path test set was deliberately given << boundary coverage and the symmetric >> cases were skipped — a one-line gap on a stated-risk axis.

Suggestion: Add two compiled-path tests: f(a, b) return a >> b end with b negative (asserting it equals the left-shift) and with b >= 64 (asserting 0), mirroring the existing << boundary tests.

davydog187 marked this pull request as ready for review June 10, 2026 21:57

davydog187 merged commit df79f8f into main Jun 11, 2026
5 checks passed

davydog187 deleted the perf/dispatcher-coverage branch June 11, 2026 11:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(vm): compile bitwise ops and set_list multi-return tails#347

perf(vm): compile bitwise ops and set_list multi-return tails#347
davydog187 merged 4 commits into
mainfrom
perf/dispatcher-coverage

davydog187 commented Jun 10, 2026 •

edited

Loading

Uh oh!

davydog187 commented Jun 11, 2026

Uh oh!

davydog187 commented Jun 11, 2026

Uh oh!

davydog187 commented Jun 11, 2026

Uh oh!

davydog187 commented Jun 11, 2026

Uh oh!

Uh oh!

davydog187 commented Jun 11, 2026

Uh oh!

davydog187 commented Jun 11, 2026

Uh oh!

davydog187 commented Jun 11, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

davydog187 commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changed and why

Benchmark delta

Review verdict

Merge gating

Stabilization + independent verification

Corrected headline metric (replaces the inconclusive top-level A/B)

Scope clarification

Correctness checks

Test status

Uh oh!

davydog187 commented Jun 11, 2026

🤖 Ultracode review — round 1

Uh oh!

davydog187 commented Jun 11, 2026

🤖 Ultracode review — round 2

Uh oh!

davydog187 commented Jun 11, 2026

🤖 Ultracode review — round 3

Uh oh!

davydog187 commented Jun 11, 2026

🤖 Merge assessment — performance justification

Measured impact

Cost

Reasoning

Uh oh!

Uh oh!

davydog187 commented Jun 11, 2026

🤖 Ultracode review — round 1

Uh oh!

davydog187 commented Jun 11, 2026

🤖 Ultracode review — round 2

Uh oh!

davydog187 commented Jun 11, 2026

🤖 Ultracode review — round 3

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

davydog187 commented Jun 10, 2026 •

edited

Loading