perf(table): cache array border for O(1) #t on no-hole tables#350
Conversation
🤖 Ultracode review — round 1No blockers. Two test-coverage gaps in the border test suite.
|
🤖 Ultracode review — round 2
|
🤖 Ultracode review — round 1
|
🤖 Ultracode review — round 2No actionable findings this round — semantics, cache-coherence, and test coverage all look sound. ✅ |
Add a cached `border` to %Lua.VM.Table{} so the common no-hole `#t`
returns in O(1) instead of scanning 1..arr_n, lifting table.insert from
O(n^2) to O(n). The cache is set only to a value the existing scan would
return (arr_n after a full contiguous absorb) and flipped to :dirty by
any mutation that could invalidate it, preserving Lua 5.3 §6.1 hole-aware
border semantics. Non-integer key writes leave the cache untouched.
Plan: B
The contiguous-append clause in put_array/3 unconditionally cached
border = arr_n after an absorb, on the false premise that a fresh top
slot proves 1..arr_n is dense. delete/2 keeps arr_n as the high-water
mark and leaves a nil hole inside the region, so an append at arr_n + 1
could resurrect a border that skips a lower hole: #t on {nil,2,3,4,5,6}
returned 6 while the scan-based path (and main) returned 0 for the same
contents -- a history-dependent, inconsistent #t.
Only re-establish the integer cache when the run is provably dense: an
integer incoming border already proves 1..arr_n-1 dense (O(1) hot path);
a :dirty incoming border scans array_border once and caches only a
hole-free run. Adds two regression tests.
Plan: B
Adds StreamData properties asserting that #t returns a legal Lua 5.3 §6.1 border after any sequence of put/delete mutations, and that get/2 stays consistent with an independent oracle. Generalizes the append-past-a-hole regression beyond the hand-written cases.
Existing fixtures all built through put/3 via build/seq, so the public bulk and constructor entry points were only exercised indirectly. Add targeted unit tests for from_data over out-of-order and holey maps (forcing the dirty absorb-scan arm), replace_data over a dense table with an interior hole, and a put_many full reorder of a dense table.
…ouched Writes and deletes of keys <= 0 route through the hash without dirtying the cached sequence border (insert_hash and the hash-arm delete gate invalidation on positive_int?, not is_integer). Add regression tests mirroring the string-key analogues, and widen the property generator to draw keys from -2..0 so the length property exercises the branch too.
eb5d3b0 to
fe4fe91
Compare
🤖 Ultracode review — round 1
|
An integer sequence border is only ever set to arr_n (in put_array clause 1), and arr_n only grows on that same append arm, so a cached integer border always equals arr_n. The in-array delete clause only fires for 1 <= key <= arr_n == border, making the keep-border (key > border) branch of dirty_after_array_hole?/2 unreachable: every in-array delete can only shorten the dense run reachable from 1. Drop the helper and set border: :dirty unconditionally in that clause, and rewrite the put_array and delete comments to state the real load-bearing invariant (integer border == arr_n) rather than the weaker '1..(arr_n-1) was dense' claim and the tail-pop fast path that the cache invariant can never trigger. Add a regression pinning that both a tail delete and an interior delete flip the cache to :dirty while length/1 still reports a legal border.
🤖 Ultracode review — round 2
|
🤖 Ultracode review — round 3
|
Add Lua-level #t regressions for the insert loop, mid-sequence table.insert(t,pos,v), table.remove/2 tail-and-position, and table.sort, plus a unit test pinning that absorb_from_hash drops the order memo (and pairs() surfaces no stale entry) when a parked key is promoted out of the hash.
What & why
Caches the array sequence border on Lua tables so
#tis O(1) on tables with no holes, turning the per-iteration#tscan int[#t+1]=vandtable.insert(t,v)append loops from O(n^2) into O(n). The length operator previously rescanned the array part on every read.Benchmark delta
Headline: -98% on the default quick-mode size (n=2000 append loop, average time 21.96 ms -> 0.4449 ms). Quadratic baseline scaling (4x size -> ~16x time) collapses to linear once
#tis O(1).Benchee full mode, average time. Baseline
origin/main @016fa35; branchperf/table-cached-border @b817592(same machine, back-to-back, detached HEAD).Append
t[#t+1]=vtable.insert(t,v)The win grows with n because the fix turns the O(n^2) per-iteration
#tscan into O(1) — already large and clearly outside noise (baseline append deviations 9.6-13.7%, branch 6-43%) at every size.Reproduce
In repo,
MIX_ENV=benchmark mix deps.getonce, then a minimal Benchee scriptbenchmarks/table_border_append.exs(Lua chunk eval offor i=1,n do t[#t+1]=i endandfor i=1,n do table.insert(t,i) end, sizes n in {500,2000,5000}) invoked asMIX_ENV=benchmark LUA_BENCH_MODE=full mix run benchmarks/table_border_append.exs(full opts: time:10s warmup:2s memory_time:1s). The existingbenchmarks/table_ops.exsdoes NOT cover these scenarios (its build usest[i]=i*idirect indexing, never reading#t), so the targeted script was needed. Script was run detached and removed afterward; NOT committed to the branch.Review verdict
An external review (differential test against a brute-force border reference, 60k randomized ops, plus Lua-language-level confirmation) found and fixed one blocker before this PR:
key == arr_n+1clause input_array/3unconditionally cachedborder = absorbed.arr_non the false premise that a freshly-written top slot proves1..arr_nis dense. Sincedelete/2keepsarr_nas a high-water mark and can leave a nil hole inside the array region, an append atarr_n+1resurrected a border that skips a lower hole. Repro:t={10,20,30,40,50}; t[1]=nil; t[6]=60gave#t == 6on the branch but0onorigin/mainand0from the module's own scan-basedlength/1. Both 0 and 6 are valid Lua 5.3 borders for a holey table, so not a hard spec violation, but it was a history-dependent, internally-inconsistent#tand an observable regression versus main that also corruptedtable.insert/ipairsboundaries built on#t.1..arr_nis provably dense. An integer incoming border keeps the O(1) fast path; a:dirtyincoming border scansarray_borderonce and caches only a hole-free run, otherwise stays:dirtysolength/1falls back to the correct scan. Added two regression tests intest/lua/vm/table_border_test.exs(appending past a leading hole, appending past an interior hole).:dirtysentinel, defensive always-true branch, one-time rescan onreplace_data-built tables) — all correctness-preserving and left as-is.Tests pass after the fix; the reviewer's final state is ready.
Merge gate
Draft, human-gated merge — do not merge without human sign-off.
Independent stabilization verification (2026-06-10)
Re-verified on a clean worktree (
work-perf/table-cached-border@bf7b801) against baselineorigin/main @016fa35. The fixes were done by another agent; this pass re-checked them skeptically.Blocker fix confirmed
The append-past-a-hole stale-border blocker (commit
bf7b801) is genuinely resolved. Traced the fixedput_array/3contiguous-append clause: an integer incoming border re-caches in O(1) (the hot insert-loop path); a:dirtyincoming border scansarray_borderonce and only re-caches a hole-free run, otherwise stays:dirtysolength/1uses the correct scan. Verified the exact reprot={10,20,30,40,50}; t[1]=nil; t[6]=60now reports#t == 0(matchingmainand the scan path), and the two pinned regression tests pass. Remaining items are doc/readability nits with no behavior impact.Suite
mix test --exclude lua53: 2246 passed, 7 skipped, 0 failuresmix test --only lua53: 17 passed, 12 skipped, 0 failures (nextvar.luagate OK,sort.luaOK)test/lua/vm/table_border_test.exs: 16 passedVerified numbers (interpreter-routed, n=100000)
Interpreter path forced by recursively setting
proto.bytecode = nil(retags:compiled_closure->:lua_closure, asserted on every benched global).MIX_ENV=benchmark,LUA_BENCH_MODE=full(time:10s, warmup:2s, memory_time:1s), Apple M4, back-to-back.t[#t+1]=vtable.insert(t,v)#tloop (control)Memory (branch, full mode): append 194.61 MB, insert 275.46 MB, holey 245.47 MB — comparable to baseline (184.61 / 265.28 / 235.56 MB). The holey control confirms tables with holes keep
border: :dirtyand run the identical scan path on both branches, so the only change is the no-hole O(1) fast path.Verdict: ready — win holds at orders of magnitude outside noise, suites green, no new regressions. Merge remains human-gated.