Skip to content

Improve render perf#21221

Draft
NullVoxPopuli-ai-agent wants to merge 7 commits intoemberjs:mainfrom
NullVoxPopuli-ai-agent:nvp/perf
Draft

Improve render perf#21221
NullVoxPopuli-ai-agent wants to merge 7 commits intoemberjs:mainfrom
NullVoxPopuli-ai-agent:nvp/perf

Conversation

@NullVoxPopuli-ai-agent
Copy link
Copy Markdown
Contributor

@NullVoxPopuli-ai-agent NullVoxPopuli-ai-agent commented Mar 16, 2026

note from nvp:

  • initally this is a great example of ai overfitting to a specific problem

@NullVoxPopuli
Copy link
Copy Markdown
Contributor

NullVoxPopuli commented Mar 16, 2026

image

artifact-1.pdf

    Benchmark Results Summary    

duration phase estimated improvement -2913ms [-2947ms to -2874ms] OR -75.07% [-75.95% to -74.06%]
renderEnd phase no difference [-4ms to 2ms]
render1000Items1End phase estimated improvement -122ms [-133ms to -111ms] OR -81.96% [-89.75% to -74.52%]
clearItems1End phase no difference [-1ms to 11ms]
render1000Items2End phase estimated improvement -67ms [-83ms to -65ms] OR -57.05% [-71.21% to -56.15%]
clearItems2End phase estimated regression +17ms [17ms to 18ms] OR +12.77% [12.44% to 13.31%]
render5000Items1End phase estimated improvement -915ms [-927ms to -893ms] OR -89.54% [-90.79% to -87.46%]
clearManyItems1End phase estimated improvement -170ms [-181ms to -164ms] OR -62.76% [-66.75% to -60.65%]
render5000Items2End phase estimated improvement -828ms [-844ms to -811ms] OR -88.68% [-90.42% to -86.86%]
clearManyItems2End phase estimated improvement -149ms [-149ms to -149ms] OR -99.5% [-99.71% to -99.28%]
render1000Items3End phase estimated improvement -49ms [-50ms to -49ms] OR -49.48% [-49.62% to -49.34%]
append1000Items1End phase estimated improvement -84ms [-85ms to -68ms] OR -62.73% [-63.78% to -51.02%]
append1000Items2End phase estimated improvement -83ms [-100ms to -83ms] OR -62.56% [-74.77% to -62.16%]
updateEvery10thItem1End phase estimated improvement -83ms [-83ms to -66ms] OR -62.01% [-62.18% to -49.8%]
updateEvery10thItem2End phase estimated improvement -83ms [-83ms to -83ms] OR -62.14% [-62.27% to -62.01%]
selectFirstRow1End phase estimated improvement -33ms [-33ms to -33ms] OR -98.82% [-99.08% to -98.55%]
selectSecondRow1End phase estimated improvement -50ms [-50ms to -50ms] OR -99.67% [-99.83% to -99.39%]
removeFirstRow1End phase estimated improvement -83ms [-83ms to -83ms] OR -99.84% [-99.97% to -99.74%]
removeSecondRow1End phase estimated improvement -83ms [-83ms to -83ms] OR -99.88% [-99.99% to -99.84%]
swapRows1End phase estimated improvement -16ms [-16ms to -16ms] OR -24.23% [-24.47% to -24.06%]
swapRows2End phase estimated improvement -16ms [-16ms to -16ms] OR -24.19% [-24.49% to -23.73%]
clearItems4End phase no difference [-24ms to 0ms]


if (Array.isArray(iterable)) {
return new ArrayIterator(iterable, keyFor);
// When iterating a TrackedArray (Proxy), each index access goes through
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well this is dubious lol

#collection = createUpdatableTag();

#storages = new Map<number, ReturnType<typeof createUpdatableTag>>();
// Use a flat array instead of Map for index-based storage lookups.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also sus

NullVoxPopuli and others added 5 commits March 17, 2026 15:38
Correctness fixes:
- validators.ts: Restore try/finally for isUpdating recovery, use > with
  NaN fallback for volatile tag propagation
- reference.ts: Use track() for error-safe tracking frames (try/catch in
  valueForRef kills V8 JIT optimization)
- iterable.ts: Don't reuse iterator result objects (callers store refs)
- tracking.ts: Avoid corrupting combinator tags when pooling trackers

Performance optimizations:
- reference.ts: Fast path for iterator item refs — skip tracking frame
  and closures entirely since compute only consumes one tag. Eliminates
  ~40,000 allocations per 5000-item list render.
- destroyable: Swap-and-pop removal instead of indexOf+splice, reducing
  array shifting from O(n) to O(1) per removal.
- element-builder.ts: Eliminate First/Last wrapper class allocations,
  store raw SimpleNode references with boolean discriminator.
- tracking.ts: Hand off tags array to combine() instead of slice() copy.
  Remove unwrap() from hot endTrackFrame path.
- update.ts: Remove bulk-clear code that was slower than single-pass
  (3 passes over data vs 1).
- stack.ts: Fast path for capture(0).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bug fixes:
- validators.ts: Move isUpdating check BEFORE lastChecked early return.
  During cycle detection, lastChecked is already set to $REVISION, so the
  early return was firing before the cycle could be detected.
- iterable.ts: Handle empty arrays in ArrayIterator.next() — return null
  instead of producing a spurious item with undefined value.
- reference.ts: Use track() for error-safe tracking (try/catch in
  valueForRef or separate function both kill V8 JIT optimization).

Performance optimizations:
- reference.ts: Fast path for iterator item refs — skip tracking frame
  and closures entirely (~40,000 fewer allocations per 5000-item render).
- destroyable: Swap-and-pop removal instead of indexOf+splice.
- element-builder.ts: Eliminate First/Last wrapper allocations.
- tracking.ts: Hand off tags array instead of slice(); remove unwrap()
  from hot endTrackFrame path.
- update.ts: Remove slower bulk-clear code.
- stack.ts: Fast path for capture(0).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@NullVoxPopuli NullVoxPopuli self-assigned this Mar 23, 2026
johanrd added a commit to johanrd/ember.js that referenced this pull request Apr 19, 2026
Skips the unused return-value read from pop(). Semantically identical
(both truncate the array by one; the packed-array fast path is
preserved in either form). Also byte-identical to the same change in
NVP's upstream emberjs#21221, simplifying side-by-side review.
@johanrd
Copy link
Copy Markdown
Contributor

johanrd commented Apr 19, 2026

FWIW: I made claude extract each change in this PR onto its own branch and ran them through TracerBench 20-fidelity compare (Krausest-style smoke-tests/benchmark-app, M1 Max / Node 24.14 / Chrome Canary headless, origin/main vs each branch). I also ran the full test matrix on each via my fork's CI.

Hi @NullVoxPopuli-ai-agent — I spent some time extracting each change in this PR onto its own branch at johanrd/ember.js, running TracerBench (mostly 20-fid, some at 40-fid and 80-fid), and writing isolated mitata microbenches where the mechanism was testable in isolation. Each exploration PR carries both the isolated and end-to-end numbers so the contributions can be weighed independently.

Hopefully this helps triage which pieces carry the reported perf win and which are free to drop to simplify review.

Environment: MacBook Pro 14" (M1 Max), Node 24.14, Chrome Canary headless, Ember source built via bin/benchmark.mjs (builds tarballs from both sides). Isolated microbenches via mitata.

Per-change summary

row change in this PR verdict Krausest duration microbench exploration PR
1 destroyable remove() swap-and-pop real win −108 ms / −4.6 % (20-fid) n/a #21323 merged into main
2 ArrayIterator flat state (tagged-union → started: boolean) DON'T MERGE — correctness bug would look +7.8 % but fake n/a johanrd/ember.js#24 closed
3 .slice() before iterating null no diff (2× 20-fid) n/a johanrd/ember.js#25 closed
4 WeakMapWithPrimitives eager init null no diff (40-fid) (inlining already handles it) johanrd/ember.js#29
5 _iterTag fast path on ReferenceImpl overlapping with row 1 null vs post-destroyable main cold −11 % · warm ~0 johanrd/ember.js#26 closed
6 First/Last wrapper elimination isolated win, Krausest null no diff (20-fid + 80-fid) −71 to −82 % johanrd/ember.js#27
7 EvaluationStack.capture items===0 fast path isolated win, Krausest null no diff (20-fid); tiny +1.4 % on one phase items=0 −89 % / items=3 −10 % johanrd/ember.js#30
8 DocumentFragment batching in ListBlockOpcode.sync unbenched real mechanism, real win, real correctness footgun. any modifier that calls getBoundingClientRect(), offsetTop, closest(), or reads computed styles at mount time gets garbage. getBoundingClientRect() on a detached-fragment element
9 convertToInt charCode early-reject bi-modal trade-off null on Krausest pure-idx +33 % · method-heavy −78 % · pure-methods −84 % johanrd/ember.js#33
10 TrackedArray #storages Map → Array microbench wins, Krausest regresses +1.20 % / +25 ms (80-fid) seq −81 to −90 % · sparse −86 % johanrd/ember.js#28
11 Tracker pooling don't merge regressed ~50 ms previously (tested previously)
12 Tracker Set + Array dual structure don't merge Set→Array regressed +20 % / +37 % on render5000 (tested previously)
13 MonomorphicTagImpl[COMPUTE] Math.max → manual compare conflicting signals −1.41 % / −30 ms (80-fid) no regressions +10 to +24 % regression in isolation johanrd/ember.js#31
14 MonomorphicTagImpl[COMPUTE] early-returns refactor (pure) null refactor no diff (20-fid) n/a johanrd/ember.js#32

Findings worth calling out

Row 1 — destroyable (merged)

The one cleanly-attributable win. O(n²) → O(n) on child-teardown during a 5000-row clear. clearManyItems1End −21 %, clearManyItems2End −40 %. Reproduced multiple times.

Row 2 — ArrayIterator flat state: silent correctness bug

Initially looked like a −172 ms / −7.81 % win. pnpm test surfaced {{each}} works when updating old items failing — the constructor-time iterator[0] read is load-bearing: on a TrackedArray it fires the Proxy get trap, which consumes the collection tag inside the iterator-ref's track() frame. The flat-state constructor never reads the iterable, so the ref stops invalidating on in-place mutations (arr[i] = v, splice, length = 0, push), ListBlockOpcode.evaluate's lastIterator !== iterator check short-circuits, sync() never runs, DOM stays stale. The bench harness has no DOM-state assertion — it measures performance.mark → click → requestIdleCallback → mark, which records "empty main thread idles immediately" as a huge perf win. Details in johanrd/ember.js#24.

Fix: read iterator.length in the constructor (Proxy side-effect consumes collection tag). But once fixed, the real delta is nil.

Row 5 — _iterTag: absorbed by destroyable

Solo bench showed −55 ms / −2.47 %. Combined with destroyable in one branch: −75 ms total, not the additive −108 + −55 = −163 ms. The clearManyItems2End piece of the solo "win" was indirect GC / JIT state that already belonged to destroyable's territory. Rebenched against post-destroyable main: null.

Rows 6, 7 — real isolated wins that Krausest can't see

Microbenches show −71 % to −89 % in isolation; Krausest says null. The mechanism is real (wrapper allocations / stack-capture branch cost) but the per-render saving is < 1 % of Krausest's total, below its noise floor. Could matter for long-running apps that accumulate these over thousands of renders; hard to prove on any single-run bench.

Row 10 — microbench wins turn into Krausest regression

Opposite lesson: microbench says Array-backed #storages is 81–90 % faster than Map. Krausest 80-fid says +1.20 % duration regression, distributed across several render phases, no compensating wins. The microbench's simple object-in-array doesn't reproduce the real TrackedArray context — private class fields, #storages.length = 0 clear pattern triggering ElementsKind transitions, Proxy trap overhead dominating the per-access cost. Microbench speedups don't guarantee production speedups.

Row 9 — bi-modal

Microbench: +33 % regression on Krausest-like pure-iteration, −78 % to −84 % win on method-heavy patterns (computed properties derived from trackedArrays). Krausest null because the mix roughly cancels. Whether to ship depends on what real apps look like — I don't have that data.

Row 13 — Math.max: Krausest positive, microbench negative

Isolated microbench: manual compare is 10–24 % slower than Math.max across N=4–256. V8's Math.max intrinsic already inlines optimally for short arrays.

80-fid Krausest: −1.41 % duration, tight CI, no regressions, wins distributed across render / clear / select phases. 20-fid and 40-fid earlier showed small +1–2 ms regressions that dissolved at 80-fid.

I cannot cleanly explain the Krausest win if the mechanism itself is slower in isolation. Possible: smaller function body fits icache / inlines better, JIT tier-up timing at call sites, or the Krausest-80fid "win" is more subtle than the noise floor suggests. Flagging for your take.

Row 14 — early-returns: pure null

All 23 Krausest phases no-diff. Refactor-only. Could be accepted on readability grounds.

Suggested action

What the bench supports landing: the destroyable change (row 1), already shipped as johanrd/ember.js#23 → merged into johanrd/main.

Everything else is either:

  • null at Krausest scale (rows 3, 4, 6, 7, 14),
  • silent correctness break (row 2),
  • absorbed by row 1 (row 5),
  • previously-shown regression (rows 11, 12),
  • bi-modal trade-off (row 9),
  • signals-conflict (row 13),
  • actively regressing on Krausest despite isolated win (row 10),
  • or unbenched with real risk (row 8).

Narrowing this PR to just the destroyable piece would capture the only clear win without risking row 2's correctness bug, and johanrd/ember.js#23 already has the attributed bench data if that's more useful to reference.

Methodology caveat — the Krausest harness doesn't verify DOM state

smoke-tests/benchmark-app/app/run-benchmark.js measures performance.mark(start) → click → requestIdleCallback → performance.mark(end) with no assertion that the DOM matches the expected state. Any change that silently breaks reactivity will register as a large "improvement" because the browser goes idle immediately instead of doing the DOM work. The #2 flat-state case is exactly this.

Rule I'm applying now: never trust a bench win that hasn't been run through pnpm test. If the test matrix can't see the change's effect, the matrix needs extending before the bench number is trustworthy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants