Improve render perf by NullVoxPopuli-ai-agent · Pull Request #21221 · emberjs/ember.js

NullVoxPopuli-ai-agent · 2026-03-16T13:47:56Z

note from nvp:

initally this is a great example of ai overfitting to a specific problem

NullVoxPopuli · 2026-03-16T14:12:49Z

    Benchmark Results Summary    

duration phase estimated improvement -2913ms [-2947ms to -2874ms] OR -75.07% [-75.95% to -74.06%]
renderEnd phase no difference [-4ms to 2ms]
render1000Items1End phase estimated improvement -122ms [-133ms to -111ms] OR -81.96% [-89.75% to -74.52%]
clearItems1End phase no difference [-1ms to 11ms]
render1000Items2End phase estimated improvement -67ms [-83ms to -65ms] OR -57.05% [-71.21% to -56.15%]
clearItems2End phase estimated regression +17ms [17ms to 18ms] OR +12.77% [12.44% to 13.31%]
render5000Items1End phase estimated improvement -915ms [-927ms to -893ms] OR -89.54% [-90.79% to -87.46%]
clearManyItems1End phase estimated improvement -170ms [-181ms to -164ms] OR -62.76% [-66.75% to -60.65%]
render5000Items2End phase estimated improvement -828ms [-844ms to -811ms] OR -88.68% [-90.42% to -86.86%]
clearManyItems2End phase estimated improvement -149ms [-149ms to -149ms] OR -99.5% [-99.71% to -99.28%]
render1000Items3End phase estimated improvement -49ms [-50ms to -49ms] OR -49.48% [-49.62% to -49.34%]
append1000Items1End phase estimated improvement -84ms [-85ms to -68ms] OR -62.73% [-63.78% to -51.02%]
append1000Items2End phase estimated improvement -83ms [-100ms to -83ms] OR -62.56% [-74.77% to -62.16%]
updateEvery10thItem1End phase estimated improvement -83ms [-83ms to -66ms] OR -62.01% [-62.18% to -49.8%]
updateEvery10thItem2End phase estimated improvement -83ms [-83ms to -83ms] OR -62.14% [-62.27% to -62.01%]
selectFirstRow1End phase estimated improvement -33ms [-33ms to -33ms] OR -98.82% [-99.08% to -98.55%]
selectSecondRow1End phase estimated improvement -50ms [-50ms to -50ms] OR -99.67% [-99.83% to -99.39%]
removeFirstRow1End phase estimated improvement -83ms [-83ms to -83ms] OR -99.84% [-99.97% to -99.74%]
removeSecondRow1End phase estimated improvement -83ms [-83ms to -83ms] OR -99.88% [-99.99% to -99.84%]
swapRows1End phase estimated improvement -16ms [-16ms to -16ms] OR -24.23% [-24.47% to -24.06%]
swapRows2End phase estimated improvement -16ms [-16ms to -16ms] OR -24.19% [-24.49% to -23.73%]
clearItems4End phase no difference [-24ms to 0ms]

NullVoxPopuli · 2026-03-16T14:15:53Z


    if (Array.isArray(iterable)) {
-      return new ArrayIterator(iterable, keyFor);
+      // When iterating a TrackedArray (Proxy), each index access goes through


well this is dubious lol

NullVoxPopuli · 2026-03-16T14:19:26Z

  #collection = createUpdatableTag();

-  #storages = new Map<number, ReturnType<typeof createUpdatableTag>>();
+  // Use a flat array instead of Map for index-based storage lookups.


Correctness fixes: - validators.ts: Restore try/finally for isUpdating recovery, use > with NaN fallback for volatile tag propagation - reference.ts: Use track() for error-safe tracking frames (try/catch in valueForRef kills V8 JIT optimization) - iterable.ts: Don't reuse iterator result objects (callers store refs) - tracking.ts: Avoid corrupting combinator tags when pooling trackers Performance optimizations: - reference.ts: Fast path for iterator item refs — skip tracking frame and closures entirely since compute only consumes one tag. Eliminates ~40,000 allocations per 5000-item list render. - destroyable: Swap-and-pop removal instead of indexOf+splice, reducing array shifting from O(n) to O(1) per removal. - element-builder.ts: Eliminate First/Last wrapper class allocations, store raw SimpleNode references with boolean discriminator. - tracking.ts: Hand off tags array to combine() instead of slice() copy. Remove unwrap() from hot endTrackFrame path. - update.ts: Remove bulk-clear code that was slower than single-pass (3 passes over data vs 1). - stack.ts: Fast path for capture(0). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Bug fixes: - validators.ts: Move isUpdating check BEFORE lastChecked early return. During cycle detection, lastChecked is already set to $REVISION, so the early return was firing before the cycle could be detected. - iterable.ts: Handle empty arrays in ArrayIterator.next() — return null instead of producing a spurious item with undefined value. - reference.ts: Use track() for error-safe tracking (try/catch in valueForRef or separate function both kill V8 JIT optimization). Performance optimizations: - reference.ts: Fast path for iterator item refs — skip tracking frame and closures entirely (~40,000 fewer allocations per 5000-item render). - destroyable: Swap-and-pop removal instead of indexOf+splice. - element-builder.ts: Eliminate First/Last wrapper allocations. - tracking.ts: Hand off tags array instead of slice(); remove unwrap() from hot endTrackFrame path. - update.ts: Remove slower bulk-clear code. - stack.ts: Fast path for capture(0). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Skips the unused return-value read from pop(). Semantically identical (both truncate the array by one; the packed-array fast path is preserved in either form). Also byte-identical to the same change in NVP's upstream emberjs#21221, simplifying side-by-side review.

johanrd · 2026-04-19T20:02:32Z

FWIW: I made claude extract each change in this PR onto its own branch and ran them through TracerBench 20-fidelity compare (Krausest-style smoke-tests/benchmark-app, M1 Max / Node 24.14 / Chrome Canary headless, origin/main vs each branch). I also ran the full test matrix on each via my fork's CI.

Hi @NullVoxPopuli-ai-agent — I spent some time extracting each change in this PR onto its own branch at johanrd/ember.js, running TracerBench (mostly 20-fid, some at 40-fid and 80-fid), and writing isolated mitata microbenches where the mechanism was testable in isolation. Each exploration PR carries both the isolated and end-to-end numbers so the contributions can be weighed independently.

Hopefully this helps triage which pieces carry the reported perf win and which are free to drop to simplify review.

Environment: MacBook Pro 14" (M1 Max), Node 24.14, Chrome Canary headless, Ember source built via bin/benchmark.mjs (builds tarballs from both sides). Isolated microbenches via mitata.

Per-change summary

row	change in this PR	verdict	Krausest duration	microbench	exploration PR
1	destroyable `remove()` swap-and-pop	real win	−108 ms / −4.6 % (20-fid)	n/a	#21323 merged into main
2	`ArrayIterator` flat state (tagged-union → `started: boolean`)	DON'T MERGE — correctness bug	would look +7.8 % but fake	n/a	johanrd/ember.js#24 closed
3	`.slice()` before iterating	null	no diff (2× 20-fid)	n/a	johanrd/ember.js#25 closed
4	`WeakMapWithPrimitives` eager init	null	no diff (40-fid)	(inlining already handles it)	johanrd/ember.js#29
5	`_iterTag` fast path on ReferenceImpl	overlapping with row 1	null vs post-destroyable main	cold −11 % · warm ~0	johanrd/ember.js#26 closed
6	First/Last wrapper elimination	isolated win, Krausest null	no diff (20-fid + 80-fid)	−71 to −82 %	johanrd/ember.js#27
7	`EvaluationStack.capture` items===0 fast path	isolated win, Krausest null	no diff (20-fid); tiny +1.4 % on one phase	items=0 −89 % / items=3 −10 %	johanrd/ember.js#30
8	`DocumentFragment` batching in `ListBlockOpcode.sync`	unbenched real mechanism, real win, real correctness footgun. any modifier that calls getBoundingClientRect(), offsetTop, closest(), or reads computed styles at mount time gets garbage. getBoundingClientRect() on a detached-fragment element	—	—	—
9	`convertToInt` charCode early-reject	bi-modal trade-off	null on Krausest	pure-idx +33 % · method-heavy −78 % · pure-methods −84 %	johanrd/ember.js#33
10	TrackedArray `#storages` Map → Array	microbench wins, Krausest regresses	+1.20 % / +25 ms (80-fid)	seq −81 to −90 % · sparse −86 %	johanrd/ember.js#28
11	`Tracker` pooling	don't merge	regressed ~50 ms previously	—	(tested previously)
12	`Tracker` Set + Array dual structure	don't merge	Set→Array regressed +20 % / +37 % on render5000	—	(tested previously)
13	`MonomorphicTagImpl[COMPUTE]` `Math.max` → manual compare	conflicting signals	−1.41 % / −30 ms (80-fid) no regressions	+10 to +24 % regression in isolation	johanrd/ember.js#31
14	`MonomorphicTagImpl[COMPUTE]` early-returns refactor (pure)	null refactor	no diff (20-fid)	n/a	johanrd/ember.js#32

Findings worth calling out

Row 1 — destroyable (merged)

The one cleanly-attributable win. O(n²) → O(n) on child-teardown during a 5000-row clear. clearManyItems1End −21 %, clearManyItems2End −40 %. Reproduced multiple times.

Row 2 — ArrayIterator flat state: silent correctness bug

Initially looked like a −172 ms / −7.81 % win. pnpm test surfaced {{each}} works when updating old items failing — the constructor-time iterator[0] read is load-bearing: on a TrackedArray it fires the Proxy get trap, which consumes the collection tag inside the iterator-ref's track() frame. The flat-state constructor never reads the iterable, so the ref stops invalidating on in-place mutations (arr[i] = v, splice, length = 0, push), ListBlockOpcode.evaluate's lastIterator !== iterator check short-circuits, sync() never runs, DOM stays stale. The bench harness has no DOM-state assertion — it measures performance.mark → click → requestIdleCallback → mark, which records "empty main thread idles immediately" as a huge perf win. Details in johanrd/ember.js#24.

Fix: read iterator.length in the constructor (Proxy side-effect consumes collection tag). But once fixed, the real delta is nil.

Row 5 — _iterTag: absorbed by destroyable

Solo bench showed −55 ms / −2.47 %. Combined with destroyable in one branch: −75 ms total, not the additive −108 + −55 = −163 ms. The clearManyItems2End piece of the solo "win" was indirect GC / JIT state that already belonged to destroyable's territory. Rebenched against post-destroyable main: null.

Rows 6, 7 — real isolated wins that Krausest can't see

Microbenches show −71 % to −89 % in isolation; Krausest says null. The mechanism is real (wrapper allocations / stack-capture branch cost) but the per-render saving is < 1 % of Krausest's total, below its noise floor. Could matter for long-running apps that accumulate these over thousands of renders; hard to prove on any single-run bench.

Row 10 — microbench wins turn into Krausest regression

Opposite lesson: microbench says Array-backed #storages is 81–90 % faster than Map. Krausest 80-fid says +1.20 % duration regression, distributed across several render phases, no compensating wins. The microbench's simple object-in-array doesn't reproduce the real TrackedArray context — private class fields, #storages.length = 0 clear pattern triggering ElementsKind transitions, Proxy trap overhead dominating the per-access cost. Microbench speedups don't guarantee production speedups.

Row 9 — bi-modal

Microbench: +33 % regression on Krausest-like pure-iteration, −78 % to −84 % win on method-heavy patterns (computed properties derived from trackedArrays). Krausest null because the mix roughly cancels. Whether to ship depends on what real apps look like — I don't have that data.

Row 13 — Math.max: Krausest positive, microbench negative

Isolated microbench: manual compare is 10–24 % slower than Math.max across N=4–256. V8's Math.max intrinsic already inlines optimally for short arrays.

80-fid Krausest: −1.41 % duration, tight CI, no regressions, wins distributed across render / clear / select phases. 20-fid and 40-fid earlier showed small +1–2 ms regressions that dissolved at 80-fid.

I cannot cleanly explain the Krausest win if the mechanism itself is slower in isolation. Possible: smaller function body fits icache / inlines better, JIT tier-up timing at call sites, or the Krausest-80fid "win" is more subtle than the noise floor suggests. Flagging for your take.

Row 14 — early-returns: pure null

All 23 Krausest phases no-diff. Refactor-only. Could be accepted on readability grounds.

Suggested action

What the bench supports landing: the destroyable change (row 1), already shipped as johanrd/ember.js#23 → merged into johanrd/main.

Everything else is either:

null at Krausest scale (rows 3, 4, 6, 7, 14),
silent correctness break (row 2),
absorbed by row 1 (row 5),
previously-shown regression (rows 11, 12),
bi-modal trade-off (row 9),
signals-conflict (row 13),
actively regressing on Krausest despite isolated win (row 10),
or unbenched with real risk (row 8).

Narrowing this PR to just the destroyable piece would capture the only clear win without risking row 2's correctness bug, and johanrd/ember.js#23 already has the attributed bench data if that's more useful to reference.

Methodology caveat — the Krausest harness doesn't verify DOM state

smoke-tests/benchmark-app/app/run-benchmark.js measures performance.mark(start) → click → requestIdleCallback → performance.mark(end) with no assertion that the DOM matches the expected state. Any change that silently breaks reactivity will register as a large "improvement" because the browser goes idle immediately instead of doing the DOM work. The #2 flat-state case is exactly this.

Rule I'm applying now: never trust a bench win that hasn't been run through pnpm test. If the test matrix can't see the change's effect, the matrix needs extending before the bench number is trustworthy.

Improve render perf

3f4490f

NullVoxPopuli reviewed Mar 16, 2026

View reviewed changes

lints

0a290ae

NullVoxPopuli reviewed Mar 16, 2026

View reviewed changes

NullVoxPopuli and others added 5 commits March 17, 2026 15:38

Fixes

7be07d7

hm

75f980a

Fix

a083ed5

NullVoxPopuli self-assigned this Mar 23, 2026

This was referenced Apr 19, 2026

Perf: replace splice with swap-and-pop in destroyable remove() (runtime) johanrd/ember.js#23

Merged

Perf: slice TrackedArray to plain array before iterating (runtime) johanrd/ember.js#25

Closed

johanrd mentioned this pull request Apr 19, 2026

Perf: destroyable remove swap pop (render) #21323

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improve render perf#21221

Improve render perf#21221
NullVoxPopuli-ai-agent wants to merge 7 commits intoemberjs:mainfrom
NullVoxPopuli-ai-agent:nvp/perf

NullVoxPopuli-ai-agent commented Mar 16, 2026 •

edited by NullVoxPopuli

Loading

Uh oh!

NullVoxPopuli commented Mar 16, 2026 •

edited

Loading

Uh oh!

NullVoxPopuli Mar 16, 2026

Uh oh!

NullVoxPopuli Mar 16, 2026

Uh oh!

johanrd commented Apr 19, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

NullVoxPopuli-ai-agent commented Mar 16, 2026 • edited by NullVoxPopuli Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

NullVoxPopuli commented Mar 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

NullVoxPopuli Mar 16, 2026

Choose a reason for hiding this comment

Uh oh!

NullVoxPopuli Mar 16, 2026

Choose a reason for hiding this comment

Uh oh!

johanrd commented Apr 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Per-change summary

Findings worth calling out

Row 1 — destroyable (merged)

Row 2 — ArrayIterator flat state: silent correctness bug

Row 5 — _iterTag: absorbed by destroyable

Rows 6, 7 — real isolated wins that Krausest can't see

Row 10 — microbench wins turn into Krausest regression

Row 9 — bi-modal

Row 13 — Math.max: Krausest positive, microbench negative

Row 14 — early-returns: pure null

Suggested action

Methodology caveat — the Krausest harness doesn't verify DOM state

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

NullVoxPopuli-ai-agent commented Mar 16, 2026 •

edited by NullVoxPopuli

Loading

NullVoxPopuli commented Mar 16, 2026 •

edited

Loading

johanrd commented Apr 19, 2026 •

edited

Loading