perf: speed up exact diff common cases by bernaferrari · Pull Request #24 · knaeckeKami/diffutil.dart

bernaferrari · 2025-12-22T22:03:12Z

Summary

This replaces the earlier broad optimization experiment with a smaller exact-Myers optimization pass aimed at common diff workloads and reviewability.

Changes:

trims common suffixes before running Myers, while intentionally not prefix-trimming so duplicate anchoring stays compatible with issue #15 ("Removing 2 consecutive elements from a list returns incorrect updates")
reuses one typed-data backing buffer for forward/backward k-lines and result status arrays
shares the middle-snake range traversal between the delegate and interned paths, while keeping the hot midpoint/snake loops specialized
keeps the normal delegate path direct, so common exact diffs do not pay an indirect comparator/interner cost
only interns list items for larger, non-aligned middle ranges where integer ID comparison can help
skips interning when a custom equalityChecker is supplied, because HashMap equality may not match the caller's equality semantics
adds hash-collision regression coverage for both update APIs
adds an AOT/JIT benchmark harness under tool/bench/bench.dart

The implementation preserves exact diff behavior. There is no heuristic cutoff or intentionally non-minimal mode.
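
As a rough illustration of the suffix-trimming step in the list above, here is a minimal sketch. The function name `commonSuffixLength` is hypothetical and this is not the code from the PR; it only shows how trailing elements shared by both lists can be counted and excluded before the Myers pass runs on what remains.

```dart
/// Counts how many trailing elements [oldList] and [newList] have in common.
/// Hypothetical helper for illustration; not taken from the PR itself.
int commonSuffixLength<T>(List<T> oldList, List<T> newList) {
  var oldEnd = oldList.length;
  var newEnd = newList.length;
  var count = 0;
  // Walk both lists from the back while the elements stay equal.
  while (oldEnd > 0 &&
      newEnd > 0 &&
      oldList[oldEnd - 1] == newList[newEnd - 1]) {
    oldEnd--;
    newEnd--;
    count++;
  }
  return count;
}
```

With a suffix length of `n`, only `oldList[0 .. length - n)` and `newList[0 .. length - n)` would still need to be diffed. Leaving the prefix untouched is the part the description ties to the duplicate anchoring from issue #15.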

Validation

Local checks on the pushed branch:

dart analyze
dart test
dart compile exe tool/bench/bench.dart -o build/diff_bench_current
./build/diff_bench_current
dart run tool/bench/bench.dart

I also ran the same benchmark harness against master by copying tool/bench/bench.dart into a clean master worktree before compiling, so both binaries used the same measurement code.

Benchmark settings:

detectMoves: false
warmups: 3
samples: 10
target: 20000us
table values are median microseconds per iteration

Speedup is master / branch, so values below 1.00x are regressions. These are local microbenchmarks and the smallest rows are noisy.
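
For reference, the two derived quantities in the tables can be sketched as follows. The helper names are hypothetical, and the median definition for an even number of samples (mean of the two middle values) is an assumption; the harness may compute it differently.

```dart
/// Median of a non-empty list of samples (assumed definition: for an even
/// count, the mean of the two middle values).
double median(List<double> samples) {
  final sorted = [...samples]..sort();
  final mid = sorted.length ~/ 2;
  return sorted.length.isOdd
      ? sorted[mid]
      : (sorted[mid - 1] + sorted[mid]) / 2;
}

/// Speedup as defined above: master time divided by branch time.
/// Values below 1.0 mean the branch is slower than master.
double speedup(double masterUs, double branchUs) => masterUs / branchUs;
```

For example, the AOT row `int 100 none` gives `speedup(2.40, 1.75)`, which rounds to 1.37x, and `int 10 many` gives `speedup(2.29, 2.51)`, which rounds to 0.91x.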

AOT Benchmarks

type	size	diffs	master median us	branch median us	speedup
int	10	none	0.48	0.48	1.00x
int	10	few	0.70	0.70	1.00x
int	10	many	2.29	2.51	0.91x
object	10	none	0.72	0.60	1.20x
object	10	few	0.94	0.98	0.96x
object	10	many	2.18	3.13	0.70x
int	100	none	2.40	1.75	1.37x
int	100	few	3.26	3.42	0.95x
int	100	many	109.59	98.41	1.11x
object	100	none	5.10	3.22	1.58x
object	100	few	6.41	3.09	2.07x
object	100	many	123.66	123.54	1.00x
int	1000	none	21.12	13.71	1.54x
int	1000	few	46.70	39.07	1.20x
int	1000	many	9047.00	8572.50	1.06x
object	1000	none	45.20	24.21	1.87x
object	1000	few	101.40	90.44	1.12x
object	1000	many	12735.00	8416.50	1.51x
int	10000	none	207.28	137.78	1.50x
int	10000	few	1097.50	1179.22	0.93x
int	10000	many	935030.00	1141486.00	0.82x
object	10000	none	503.58	252.95	1.99x
object	10000	few	1769.63	1699.56	1.04x
object	10000	many	1869459.00	1132685.00	1.65x

JIT Benchmarks

type	size	diffs	master median us	branch median us	speedup
int	10	none	0.80	0.48	1.67x
int	10	few	1.57	1.02	1.54x
int	10	many	4.22	2.83	1.49x
object	10	none	1.03	0.63	1.63x
object	10	few	1.46	1.15	1.27x
object	10	many	4.42	3.27	1.35x
int	100	none	3.80	2.77	1.37x
int	100	few	5.62	3.17	1.77x
int	100	many	164.26	133.38	1.23x
object	100	none	7.75	2.95	2.63x
object	100	few	9.18	4.34	2.12x
object	100	many	261.44	209.27	1.25x
int	1000	none	39.65	17.87	2.22x
int	1000	few	52.17	67.46	0.77x
int	1000	many	13706.50	12507.00	1.10x
object	1000	none	82.90	33.10	2.50x
object	1000	few	122.88	118.20	1.04x
object	1000	many	20288.00	10895.00	1.86x
int	10000	none	233.22	176.02	1.32x
int	10000	few	895.56	1063.09	0.84x
int	10000	many	1235363.00	899683.00	1.37x
object	10000	none	619.02	341.70	1.81x
object	10000	few	2135.50	2731.13	0.78x
object	10000	many	2077002.00	1249991.00	1.66x

Notes

This does not claim a universal speedup. It materially improves no-change and many object-list workloads, keeps the algorithm exact, and preserves the historical duplicate anchoring covered by issue #15. A few int-heavy few/many rows are neutral to slower; accepting that is preferable to adding an imara-style non-minimal cutoff or changing duplicate matching semantics.

knaeckeKami · 2025-12-27T23:21:13Z

Interning now maps items to integer IDs using only hashCode, so distinct items with the same hash are treated as identical. Dart allows hash collisions, which changes correctness: the diff can return no updates when items actually changed (e.g., CollisionPair(1,2) vs CollisionPair(2,1) both hash to 3 via xor). I pushed failing regression tests on this PR branch to demonstrate the issue. A fix likely needs collision handling (bucket by hash + verify ==) or a guard/option to disable interning when collisions are possible.
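
To make the collision concrete, here is a small sketch. `CollisionPair` mirrors the example named above, but its exact definition in the regression tests is an assumption, and `internByHashOnly` / `internByEquality` are hypothetical helpers rather than the PR's code. The second helper shows one way to get "bucket by hash + verify ==": a regular Dart `Map` keyed by the item itself already consults `==` after hashing.

```dart
class CollisionPair {
  const CollisionPair(this.a, this.b);
  final int a;
  final int b;

  @override
  bool operator ==(Object other) =>
      other is CollisionPair && other.a == a && other.b == b;

  // Deliberately weak hash: (1, 2) and (2, 1) both hash to 3.
  @override
  int get hashCode => a ^ b;
}

/// Unsafe: uses hashCode alone as the ID, so colliding items look identical.
List<int> internByHashOnly(List<Object> items) =>
    [for (final item in items) item.hashCode];

/// Safer: the map is keyed by the item, so == is checked after hashing and
/// distinct items get distinct IDs even when their hash codes collide.
List<int> internByEquality(List<Object> items, Map<Object, int> ids) =>
    [for (final item in items) ids.putIfAbsent(item, () => ids.length)];
```

Passing the same `ids` map for both the old and the new list keeps the IDs comparable across the two lists.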

knaeckeKami · 2025-12-27T23:30:13Z

Or: add a caller-supplied key (e.g., keyOf/idOf) to calculateListDiff so interning can use stable IDs instead of raw hashes.
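
A sketch of what the key-based idea could look like. `keyOf` is only the name proposed in the comment above and, as far as this thread shows, is not an existing parameter of `calculateListDiff`; `internByKey` and the `User` class are hypothetical and exist only for this illustration.

```dart
class User {
  const User(this.id, this.name);
  final int id;
  final String name;
}

/// Assigns integer IDs based on a caller-supplied key instead of hashCode.
/// Assumes the caller guarantees that equal keys mean "same item".
List<int> internByKey<T>(
  List<T> items,
  Object Function(T item) keyOf,
  Map<Object, int> ids,
) =>
    [for (final item in items) ids.putIfAbsent(keyOf(item), () => ids.length)];
```

The trade-off is that correctness then rests on the caller's keys being stable and unique, rather than on the item's own `==`.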

bernaferrari · 2025-12-28T00:44:50Z

Very very good catch! Fixed!

knaeckeKami · 2025-12-29T12:43:03Z

I added an AOT benchmark harness (tool/bench/bench.dart) following Dart microbenchmarking guidance (AOT compile, warmups, calibration to a target runtime, fixed inputs; refs: https://mrale.ph/blog/2021/01/21/microbenchmarking-dart-part-1.html and https://mrale.ph/blog/2024/11/27/microbenchmarks-are-experiments.html). I ran it against:

master
the initial PR head (8bd5a66, hash-collision bug)
current head (collision fix)

It reports median us/iter for sizes 10/100/1000/10000, diff patterns none/few/many, for both int lists and object lists (8-field class with standard ==/hashCode).
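
The measurement approach described above (calibration to a target runtime, warmups, then timed samples) can be sketched roughly like this. This is not the contents of tool/bench/bench.dart; `measure` and `_runBatch` are hypothetical names, and the defaults simply reuse the settings listed in the PR description (3 warmups, 10 samples, 20000us target).

```dart
/// Returns one per-iteration time in microseconds for each sample.
List<double> measure(
  void Function() body, {
  int warmups = 3,
  int samples = 10,
  int targetUs = 20000,
}) {
  // Calibrate: double the batch size until one batch takes at least targetUs.
  var iterations = 1;
  while (_runBatch(body, iterations) < targetUs) {
    iterations *= 2;
  }
  // Warm up with the calibrated batch size; results are discarded.
  for (var i = 0; i < warmups; i++) {
    _runBatch(body, iterations);
  }
  // Timed samples, normalized to microseconds per iteration.
  return [
    for (var i = 0; i < samples; i++) _runBatch(body, iterations) / iterations,
  ];
}

/// Runs [body] [iterations] times and returns the elapsed microseconds.
int _runBatch(void Function() body, int iterations) {
  final watch = Stopwatch()..start();
  for (var i = 0; i < iterations; i++) {
    body();
  }
  watch.stop();
  return watch.elapsedMicroseconds;
}
```

Timing a whole batch rather than a single call matters for the smallest rows, where one diff takes well under a microsecond and a single `Stopwatch` reading would be dominated by timer resolution.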

Summary:

For none/few diffs the PR is materially slower than master (often ~3–12x).
For many diffs the PR is faster (~10–15%).
The collision fix adds ~1–2% overhead vs the buggy interner.

Full tables (median us/iter, AOT):

int

size	diffs	master (us)	bug `8bd5a66` (us)	after (us)
10	none	0.29	0.45	0.96
10	few	0.45	0.59	1.10
10	many	1.07	1.08	1.88
100	none	1.17	6.12	5.89
100	few	1.64	6.68	6.35
100	many	53.55	49.93	55.33
1000	none	9.97	57.38	57.45
1000	few	22.21	69.63	67.61
1000	many	4927.75	4238.63	4314.00
10000	none	98.77	1124.00	1124.19
10000	few	425.30	1168.28	1175.91
10000	many	486125.00	418311.00	419665.00

object

size	diffs	master (us)	bug `8bd5a66` (us)	after (us)
10	none	0.32	0.72	1.39
10	few	0.49	0.87	1.52
10	many	1.34	1.43	2.31
100	none	1.34	8.91	8.54
100	few	1.89	9.30	8.87
100	many	71.59	56.79	62.90
1000	none	11.47	88.04	86.46
1000	few	26.47	99.44	98.09
1000	many	6736.00	4311.38	4378.88
10000	none	117.26	1454.06	1443.38
10000	few	528.05	1476.00	1474.06
10000	many	666148.00	419549.00	423003.00

This seems at odds with the PR description claiming ~20% to 10x speedups. Can you clarify how those measurements were obtained (workload, inputs, tooling, JIT vs AOT, warmups/samples)? I want to align the benchmark methodology so we compare apples-to-apples.

bernaferrari · 2026-05-29T04:15:31Z

Sorry for the long wait, I forgot. I asked Codex to rewrite everything, make sure every benchmark had speedups, and also test JIT vs AOT. It should be 1.5-2x faster. Updated here.

For even more speed: I didn't push it, but we could take inspiration from https://docs.rs/imara-diff/latest/imara_diff/, which is the fastest diff out there. It adds a 256 cutoff to Myers, which makes it up to 36x faster on large changes (10000 | many). However, it doesn't guarantee minimal diffs, so it is not 100% equivalent to Myers. If you think that is interesting, I can push it for you to see.

knaeckeKami · 2026-06-06T11:46:57Z

Thank you! LGTM.

I noticed that in some cases this can change the edit script when duplicates are involved, e.g. for old=[0], new=[0,0]:
before this PR: Insert(position: 1)
after this PR: Insert(position: 0)

Both are valid, but I'll release this with a major version bump so use cases that depend on the old order don't break.
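
A quick way to see why both scripts are valid: applying either insert to `[0]` produces `[0, 0]`. The helper below is hypothetical and independent of the library's update callbacks; it only applies a single insert to a copy of the list.

```dart
/// Returns a copy of [old] with [value] inserted at [position].
List<int> applyInsert(List<int> old, int position, int value) =>
    [...old]..insert(position, value);
```

Both `applyInsert([0], 0, 0)` and `applyInsert([0], 1, 0)` yield the same list, so the two edit scripts differ only in which of the duplicate zeros is treated as the pre-existing one.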

> For even more speed: I didn't push it, but we could take inspiration from https://docs.rs/imara-diff/latest/imara_diff/, which is the fastest diff out there. It adds a 256 cutoff to Myers, which makes it up to 36x faster on large changes (10000 | many). However, it doesn't guarantee minimal diffs, so it is not 100% equivalent to Myers. If you think that is interesting, I can push it for you to see.

Yes! But I think we should make this opt-in and document it, not the default behaviour.

bernaferrari · 2026-06-06T13:27:30Z

Yay!! Thanks a lot for your patience. Feel free to make a test for that duplication issue so this is known in the future (if behavior ever changes again accidentally)

knaeckeKami · 2026-06-06T14:38:26Z

I don't know if it makes sense to lock in one particular sequence of operations out of the many possible valid ones. I just think that if we release an update that might change them, we should do it with a major version bump. After so many years without updates, it is better to be conservative and avoid breaking users who accidentally depend on one particular order.

knaeckeKami self-assigned this Dec 29, 2025

perf: speed up exact diff common cases

041589c

bernaferrari force-pushed the perf/optimizations branch from 6c43d4e to 041589c on May 29, 2026 03:47

bernaferrari changed the title ~~perf: optimize diff algorithm with 4x speedup~~ perf: speed up exact diff common cases May 29, 2026

refactor: preserve duplicate anchoring in diff fast path

3317ff1

knaeckeKami merged commit e5e7c97 into knaeckeKami:master Jun 6, 2026
1 check passed


knaeckeKami merged 2 commits into knaeckeKami:master from bernaferrari:perf/optimizations
