Skip to content

Fixes, performance improvements and new benchmarks#33

Merged
apirogov merged 63 commits intomainfrom
agent-dev
Apr 22, 2026
Merged

Fixes, performance improvements and new benchmarks#33
apirogov merged 63 commits intomainfrom
agent-dev

Conversation

@apirogov
Copy link
Copy Markdown
Owner

No description provided.

AI Bot added 30 commits April 22, 2026 18:06
PERF-1: is_zero/is_one now compare coeffs directly (no Vec alloc)
PERF-2: zz_add/zz_sub/zz_neg return Self via in-place mutation (no Vec)
PERF-3: scale() works in-place via zz_coeffs_mut (no Vec via zz_mul_scalar)
PERF-4: re()/im() build result via zz_coeffs_mut (no Vec collect)
PERF-5: conj() mutates a copy in place (no Vec collect)
PERF-6: From conversion impls use zz_coeffs_mut (no Vec collect)
PERF-7: GaussInt::pow uses binary exponentiation (O(log n) muls)
PERF-8: zz_inv rationalization loop gets debug_assert for i64 overflow
New stringmatch module with linear-time suffix array construction
using the SA-IS algorithm (Nong, Zhang, Chan 2011). Works on u32
integer sequences with a sentinel (0) as the last element. Includes
18 tests covering edge cases, known examples, and randomized
comparison against naive suffix sort.
LCP: O(n) construction using Kasai's algorithm with rank array.
RMQ: O(n log n) sparse table for O(1) range minimum queries.
19 new tests across both modules, all verified against naive implementations.
MatchIndex: generalized suffix array over i8 integer sequences.
Concatenates strings with unique sentinels, builds SA/LCP/RMQ,
tracks document/position arrays for per-string query support.

Maximal matches: for any pair (i,j), enumerates all maximal common
substrings via SA scan with running LCP minimum. Right-maximality
is automatic (LCP = exact match length). Left-maximality checked
per candidate.

21 new tests including brute-force comparison for random inputs
and edge cases (empty strings, self-matches, negative values).
Domain layer for matching cyclic angle sequences along reverse
complementary subsequences:
- Doubles each string for cyclic wraparound matching
- Indexes both double(s) and double(revcomp(s)) in MatchIndex
- Maps positions back to original cyclic coordinates
- Re-verifies cyclic left-maximality (linear left-max is wrong
  at the doubled string seam and for full-wrap matches)
- Keeps only longest match per cyclic position pair

14 tests including spectre-like angles, cyclic wraparound,
self-matches, and randomized comparison against brute-force.
Phase 7: integrates CyclicMatchIndex with Rat geometric validation to find
all valid glue operations between tiles. Two-phase approach: CMI-seeded
multi-edge matches + single-edge brute force with heuristic pruning.
…alse rejection

Brute-force validator tries ALL (ia,ib) pairs with try_glue and compares
against TileSet::valid_glues for hexagon pairs (36 single-edge, all valid,
same result), hexamino pairs (mixed multi/single-edge), spectre pairs
(multi-edge + mystic), and mixed square+triangle pairs.

Found and fixed bug: junction degeneracy check rejected jl==0 (collinear),
but collinear junctions are valid — edges just merge into a longer edge.
Only |jl|==hturn (180-degree fold-back) is a reliable rejection heuristic.
…r single-edge validation

Empirical analysis of spectre self-match (159 unique intervals, 79 accepted,
80 rejected) showed:
- |jl|==hturn (fold-back) heuristic was WRONG: 4 valid glues at (0,4),(0,8),
  (6,4),(6,8) were incorrectly rejected. All pass try_glue.
- |jl|==hturn heuristic was USELESS: 0 out of 80 rejected cases had |jl|==hturn.
- No junction-angle criterion works: all 79 accepted cases have jl<0 or jr<0,
  so any angle-based rejection would produce false positives.

Single-edge Phase 2 now only uses the covered grid (skip positions already
handled by Phase 1 multi-edge matches) + try_glue for geometric validation.
No untested heuristic code remains.
When two tiles share a single edge, the interior angles at each junction
vertex must not sum to more than a full circle. If they do (exterior angle
sum < 0), the neighbor edges physically overlap, creating a self-intersection.
If they sum to exactly 0, the edges are collinear — a multi-edge extension
already handled by the CMI index. Only strictly positive sums (gap at both
junctions) yield true single-edge candidates.

This replaces the covered-grid approach with a cleaner two-phase design:
Phase 1 seeds multi-edge candidates from the CMI index; Phase 2 enumerates
all position pairs and applies the heuristic to reject locally inconsistent
candidates before the global try_glue check.
…dd heuristic to valid_glues_containing

- Extract get_new_match, build_glue_op, sort_by_interval helpers from
  duplicated try_seed closures in valid_glues and valid_glues_containing
- Replace 3 identical mystic sequence literals with MYSTIC_ZZ12 constant
- Add is_single_edge_candidate heuristic to valid_glues_containing's
  single-edge phase for consistency with valid_glues
…negative offsets, add tests, remove ROADMAP.md
The CMI matches for (i,j) are asymmetric with (j,i) because try_glue
is directional: a.try_glue((ia,ib), b) differs from b.try_glue((ib,ia), a).
Missing the reverse direction caused 18 patches to be missed at size 8.

Also adds CMI diagonal seeding (k in 0..=m.len per match), heuristic
isolation tests, and patch_growth timing output.
…lo bug

The junction gap check generalizes is_single_edge_candidate to arbitrary
match lengths: at each junction vertex, check that the exterior angle sum
is positive (physical gap). This rejects ~25% of candidates before the
expensive Snake self-intersection check.

Also fixes a bug in Rat::get_match where ext_end used self.len() instead
of other.len() for the modulo, producing wrong B-side positions when
tiles had different edge counts. This changed patch counts (e.g. hexagon
size 8: 2260 -> 2243) but all results are now validated correct.

Adds profile_glue binary for per-phase timing analysis.
Seed analysis showed the boundary extension (pa-1, pb+1) produced 0
unique matches across all 81 pairs of size-4 hexagon patches. Removed.

Also reduced CMI diagonal seeding from all interior positions to
endpoints only (k=0 and k=len) — validated correct against brute force
through size 8. Interior positions produced no unique results beyond
what overlapping CMI matches already cover.
Instrumented duplicate detection showed:
- CMI phase has 2,704 internal duplicates (overlapping maximal matches)
- CMI vs single-edge: 2,040 overlaps (single-edge re-finds CMI matches)
- Single-edge phase has 0 internal duplicates

The seen HashSet is needed at the CMI boundary but not within single-edge.
Consolidated try_add_glue into try_glue_if_new, removed dead helpers,
updated doc comments.
k=len seeds were past-the-end positions of CMI matches. Since CMI
matches are maximal (can't extend forward), these positions don't
match, making k=len seeds redundant. Verified by brute force through
size 8: k=0-only produces identical results.

This eliminates all 2,704 CMI-internal duplicates. The seen set is
now only needed for CMI vs single-edge overlap (2,040 duplicates).
Split candidate generation into two disjoint phases:
- Multi-edge (CMI): accepts len >= 2, junction gap >= 0 (reject overlap only)
- Single-edge scan: accepts len == 1 only (gap > 0 rejects overlap + collinear)

By construction, the phases produce disjoint result sets, so no
deduplication HashSet is needed. Single function with a flag controls
strictness. Removed heuristic flag and no-heuristic test helper.
AI Bot added 27 commits April 22, 2026 18:06
…d_rats_for_pairs

Binary decomposition was fundamentally incomplete: it misses patches that
cannot be formed by power-of-2 splits (e.g. T tetromino at size 4).
At size 8 square, binary found 365 vs naive's 693 (47% missed).

Also adds valid_rats_for_pairs using BTreeSet instead of BTreeMap<Vec<GlueOp>>
to avoid OOM on large tile pair counts.
…owth

Binary decomposition missed patches that couldn't be formed by power-of-2
splits (e.g. T tetromino at size 4). Naive (k-1)+1 is universally complete.

Results for square ZZ4 (was vs now):
  size 4:  6 vs 7 (missed T-tetromino)
  size 8:  365 vs 693 (binary missed 47%)
  size 10: 4252 vs 8808 (binary missed 52%)
Replaces layer-by-layer TileSet rebuild with a DFS that grows patches
one tile at a time using monotone birth-counter ordering on boundary
edges, eliminating the need to rebuild the CMI suffix array at each
layer. Deduplicates via canonical Rat at each size level.

Benchmark (hexagon, size 9): 8x speedup over the old TileSet approach
(38s vs 307s), with identical results verified at all sizes.

New files:
- src/intgeom/growing.rs: GrowingPatch type, site enumeration, DFS
- src/bin/patch_bench.rs: benchmark binary comparing both approaches
Replace brute-force O(n*m) matching with TileSet::valid_glues(0,1)
using the existing CMI suffix array infrastructure. Keep GrowingPatch
angles canonical via lex_min_rot, enabling direct position mapping
between TileSet and GrowingPatch coordinate systems.

2x speedup over brute-force Redelmeier (19s vs 38s at hexagon size 9),
16x faster than the original layer-by-layer approach (307s).
Add --ring flag to patch_bench (zz4/zz12). Add ZZ4 square test
verifying Redelmeier matches old TileSet approach up to size 6.
Generic print_results eliminates ZZ12-only type constraint.
Add make_free() to mod out reflections (min(rat, rat.reflected())).
Add test verifying counts match OEIS A000104 (free polyominoes
without holes) up to size 9. Add --free flag to patch_bench.

Validated: 1, 1, 2, 5, 12, 35, 107, 363, 1248 free polyominoes.
…idation

Fixes the position tracing bug (y[0] was double-counted with a_xy junction
fixup) and adds 15 unit tests covering geometry, matching, glue application,
OEIS validation (through size 12), and comparison against general Redelmeier.
…ization

- Replace std::HashSet with rustc-hash::FxHashSet for faster hashing (~10% improvement)
- Add from_canonical_angles_unchecked() to Rat to skip redundant lex_min_rot normalization
- This eliminates double-normalization in the hot path: try_apply already normalizes
  angles, but to_rat() was re-normalizing via prepare_seq()
- Add pprof integration with flamegraph support for profiling
- Total improvement: 25.7% faster (12.66s -> 9.41s at max-size 13)
The Rat is now cached when Patch is created (in from_seed and try_apply).
to_rat() returns a clone of the cached Rat instead of recomputing.
This adds slight complexity but ensures correctness and is marginally faster.
- Replace BTreeSet with FxHashSet for storing patches by size
- This gives O(1) lookup/insert instead of O(log n)
- Add Hash implementation for Rat to support FxHashSet
- Improve size 14 from 36.87s to 31.03s (~16% faster)
- Total improvement from original: ~38% faster (12.66s -> 7.84s at size 13)
- In try_apply, compute min(rat, reflected(rat)) before storing
- This cuts enumerate calls in half (each free polyomino has 2 one-sided forms)
- Size 14: 31.03s -> 27.75s (~10% faster)
- Now storing free polyominoes directly, eliminating need for make_free at end
- Update test to correctly validate OEIS A000105 free polyomino values
- Suppress dead_code for make_free and enumerate_onesideed (kept for API/compatibility)
- Prefix unused variables with underscore
- Add allow(dead_code) attribute to pub(crate) enumerate_sites
- Remove ring selection, now only supports ZZ12
- Default to hexagon tiles, size 7
- Cleaner code overall
- Remove old TileSet approach
- Just run Redelmeier algorithm for benchmarks
- Default to hexagons, size 9
- Add cached Rat to GrowingPatch (avoid recomputing on each to_rat call)
- Use from_canonical_angles_unchecked to skip redundant lex_min_rot
- Both changes significantly speed up general Redelmeier algorithm
- patch_bench hex size 9: 18.8s (was 18.89s before these changes)
Switch result storage from BTreeSet to FxHashSet for O(1) amortized
inserts. Add grow_redelmeier_free() that stores min(r, r.reflected())
during enumeration, halving the search space. Both changes port the
key optimizations from the specialized polyomino benchmark.
The TileSet builds a full CyclicMatchIndex (SA-IS suffix array, LCP,
sparse RMQ table) for every enumerate_sites call on just 2 tiles.
Replace with direct O(n*m) angle comparison matching, identical to the
optimized polyomino benchmark's compute_match. Snake validation is now
done per-site in the caller instead of per-unique-shape in TileSet.
Track positions and visited set in GrowingPatch for O(1) overlap checks.
Skip Snake validation for ZZ4 (vertex overlap is sufficient). Falls back
to Snake check for other rings where segment crossing is possible.
Replace Ratio<i64>-based cyclotomic positions with scaled integer tuples
(Pos2/Pos4/Pos8) for ZZ4, ZZ10, ZZ12. Eliminates cross-multiply + GCD
overhead from every position addition, hash, and comparison in the hot path.
…ZZ12

Mirror existing intersect/is_ccw/is_between/wedge geometry exactly using
scaled integer SR type instead of cyclotomic Ratio arithmetic. Only check
new segments against kept parent segments. Skip Snake::try_from for ZZ12.
The pentagonal rings Q(sqrt(5), sqrt(10-2*sqrt(5))) have degree-4 real subfields
with non-integer basis constants, so the existing signum_sum_sqrt_expr_4 (designed
for integer roots like ZZ24's {1,2,3,6}) cannot apply. The new algorithm recursively
reduces from Q(sqrt(5), sqrt(y)) to Q(sqrt(5)) by grouping as P + Q*sqrt(y) and
computing sign(P^2 - Q^2*y) via the existing signum_sum_sqrt_expr_2.
Gates the direct cyclotomic intersection implementation behind a feature flag,
keeping the integer Pos4 fast path as default. Also fixes trait resolution
ambiguity by explicitly qualifying < as SymNum>::turn().
- Restore code comments in Snake::add_unsafe that were accidentally removed
- Remove unnecessary u64 casts in pentagonal signum comparison
- Remove needless borrows on seq references
- Collapse nested if in snake validation check
- Group GrowState struct to reduce grow_recursive_inner args (8→7)
- Group ProfileStats struct to reduce polyomino grow_recursive args (9→7)
Gate pos4 intersection code with #[cfg(not(feature = "cyclotomic_intersect"))],
fix cfg-gated test imports, replace wildcard enum import, use i64::from for
lossless casts, fix needless_range_loop in tests, and fix polyomino test loop.
@apirogov apirogov merged commit 6eaa10a into main Apr 22, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant