SNF improvements by edgarcosta · Pull Request #2627 · flintlib/flint

edgarcosta · 2026-04-05T02:39:23Z

Building on #2596, this PR adds fmpz_mat_snf_transform (the feature request in flint2#204), fmpz_mat_elementary_divisors, and replaces the SNF dispatcher with a faster algorithm.

fmpz_mat_snf_transform(S, U, V, A) computes S = UAV where S is the Smith normal form and U, V are unimodular, for arbitrary m x n matrices. Either U or V (or both) may be NULL to skip computing the corresponding transform. The algorithm alternates row and column HNF until the matrix is diagonal, then fixes the divisibility chain with xgcd-based row/column operations (O(n) per pair rather than the O(n^3) matrix multiplications in the original acb_theta implementation by Jean Kieffer). This also removes ~100 lines of static duplicates from acb_theta/sp2gz_decompose.c.

fmpz_mat_elementary_divisors(ed, rank, A) computes d_1 | d_2 | ... | d_r without full SNF using Luebeck's algorithm: compute HNF, factor the product of the pivots, then for each prime p iteratively compute nullspace mod p to determine p-adic valuations. Falls back to full SNF if any prime factor does not fit in a ulong. This uses the new nmod_mat_left_nullspace, which computes the left nullspace directly by row-reducing [A | I] rather than transposing.

fmpz_mat_is_diagonal is promoted from a static function in acb_theta/sp2gz_decompose.c (which had a /* todo: move out? */ comment).

The fmpz_mat_snf dispatcher now uses the iterative Hermite algorithm instead of Kannan-Bachem/Iliopoulos, avoiding the cost of determinant computation and Iliopoulos' modular reduction. There is also a fast path for already-diagonal matrices (44-360x speedup).

Timings (us) for square nonsingular matrices, comparing the new dispatcher (snf), snf_transform without and with U/V, Kannan-Bachem (KB), Iliopoulos (Ilio), and elementary_divisors (ed). KB/Ilio skipped for n > 20, ed skipped for 200-bit entries where factorization dominates:

  n  bits     snf  transf  transf+UV        KB   Ilio      ed
  3    10       0       0          0        20      0       0
  3    50       0       0         20         0      0       0
  5    10      20       0          0         0     20       0
  5    50      20      20         40        20      0      40
  8    10       0      20          0        20      0       0
  8    50      20      40         40        60     40      60
  8   200      40      60         80       300     80       -
 10    10       0      20         20        60     20      40
 10    50      20      40         80      1260    100    5540
 10   200      80     160        220      1400    500       -
 15    10     100     100        100     28200    100     100
 15    50     100     200        200     75000    300    4500
 15   200     400     800       1100    680800   3500       -
 20    10     100     200        300   2049600    300     300
 20    50     200     400        500  20440800   1000    1300
 20   200    1200    2700       3100  37894700  10200       -
 30    10     333     667       1000         -      -    1000
 30    50     667    2000       2667         -      -    7333
 30   200    5000   14333      17667         -      -       -
 50    10   14000    5000       7000         -      -   10000
 50    50   16000   18000      23000         -      -   33000
 50   200   88000  142000     235000         -      -       -

The new dispatcher is 5-37000x faster than Kannan-Bachem and 3-9x faster than Iliopoulos at sizes 10-20. Passing NULL for U/V saves 15-40% at larger sizes. elementary_divisors is competitive for small-bit entries but depends on factorization.

Diagonal fast path (us):

   n  bits   new    old  speedup
  20    50     2     88      44x
  50   200    20   4220     211x
 100   200   100  35950     360x

Tests include Magma-verified hardcoded cases and randomized cross-checks between all five SNF implementations (dispatcher, Kannan-Bachem, Iliopoulos, snf_transform, elementary_divisors). Documentation added for all new functions.

Promote the static fmpz_mat_is_diagonal from acb_theta/sp2gz_decompose.c (marked "todo: move out?") to a public function in the fmpz_mat module.

Skip the expensive HNF/Iliopoulos pipeline when the input is already diagonal by detecting this case and calling fmpz_mat_snf_diagonal directly. Benchmarks show 13x-360x speedup on diagonal matrices.

Compute S = U * A * V where S is the Smith normal form and U, V are unimodular. Uses iterative Hermite normal form (alternating row and column HNF until diagonal), then xgcd-based row/column operations to fix the divisibility chain. Generalized to non-square matrices. Based on Jean Kieffer's implementation in acb_theta, with Phase 2 optimized from O(n^3) to O(n) per pair via direct row/column updates.

Remove the static copies of fmpz_mat_is_diagonal and fmpz_mat_snf_transform from acb_theta/sp2gz_decompose.c, now that they are public functions in the fmpz_mat module.

Compute elementary divisors d1|d2|...|dr without full SNF using Luebeck's algorithm: compute HNF, factor the product of leading entries, then for each prime p iteratively find the left nullspace mod p to determine p-adic valuations. Falls back to full SNF when prime factors do not fit in ulong.

Benchmark fmpz_mat_snf across matrix sizes, bit sizes, and types (square nonsingular, square singular, non-square).

Test edge cases (0x0, 0xn, mx0, 1x1, non-square diagonal) and randomized tests for both diagonal and non-diagonal matrices.

…ivisors Test cases generated with Magma's SmithForm and ElementaryDivisors, covering structured matrices, prescribed divisibility chains, singular matrices, non-square matrices, single row/column, and negative diagonal entries.

t-snf: 5000 -> 3000 iterations t-elementary_divisors: 3000 -> 2000 iterations

Replace the KB-vs-Iliopoulos cross-check with a comprehensive test that compares all five methods on small square nonsingular matrices: fmpz_mat_snf, snf_kannan_bachem, snf_iliopoulos, snf_transform, and elementary_divisors.

The old dispatcher used Kannan-Bachem for small matrices and det + Iliopoulos for larger ones. The iterative Hermite approach (alternating row/column HNF until diagonal) is faster across the board: 2-30x for square nonsingular matrices, with the biggest gains at larger bit sizes.

- Allow U and/or V to be NULL in fmpz_mat_snf_transform to skip computing unneeded transformation matrices - Replace bounded for-loop with while(!is_diagonal) in SNF Phase 1 (convergence is mathematically guaranteed; FLINT_ASSERT is a no-op in release builds so a bounded loop with assert was unsafe) - Move Xt/Mc allocation outside the Phase 1 loop in snf.c and snf_transform.c; defer Phase 2 temporaries to their scope - Use fmpz_mat_window for R in elementary_divisors.c (avoids copy) - Use _fmpz_vec_scalar_divexact_ui instead of element-by-element loop - Add FLINT_ASSERT guards after Phase 1 and for pivot row search - Extract _check_snf_rand helper in t-snf.c to deduplicate 4 test loops - Refactor cross-check tests to loop over methods with a switch - Add NULL-parameter tests for snf_transform (NULL U, NULL V, both) - Add documentation for is_diagonal, snf_transform, elementary_divisors

Add nmod_mat_left_nullspace which computes the left nullspace by row-reducing [A | I] directly, without transposing. Use it in fmpz_mat_elementary_divisors to simplify the left nullspace computation. Fix snf_transform.c Phase 2 indentation to properly nest inside its scoping block, remove redundant fmpz_set in the U update loop, and remove dead T1/T2 variables from the NULL-transform test.

fredrik-johansson · 2026-04-05T09:03:41Z

nmod_mat_left_nullspace seems to compute a reduced matrix of column vectors such that X^T A = 0, whereas nmod_mat_nullspace computes a non-reduced basis of column vectors such that A X = 0.

Wouldn't be more natural to return a matrix of row vectors such that X A = 0, since row matrices in any case are more convenient with our row-major storage? And why not return it non-reduced for users who don't need it to be reduced?

Also, this function can be implemented in terms of nmod_mat_nullspace plus transposition (and nmod_mat_rref if you want it reduced). This should be faster than augmenting the input matrix.

Maybe we should have a function nmod_mat_kernel that allows a combination of flags:

left/right kernel
transpose/not transpose (or: basis as rows/column vectors)
reduced/nonreduced basis
output zero-padded n x n matrix, or resize matrix to n x r / r x n

Not that this needs to be done in this PR, but it would at least be nice if nullspace/left_nullspace were consistent.

fredrik-johansson · 2026-04-05T10:01:11Z

+}
+
+void
+fmpz_mat_elementary_divisors(fmpz * ed, slong * rank_out,


rank_out could be the return value instead of being passed as a pointer?

fredrik-johansson · 2026-04-05T10:14:21Z

I guess the 32-bit test failure is due to overflow in the hardcoded test case slong constant 3395595701880.

edgarcosta · 2026-04-05T18:18:16Z

Thank you! I appreciate the feedback!
I will hopefully tackle all this soon.

Indeed, I first had the left nulsspace via transpose, but without thinking I thought I was being lazy, and rewrote it via the augmentation... but I can see its issues now.

fredrik-johansson · 2026-04-05T23:02:38Z

In fmpz_mat_elementary_divisors, why do you multiply together all the leading entries and call fmpz_factor on the big product? Surely it's better to factor each leading entry and merge the factors.

Also, since the factorisation isn't used when a prime doesn't fit in ulong, you could 1) abort early when you find a leading entry with a larger factor, 2) use fmpz_factor_smooth to avoid effectively hanging if there happen to be 1000-bit entries (and/or maybe just abort and fall back on SNF when you hit a number to factor that exceeds some bit limit, say 2*FLINT_BITS).

The hardcoded expected diagonal `3395595701880` overflows a 32-bit `slong`, causing t-snf_transform to fail on 32-bit platforms. Change the expected-SNF parameter of the helper to accept decimal strings and build the target `fmpz_t` via `fmpz_set_str`, making every case portable across 32/64-bit.

Changes addressing Fredrik Johansson's review on flintlib#2627: * `nmod_mat_left_nullspace`: return row vectors (`X A = 0`) and a non-reduced basis, matching the convention of `nmod_mat_nullspace`. Implement via transpose + `nmod_mat_nullspace` instead of augmenting `[A | I]` and row-reducing, which is faster and reuses existing infrastructure. * `fmpz_mat_elementary_divisors`: return the rank as the function value rather than via an out-parameter pointer. * `fmpz_mat_elementary_divisors`: factor each HNF pivot individually rather than factoring their product. Use `fmpz_factor_smooth` with a `FLINT_BITS` smoothness bound and abort early (falling back to full SNF) as soon as a pivot has a prime factor that does not fit in a `ulong`. Guard against pathological inputs by also falling back when a pivot exceeds `2 * FLINT_BITS` bits, avoiding effectively unbounded factoring on very large entries. Collect the union of ulong-sized primes so each is passed to the Luebeck inner loop only once. * The caller of `nmod_mat_left_nullspace` inside the Luebeck step is updated to allocate and index the nullspace matrix with rows as basis vectors, matching the new API.

…ary_divisors_p FLINT_ASSERT is a no-op in release builds. The pivot-column identification loop relies on nmod_mat_nullspace returning an RREF-derived basis where each vector has a unique free-variable column; if that invariant ever breaks (future refactor of src/nmod_mat/nullspace.c), the subsequent fmpz_mat_row(R, -1) would be undefined behaviour. Fail loudly instead.

The Phase 1 alternating row/column HNF loop in fmpz_mat_snf and fmpz_mat_snf_transform was unbounded after 4e5d379; if a future bug in fmpz_mat_hnf or fmpz_mat_hnf_transform broke convergence the loop would hang silently. Add a defensive iteration ceiling derived from Kannan & Bachem (1979, SIAM J. Comput. 8, Theorem 5): O(mmn^2 * log(mmn * ||A||)) HNF passes, padded 4x and saturated at WORD_MAX. Practical iteration counts are orders of magnitude below this bound, so the throw only triggers on true runaways.

The previous wording called the returned basis "(non-reduced)", which is misleading because the backing nmod_mat_nullspace computes RREF and reads the basis off from it. State the actual structural property (unique free-variable column per row) that downstream callers such as fmpz_mat_elementary_divisors rely on.

State both fallback triggers explicitly: pivots above 2*FLINT_BITS bits, and pivots whose smooth factorization at FLINT_BITS leaves a composite cofactor. The previous wording collapsed them into a single sentence that was imprecise about which condition actually triggers the fallback.

Commit 3a2fab7 described two fallback triggers but the implementation at src/fmpz_mat/elementary_divisors.c:284, :295, and :307 has three distinct triggers: bit-size above 2*FLINT_BITS, a composite cofactor left by fmpz_factor_smooth, and a prime factor exceeding ulong (which arises separately because fmpz_factor_smooth may return 1 while still yielding a probable-prime factor too large for ulong). Name all three, and replace the non-ASCII em-dash with a plain "--" for consistency with the rest of doc/source/.

The Kannan-Bachem iteration bound was duplicated between snf.c and snf_transform.c (added in c31b103). Move it into its own file so both callers share a single implementation.

… fmpz_mat_ CLAUDE.md calls for internal helpers to follow the _module_foo() naming convention. Rename _elementary_divisors_p and _elementary_divisors_via_snf accordingly.

…tions Move t-elementary_divisors after the t-det_* block in fmpz_mat/test/main.c and move t-left_nullspace after t-invert_rows_cols in nmod_mat/test/main.c (both the #include list and the TEST_FUNCTION array).

Spell out that S may alias A, while U and V must be distinct from A, from S, and from each other (U aliasing A is unsafe because _fmpz_mat_snf_iter_bound(A) reads A after U and V have been initialised).

fmpz_factor_smooth trial-divides only up to 15 bits and uses ECM above that, so saying "trial-dividing primes of up to FLINT_BITS bits" misnamed the mechanism. Name fmpz_factor_smooth explicitly and describe the bound.

Smith normal form requires a non-negative diagonal; snf_transform negates rows of U at the end to enforce it. Mention this post-step alongside the alternating HNF and divisibility-chain phases.

- Short-circuit fmpz_mat_snf_transform to fmpz_mat_snf when both U and V are NULL; this closes the transform-vs-no-transform gap benchmarked in the PR body. - Replace hand-rolled U row-copy/update/negate loops in snf_transform with _fmpz_vec_set, _fmpz_vec_scalar_mul_fmpz, _fmpz_vec_scalar_addmul_fmpz, and _fmpz_vec_neg. - Hoist nmod_mat_init/clear for Rp out of the Luebeck nullspace loop in elementary_divisors (single init/clear pair instead of per level). - Drop the unused int section field from snf_params_t in p-snf.c. - Drop a stale narration comment in acb_theta/sp2gz_decompose.c.

The prior wording described a KB/Iliopoulos dispatcher chosen "based on the characteristics of the input matrix"; after the switch to a single iterative Hermite algorithm with a diagonal fast path (commits 95fa305, ba6c4d5), that text no longer matched the code.

edgarcosta · 2026-04-18T18:27:05Z

the commits got a bit out of hand, but to avoid rewriting history right now, I let them be.
At the end if you would like more than one commit, I can squash many of them.

edgarcosta added 13 commits April 4, 2026 19:47

Add fmpz_mat_is_diagonal

bb74375

Promote the static fmpz_mat_is_diagonal from acb_theta/sp2gz_decompose.c (marked "todo: move out?") to a public function in the fmpz_mat module.

Add diagonal fast path to fmpz_mat_snf

95fa305

Skip the expensive HNF/Iliopoulos pipeline when the input is already diagonal by detecting this case and calling fmpz_mat_snf_diagonal directly. Benchmarks show 13x-360x speedup on diagonal matrices.

Use public fmpz_mat_is_diagonal and fmpz_mat_snf_transform in acb_theta

714085f

Remove the static copies of fmpz_mat_is_diagonal and fmpz_mat_snf_transform from acb_theta/sp2gz_decompose.c, now that they are public functions in the fmpz_mat module.

Add SNF profiling program

ddb0d7e

Benchmark fmpz_mat_snf across matrix sizes, bit sizes, and types (square nonsingular, square singular, non-square).

Add tests for fmpz_mat_is_diagonal

9c2da82

Test edge cases (0x0, 0xn, mx0, 1x1, non-square diagonal) and randomized tests for both diagonal and non-diagonal matrices.

Reduce SNF test iterations to stay under 1 second

eebda97

t-snf: 5000 -> 3000 iterations t-elementary_divisors: 3000 -> 2000 iterations

Cross-check all SNF implementations against each other

8256f0b

Replace the KB-vs-Iliopoulos cross-check with a comprehensive test that compares all five methods on small square nonsingular matrices: fmpz_mat_snf, snf_kannan_bachem, snf_iliopoulos, snf_transform, and elementary_divisors.

fredrik-johansson reviewed Apr 5, 2026

View reviewed changes

fredrik-johansson mentioned this pull request Apr 7, 2026

Smith Normal Form Options flintlib/python-flint#387

Open

edgarcosta added 11 commits April 16, 2026 13:07

refactor(fmpz_mat): extract _fmpz_mat_snf_iter_bound helper

b922ee6

The Kannan-Bachem iteration bound was duplicated between snf.c and snf_transform.c (added in c31b103). Move it into its own file so both callers share a single implementation.

refactor(fmpz_mat): prefix static helpers in elementary_divisors with…

324aa26

… fmpz_mat_ CLAUDE.md calls for internal helpers to follow the _module_foo() naming convention. Rename _elementary_divisors_p and _elementary_divisors_via_snf accordingly.

test(fmpz_mat, nmod_mat): restore alphabetical order of test registra…

e27d07b

…tions Move t-elementary_divisors after the t-det_* block in fmpz_mat/test/main.c and move t-left_nullspace after t-invert_rows_cols in nmod_mat/test/main.c (both the #include list and the TEST_FUNCTION array).

docs(fmpz_mat): document aliasing contract of snf_transform

bb286a0

Spell out that S may alias A, while U and V must be distinct from A, from S, and from each other (U aliasing A is unsafe because _fmpz_mat_snf_iter_bound(A) reads A after U and V have been initialised).

edgarcosta added 4 commits April 17, 2026 23:02

docs(fmpz_mat): clarify factoring mechanism in elementary_divisors

cc334bd

fmpz_factor_smooth trial-divides only up to 15 bits and uses ECM above that, so saying "trial-dividing primes of up to FLINT_BITS bits" misnamed the mechanism. Name fmpz_factor_smooth explicitly and describe the bound.

docs(fmpz_mat): describe sign-fixing step of snf_transform

34dca06

Smith normal form requires a non-negative diagonal; snf_transform negates rows of U at the end to enforce it. Mention this post-step alongside the alternating HNF and divisibility-chain phases.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SNF improvements#2627

SNF improvements#2627
edgarcosta wants to merge 28 commits intoflintlib:mainfrom
edgarcosta:snf_improvements

edgarcosta commented Apr 5, 2026

Uh oh!

fredrik-johansson commented Apr 5, 2026 •

edited

Loading

Uh oh!

fredrik-johansson Apr 5, 2026

Uh oh!

fredrik-johansson commented Apr 5, 2026

Uh oh!

edgarcosta commented Apr 5, 2026

Uh oh!

fredrik-johansson commented Apr 5, 2026

Uh oh!

edgarcosta commented Apr 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

edgarcosta commented Apr 5, 2026

Uh oh!

fredrik-johansson commented Apr 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fredrik-johansson Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

fredrik-johansson commented Apr 5, 2026

Uh oh!

edgarcosta commented Apr 5, 2026

Uh oh!

fredrik-johansson commented Apr 5, 2026

Uh oh!

edgarcosta commented Apr 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

fredrik-johansson commented Apr 5, 2026 •

edited

Loading