Skip to content

SNF improvements#2627

Open
edgarcosta wants to merge 28 commits intoflintlib:mainfrom
edgarcosta:snf_improvements
Open

SNF improvements#2627
edgarcosta wants to merge 28 commits intoflintlib:mainfrom
edgarcosta:snf_improvements

Conversation

@edgarcosta
Copy link
Copy Markdown
Member

Building on #2596, this PR adds fmpz_mat_snf_transform (the feature request in flint2#204), fmpz_mat_elementary_divisors, and replaces the SNF dispatcher with a faster algorithm.

fmpz_mat_snf_transform(S, U, V, A) computes S = UAV where S is the Smith normal form and U, V are unimodular, for arbitrary m x n matrices. Either U or V (or both) may be NULL to skip computing the corresponding transform. The algorithm alternates row and column HNF until the matrix is diagonal, then fixes the divisibility chain with xgcd-based row/column operations (O(n) per pair rather than the O(n^3) matrix multiplications in the original acb_theta implementation by Jean Kieffer). This also removes ~100 lines of static duplicates from acb_theta/sp2gz_decompose.c.

fmpz_mat_elementary_divisors(ed, rank, A) computes d_1 | d_2 | ... | d_r without full SNF using Luebeck's algorithm: compute HNF, factor the product of the pivots, then for each prime p iteratively compute nullspace mod p to determine p-adic valuations. Falls back to full SNF if any prime factor does not fit in a ulong. This uses the new nmod_mat_left_nullspace, which computes the left nullspace directly by row-reducing [A | I] rather than transposing.

fmpz_mat_is_diagonal is promoted from a static function in acb_theta/sp2gz_decompose.c (which had a /* todo: move out? */ comment).

The fmpz_mat_snf dispatcher now uses the iterative Hermite algorithm instead of Kannan-Bachem/Iliopoulos, avoiding the cost of determinant computation and Iliopoulos' modular reduction. There is also a fast path for already-diagonal matrices (44-360x speedup).

Timings (us) for square nonsingular matrices, comparing the new dispatcher (snf), snf_transform without and with U/V, Kannan-Bachem (KB), Iliopoulos (Ilio), and elementary_divisors (ed). KB/Ilio skipped for n > 20, ed skipped for 200-bit entries where factorization dominates:

  n  bits     snf  transf  transf+UV        KB   Ilio      ed
  3    10       0       0          0        20      0       0
  3    50       0       0         20         0      0       0
  5    10      20       0          0         0     20       0
  5    50      20      20         40        20      0      40
  8    10       0      20          0        20      0       0
  8    50      20      40         40        60     40      60
  8   200      40      60         80       300     80       -
 10    10       0      20         20        60     20      40
 10    50      20      40         80      1260    100    5540
 10   200      80     160        220      1400    500       -
 15    10     100     100        100     28200    100     100
 15    50     100     200        200     75000    300    4500
 15   200     400     800       1100    680800   3500       -
 20    10     100     200        300   2049600    300     300
 20    50     200     400        500  20440800   1000    1300
 20   200    1200    2700       3100  37894700  10200       -
 30    10     333     667       1000         -      -    1000
 30    50     667    2000       2667         -      -    7333
 30   200    5000   14333      17667         -      -       -
 50    10   14000    5000       7000         -      -   10000
 50    50   16000   18000      23000         -      -   33000
 50   200   88000  142000     235000         -      -       -

The new dispatcher is 5-37000x faster than Kannan-Bachem and 3-9x faster than Iliopoulos at sizes 10-20. Passing NULL for U/V saves 15-40% at larger sizes. elementary_divisors is competitive for small-bit entries but depends on factorization.

Diagonal fast path (us):

   n  bits   new    old  speedup
  20    50     2     88      44x
  50   200    20   4220     211x
 100   200   100  35950     360x

Tests include Magma-verified hardcoded cases and randomized cross-checks between all five SNF implementations (dispatcher, Kannan-Bachem, Iliopoulos, snf_transform, elementary_divisors). Documentation added for all new functions.

Promote the static fmpz_mat_is_diagonal from acb_theta/sp2gz_decompose.c
(marked "todo: move out?") to a public function in the fmpz_mat module.
Skip the expensive HNF/Iliopoulos pipeline when the input is already
diagonal by detecting this case and calling fmpz_mat_snf_diagonal
directly. Benchmarks show 13x-360x speedup on diagonal matrices.
Compute S = U * A * V where S is the Smith normal form and U, V are
unimodular. Uses iterative Hermite normal form (alternating row and
column HNF until diagonal), then xgcd-based row/column operations to
fix the divisibility chain. Generalized to non-square matrices.

Based on Jean Kieffer's implementation in acb_theta, with Phase 2
optimized from O(n^3) to O(n) per pair via direct row/column updates.
Remove the static copies of fmpz_mat_is_diagonal and
fmpz_mat_snf_transform from acb_theta/sp2gz_decompose.c, now that
they are public functions in the fmpz_mat module.
Compute elementary divisors d1|d2|...|dr without full SNF using
Luebeck's algorithm: compute HNF, factor the product of leading
entries, then for each prime p iteratively find the left nullspace
mod p to determine p-adic valuations. Falls back to full SNF when
prime factors do not fit in ulong.
Benchmark fmpz_mat_snf across matrix sizes, bit sizes, and types
(square nonsingular, square singular, non-square).
Test edge cases (0x0, 0xn, mx0, 1x1, non-square diagonal) and
randomized tests for both diagonal and non-diagonal matrices.
…ivisors

Test cases generated with Magma's SmithForm and ElementaryDivisors,
covering structured matrices, prescribed divisibility chains,
singular matrices, non-square matrices, single row/column, and
negative diagonal entries.
t-snf: 5000 -> 3000 iterations
t-elementary_divisors: 3000 -> 2000 iterations
Replace the KB-vs-Iliopoulos cross-check with a comprehensive test
that compares all five methods on small square nonsingular matrices:
fmpz_mat_snf, snf_kannan_bachem, snf_iliopoulos, snf_transform, and
elementary_divisors.
The old dispatcher used Kannan-Bachem for small matrices and
det + Iliopoulos for larger ones. The iterative Hermite approach
(alternating row/column HNF until diagonal) is faster across the
board: 2-30x for square nonsingular matrices, with the biggest
gains at larger bit sizes.
- Allow U and/or V to be NULL in fmpz_mat_snf_transform to skip
  computing unneeded transformation matrices
- Replace bounded for-loop with while(!is_diagonal) in SNF Phase 1
  (convergence is mathematically guaranteed; FLINT_ASSERT is a no-op
  in release builds so a bounded loop with assert was unsafe)
- Move Xt/Mc allocation outside the Phase 1 loop in snf.c and
  snf_transform.c; defer Phase 2 temporaries to their scope
- Use fmpz_mat_window for R in elementary_divisors.c (avoids copy)
- Use _fmpz_vec_scalar_divexact_ui instead of element-by-element loop
- Add FLINT_ASSERT guards after Phase 1 and for pivot row search
- Extract _check_snf_rand helper in t-snf.c to deduplicate 4 test loops
- Refactor cross-check tests to loop over methods with a switch
- Add NULL-parameter tests for snf_transform (NULL U, NULL V, both)
- Add documentation for is_diagonal, snf_transform, elementary_divisors
Add nmod_mat_left_nullspace which computes the left nullspace by
row-reducing [A | I] directly, without transposing. Use it in
fmpz_mat_elementary_divisors to simplify the left nullspace computation.

Fix snf_transform.c Phase 2 indentation to properly nest inside its
scoping block, remove redundant fmpz_set in the U update loop, and
remove dead T1/T2 variables from the NULL-transform test.
@fredrik-johansson
Copy link
Copy Markdown
Collaborator

fredrik-johansson commented Apr 5, 2026

nmod_mat_left_nullspace seems to compute a reduced matrix of column vectors such that X^T A = 0, whereas nmod_mat_nullspace computes a non-reduced basis of column vectors such that A X = 0.

Wouldn't be more natural to return a matrix of row vectors such that X A = 0, since row matrices in any case are more convenient with our row-major storage? And why not return it non-reduced for users who don't need it to be reduced?

Also, this function can be implemented in terms of nmod_mat_nullspace plus transposition (and nmod_mat_rref if you want it reduced). This should be faster than augmenting the input matrix.

Maybe we should have a function nmod_mat_kernel that allows a combination of flags:

  • left/right kernel
  • transpose/not transpose (or: basis as rows/column vectors)
  • reduced/nonreduced basis
  • output zero-padded n x n matrix, or resize matrix to n x r / r x n

Not that this needs to be done in this PR, but it would at least be nice if nullspace/left_nullspace were consistent.

Comment thread src/fmpz_mat/elementary_divisors.c Outdated
}

void
fmpz_mat_elementary_divisors(fmpz * ed, slong * rank_out,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rank_out could be the return value instead of being passed as a pointer?

@fredrik-johansson
Copy link
Copy Markdown
Collaborator

I guess the 32-bit test failure is due to overflow in the hardcoded test case slong constant 3395595701880.

@edgarcosta
Copy link
Copy Markdown
Member Author

Thank you! I appreciate the feedback!
I will hopefully tackle all this soon.

Indeed, I first had the left nulsspace via transpose, but without thinking I thought I was being lazy, and rewrote it via the augmentation... but I can see its issues now.

@fredrik-johansson
Copy link
Copy Markdown
Collaborator

In fmpz_mat_elementary_divisors, why do you multiply together all the leading entries and call fmpz_factor on the big product? Surely it's better to factor each leading entry and merge the factors.

Also, since the factorisation isn't used when a prime doesn't fit in ulong, you could 1) abort early when you find a leading entry with a larger factor, 2) use fmpz_factor_smooth to avoid effectively hanging if there happen to be 1000-bit entries (and/or maybe just abort and fall back on SNF when you hit a number to factor that exceeds some bit limit, say 2*FLINT_BITS).

The hardcoded expected diagonal `3395595701880` overflows a 32-bit
`slong`, causing t-snf_transform to fail on 32-bit platforms. Change
the expected-SNF parameter of the helper to accept decimal strings
and build the target `fmpz_t` via `fmpz_set_str`, making every case
portable across 32/64-bit.
Changes addressing Fredrik Johansson's review on flintlib#2627:

* `nmod_mat_left_nullspace`: return row vectors (`X A = 0`) and a
  non-reduced basis, matching the convention of `nmod_mat_nullspace`.
  Implement via transpose + `nmod_mat_nullspace` instead of
  augmenting `[A | I]` and row-reducing, which is faster and reuses
  existing infrastructure.

* `fmpz_mat_elementary_divisors`: return the rank as the function
  value rather than via an out-parameter pointer.

* `fmpz_mat_elementary_divisors`: factor each HNF pivot individually
  rather than factoring their product. Use `fmpz_factor_smooth` with
  a `FLINT_BITS` smoothness bound and abort early (falling back to
  full SNF) as soon as a pivot has a prime factor that does not fit
  in a `ulong`. Guard against pathological inputs by also falling
  back when a pivot exceeds `2 * FLINT_BITS` bits, avoiding
  effectively unbounded factoring on very large entries. Collect the
  union of ulong-sized primes so each is passed to the Luebeck inner
  loop only once.

* The caller of `nmod_mat_left_nullspace` inside the Luebeck step is
  updated to allocate and index the nullspace matrix with rows as
  basis vectors, matching the new API.
…ary_divisors_p

FLINT_ASSERT is a no-op in release builds. The pivot-column identification
loop relies on nmod_mat_nullspace returning an RREF-derived basis where each
vector has a unique free-variable column; if that invariant ever breaks
(future refactor of src/nmod_mat/nullspace.c), the subsequent
fmpz_mat_row(R, -1) would be undefined behaviour. Fail loudly instead.
The Phase 1 alternating row/column HNF loop in fmpz_mat_snf and
fmpz_mat_snf_transform was unbounded after 4e5d379; if a future bug in
fmpz_mat_hnf or fmpz_mat_hnf_transform broke convergence the loop would hang
silently.  Add a defensive iteration ceiling derived from Kannan & Bachem
(1979, SIAM J. Comput. 8, Theorem 5): O(mmn^2 * log(mmn * ||A||)) HNF
passes, padded 4x and saturated at WORD_MAX.  Practical iteration counts
are orders of magnitude below this bound, so the throw only triggers on
true runaways.
The previous wording called the returned basis "(non-reduced)", which is
misleading because the backing nmod_mat_nullspace computes RREF and reads
the basis off from it.  State the actual structural property (unique
free-variable column per row) that downstream callers such as
fmpz_mat_elementary_divisors rely on.
State both fallback triggers explicitly: pivots above 2*FLINT_BITS bits,
and pivots whose smooth factorization at FLINT_BITS leaves a composite
cofactor.  The previous wording collapsed them into a single sentence that
was imprecise about which condition actually triggers the fallback.
Commit 3a2fab7 described two fallback triggers but the implementation
at src/fmpz_mat/elementary_divisors.c:284, :295, and :307 has three
distinct triggers: bit-size above 2*FLINT_BITS, a composite cofactor left
by fmpz_factor_smooth, and a prime factor exceeding ulong (which arises
separately because fmpz_factor_smooth may return 1 while still yielding a
probable-prime factor too large for ulong).  Name all three, and replace
the non-ASCII em-dash with a plain "--" for consistency with the rest
of doc/source/.
The Kannan-Bachem iteration bound was duplicated between snf.c and
snf_transform.c (added in c31b103).  Move it into its own file
so both callers share a single implementation.
… fmpz_mat_

CLAUDE.md calls for internal helpers to follow the _module_foo() naming
convention.  Rename _elementary_divisors_p and _elementary_divisors_via_snf
accordingly.
…tions

Move t-elementary_divisors after the t-det_* block in fmpz_mat/test/main.c
and move t-left_nullspace after t-invert_rows_cols in nmod_mat/test/main.c
(both the #include list and the TEST_FUNCTION array).
Spell out that S may alias A, while U and V must be distinct from A, from
S, and from each other (U aliasing A is unsafe because
_fmpz_mat_snf_iter_bound(A) reads A after U and V have been initialised).
fmpz_factor_smooth trial-divides only up to 15 bits and uses ECM above
that, so saying "trial-dividing primes of up to FLINT_BITS bits" misnamed
the mechanism.  Name fmpz_factor_smooth explicitly and describe the bound.
Smith normal form requires a non-negative diagonal; snf_transform negates
rows of U at the end to enforce it.  Mention this post-step alongside the
alternating HNF and divisibility-chain phases.
- Short-circuit fmpz_mat_snf_transform to fmpz_mat_snf when both U and
  V are NULL; this closes the transform-vs-no-transform gap benchmarked
  in the PR body.
- Replace hand-rolled U row-copy/update/negate loops in snf_transform
  with _fmpz_vec_set, _fmpz_vec_scalar_mul_fmpz,
  _fmpz_vec_scalar_addmul_fmpz, and _fmpz_vec_neg.
- Hoist nmod_mat_init/clear for Rp out of the Luebeck nullspace loop
  in elementary_divisors (single init/clear pair instead of per level).
- Drop the unused int section field from snf_params_t in p-snf.c.
- Drop a stale narration comment in acb_theta/sp2gz_decompose.c.
The prior wording described a KB/Iliopoulos dispatcher chosen "based
on the characteristics of the input matrix"; after the switch to a
single iterative Hermite algorithm with a diagonal fast path (commits
95fa305, ba6c4d5), that text no longer matched the code.
@edgarcosta
Copy link
Copy Markdown
Member Author

the commits got a bit out of hand, but to avoid rewriting history right now, I let them be.
At the end if you would like more than one commit, I can squash many of them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants