
Add kernel type-checker #375

Merged
johnchandlerburnham merged 35 commits into main from jcb/kernel on May 7, 2026
Conversation

@johnchandlerburnham
Member

new kernel approach, closing #300

  Wire the Lean 4 kernel (type-checker, WHNF reduction, definitional
  equality, inductive checking) into the main codebase as `ix::kernel`,
  ported from ix_old. This is the foundation for aux_gen, which will
  regenerate canonical auxiliary constants (.rec, .below, .brecOn) for
  alpha-collapsed and SCC-split mutual blocks.

  Restructure ConstantMeta to support this:
  - Rename the flat enum to ConstantMetaInfo, wrap it in a ConstantMeta
    struct with an `info` field (matching the kernel's expected interface)
  - Add ConstantMetaInfo::Muts variant for mutual block equivalence classes
  - Add resolve_kvmap for kernel ingress of KVMap metadata
  - Add Named.original field for aux_gen roundtrip decompilation
  - Add Address::from_hex for kernel test fixtures

  Update all downstream consumers (compile, decompile, FFI, serialization)
  to use the new ConstantMeta wrapper and ConstantMetaInfo enum. Add
  mutual inductive test fixtures in Tests/Ix/Compile/Mutual.lean covering
  alpha-collapse, over-merge, and SCC-split patterns.

  Extract the compile_env work-stealing scheduler from compile.rs into
  compile/env.rs (Phase 2b-2c from TODO.md). The new scheduler uses
  idempotent dependency tracking via Mutex<FxHashSet<Name>> instead of
  AtomicUsize counters, preventing silent double-decrement corruption.
  It adds a DashSet<Name> guard against duplicate block processing and
  drains aux_gen_extra_names after each block to unblock dependents of
  "bonus" names compiled during parent inductive blocks.

  Extend CompileState with fields needed for aux_gen (Phase 2a): kenv,
  kintern, ungrounded, aux_gen_extra_names, aux_name_to_addr. Add
  resolve_addr / resolve_addr_aux for two-tier name resolution (compiled
  names first, aux_gen fallback second). Change blocks field from
  DashSet<Address> to DashMap<Name, Vec<Vec<Name>>> to store canonical
  class ordering for downstream aux_gen use.

  Resolve all warnings under cargo xclippy -D warnings:

  - Replace ~100 u64-to-usize casts with fallible u64_to_usize() helper
    for future 32-bit target support
  - Collapse identical match arms in congruence, ingress, def_eq, check
  - Remove unused imports, variables, unnecessary mut across kernel
  - Fix map().unwrap_or() → map_or() / is_some_and() patterns
  - Rename to_node → node_for_key, to_ctor_when_k → synth_ctor_when_k
    for clippy self-convention compliance
  - Convert loop/match/break → while let in expr, tc, inductive
  - Change add_eq_axioms(&mut KEnv) → (&KEnv) since DashMap is
    interior-mutable
  - Remove unnecessary path qualifications throughout kernel
  - Add crate-level allows for type_complexity, too_many_arguments,
    unnecessary_wraps
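The fallible cast helper might look like this (a sketch; the real helper's exact signature and error type aren't shown in the diff):

```rust
// Sketch of a fallible u64 -> usize conversion. On 64-bit targets this
// always succeeds; on a 32-bit target, values above u32::MAX become a
// proper error instead of silently truncating via `as usize`.
fn u64_to_usize(n: u64) -> Result<usize, String> {
    usize::try_from(n).map_err(|_| format!("u64 value {n} does not fit in usize"))
}

fn main() {
    assert_eq!(u64_to_usize(1024).unwrap(), 1024usize);
    assert!(u64_to_usize(u64::from(u32::MAX)).is_ok());
}
```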

  When sort_consts collapses N mutual inductives into fewer equivalence
  classes, Lean's auto-generated auxiliaries (.rec, .recOn, .casesOn,
  .below, .brecOn, .noConfusion) have the wrong arity. Rather than
  surgically patching them, this commit regenerates them from the
  canonical class structure, producing identical output regardless of
  source declaration order.

  Core additions:
  - src/ix/compile/aux_gen/ — Full auxiliary regeneration module with
    submodules for each auxiliary kind: recursor (3.5k lines, following
    lean4/src/kernel/inductive.cpp), below, brecOn, casesOn, recOn,
    noConfusion, plus expr_utils for FVar-based intermediate computation
    and nested occurrence detection
  - src/ix/compile/mutual.rs — Orchestrates aux_gen output into Ixon
    blocks via compile_aux_block and generate_and_compile_aux_recursors
  - src/ix/congruence.rs — Alpha-equivalence checker between Lean
    expressions/constants, used to verify aux_gen output matches Lean's
    native constants

  Scheduler and compilation changes:
  - Pre-compile PUnit, PProd, Eq, True into aux_name_to_addr so aux_gen
    can reference them before the main scheduler processes their blocks
  - Add compile_const_no_aux path to capture original Lean form; promote_aux
    moves aux_gen constants to name_to_addr while storing the original
    (addr, meta) in Named.original for decompilation roundtrip
  - CompileError::MissingConstant now carries a `caller` field; scheduler
    prints full dependency status on failure for easier debugging
  - BlockCache.compiling tracks current constant name for error context

  Soundness fix:
  - Remove addr_to_name reverse index from Env — it was unsound for
    alpha-equivalent constants, where multiple names map to the same
    address; decompilation now resolves Expr::Ref nodes from arena
    metadata instead of silently falling back to the reverse lookup

  Decompiler hardening:
  - Remove fallback resolution paths that silently masked metadata
    mismatches; missing Ref metadata is now a hard error
  - Expand decompilation support for aux_gen-produced constants

  Utilities and infrastructure:
  - Name: add NameComponent, components(), strip_prefix(),
    append_components(), last_str() for aux name manipulation
  - Expr::pretty() for debugging expression trees
  - ConstantMetaInfo::kind_name() diagnostic helper

  Testing:
  - New ValidateAux.lean test with 6-phase validation: compilation,
    no ephemeral leaks, alpha-equivalence canonicity, decompilation
    with/without debug info, aux congruence verification
  - Mutual.lean: make all declarations `public` for cross-module access
  - rs_compile_validate_aux FFI entry point with phased logging
  - Restructure rs_tmp_decode_const_map into phased output

  Also applies cargo fmt and clippy fixes across kernel files (mode.rs,
  inductive.rs, congruence.rs, egress.rs, env.rs, tutorial/basic.rs,
  meta.rs).

…he keys

  Moves all shared type-checker state — intern table, WHNF/infer/def-eq/
  ingress/recursor caches, and resolved primitives — from TypeChecker into
  KEnv. TypeChecker becomes a lightweight thread-local handle (`TypeChecker<M>`
  instead of `TypeChecker<'env, M>`) that holds only local context, equiv
  manager, and control flags, making it cheap to create and safe to run in
  parallel over a shared `Arc<KEnv>`.

  Replaces pointer-based `ptr_key()` cache keys with blake3 `hash_key()` keys
  throughout WHNF, def-eq, infer, and ingress caches, fixing the ABA problem
  where deallocated Arc pointers could alias semantically different expressions.
  All caches now use DashMap/DashSet for lock-free concurrent access.
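The ABA hazard is easy to show in miniature (std's `DefaultHasher` stands in for blake3 here; `ptr_key`/`hash_key` follow the names above, but the bodies are a sketch):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::Arc;

// Pointer-based cache key: if an Arc is dropped and the allocator reuses
// the address for a semantically different expression, a stale cache
// entry aliases it (the ABA problem described above).
fn ptr_key<T>(expr: &Arc<T>) -> usize {
    Arc::as_ptr(expr) as usize
}

// Content-based key: structurally equal expressions always share a key,
// and the key remains valid across deallocation and reallocation.
fn hash_key<T: Hash>(expr: &Arc<T>) -> u64 {
    let mut h = DefaultHasher::new();
    expr.hash(&mut h);
    h.finish()
}

fn main() {
    let a = Arc::new(String::from("fun x => x"));
    let b = Arc::new(String::from("fun x => x"));
    // Equal content, different allocations: the pointer keys disagree
    // while the content keys agree.
    assert_ne!(ptr_key(&a), ptr_key(&b));
    assert_eq!(hash_key(&a), hash_key(&b));
}
```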

  Also:
  - Add KernelCtx wrapper for compile-side aux_gen sort-level inference
  - Extend aux_gen and decompile to handle nested _N-suffixed auxiliaries
    (rec_N, below_N, brecOn_N, etc.)
  - Improve ingress error handling: universe index, blob, and UTF-8 errors
    are now explicit instead of silently defaulting
  - Add Display impl for TcError and is_explicit() for KUniv

1. **Call-site surgery for alpha-collapsed recursors.** When
   `sort_consts` collapses mutual inductives into fewer equivalence
   classes, `.rec` gets regenerated with canonical motive/minor
   layout, but user code still applies it with source-order arguments.
   Add a surgery pipeline that reorders arguments at compile time
   and reconstructs the source-order App spine at decompile time.

   - New `src/ix/compile/surgery.rs`: `CallSitePlan` (per-recursor
     permutation + keep mask), `collect_{lean,ixon}_telescope`
     helpers, and `compute_call_site_plans` derivation.
   - New `ExprMetaData::CallSite { name, entries }` arena node
     carrying source-order `Kept { canon_idx, meta }` /
     `Collapsed { sharing_idx, meta }` entries at the outermost
     App position.
   - Extend `ConstantMeta` with `meta_sharing`/`meta_refs`/
     `meta_univs` extension tables forming a virtual address space
     appended to the block cache during decompile. Serialization is
     always-on (zero-length for pre-surgery constants).
   - `compile_expr` adds a `BuildCallSite` frame that splits
     telescopes into kept/collapsed args and emits a normal App
     spine with CallSite metadata; `decompile_expr` adds a
     `BuildTelescope` frame that walks entries in source order,
     resolving kept positions from the canonical spine and
     collapsed positions from the extension tables.

2. **aux_gen scope reduction and correctness fixes.** Empirically
   confirmed (via 25k+ constant validate-aux roundtrip) that only
   auxiliaries whose *value* references `.rec` directly need
   regeneration. `.casesOn`'s public binder arity is invariant under
   alpha collapse, so `.noConfusion`, `.ctorIdx`, `.ctor.inj*`,
   `._sizeOf_*`, etc. correctly bind to the regenerated `.casesOn`
   at address-resolution time with no patching.

   - Delete `src/ix/compile/aux_gen/no_confusion.rs` and drop the
     broken `.noConfusion` regeneration attempt from `mutual.rs`.
     Remove unused `PatchedConstant::_NoConfusion{,Type}` variants.
   - Expand the aux_gen module docs into a full taxonomy of which
     auxiliaries need regeneration vs. inherit correctness.
   - Add nested-inductive expand/restore model (`ExpandedBlock`,
     `generate_recursors_from_expanded`) so recursors for
     inductives with nested occurrences (e.g., `Array (Part α)`)
     build uniformly against auxiliary `_nested.*` consts and
     restore the originals afterwards.
   - Propagate `is_reflexive` on `BelowIndc` so `.below` content
     hashes for reflexive inductives like `Acc` match Lean's.
   - Add `instantiate_rev` matching Lean C++ semantics for
     multi-arg reverse instantiation.
   - New `src/ix/compile/nat_conv.rs` with `try_nat_to_usize` /
     `nat_to_usize` / `nat_to_u64` to make Lean-Nat-to-Rust-int
     conversions explicit rather than silently producing 0 on
     overflow. Threaded through all aux_gen modules.
   - Harden `promote_aux`: the name-mismatch branch now returns
     `CompileError::InvalidMutualBlock` instead of just logging —
     silently continuing would splice foreign metadata into the
     wrong Named entry.

3. **Env behind Arc<ConstantInfo>, with ref-graph precompile of
   aux_gen prereqs.** Kernel caches were lifted into `Arc<KEnv>`
   in the prior commit; this finishes the migration by changing
   `Env::get` to return `Option<Arc<ConstantInfo>>` rather than
   borrowed refs. Updated call sites in ground.rs (remove
   `GroundError<'a>` lifetime), graph.rs, decompile.rs, ffi/,
   aux_gen/, and throughout. Also fixes a nondeterministic
   `MissingConstant` race in the compile scheduler:

   - `precompile_aux_gen_prereqs` in `compile/env.rs` computes the
     transitive SCC closure of aux_gen seed names (PUnit, PProd,
     Eq{.refl,.symm,.ndrec}, rfl, HEq{,.refl}, eq_of_heq, True)
     and compiles them in reverse-topo order into
     `aux_name_to_addr` before any block's aux_gen fires. Prevents
     work-stealing races where `.brecOn.eq`'s emitted `Eq.symm`
     ref could race with `Eq`'s own SCC.
   - Any prereq compile failure is now a hard error (was silent
     fallback, which left names unresolved and reintroduced the
     race).
   - Scheduler uses `notify_one` per completion and `notify_all`
     only at total completion — removes thundering herd on every
     block.
   - Don't register failed aux_gen names in
     `aux_gen_extra_names` — downstream should get
     `MissingConstant` rather than silently referencing broken
     data.
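The permutation-plus-keep-mask idea from the call-site surgery above can be illustrated with a toy (the `CallSitePlan` name and fields come from the description; this reordering logic is a sketch of the shape, not the PR's code):

```rust
// Toy model of a call-site plan: `perm` lists source-order argument
// positions in canonical order, and `keep` marks which source positions
// survive alpha-collapse (collapsed ones are resolved elsewhere, via the
// metadata extension tables).
struct CallSitePlan {
    perm: Vec<usize>,
    keep: Vec<bool>,
}

impl CallSitePlan {
    /// Reorder a source-order App spine into canonical order,
    /// dropping collapsed arguments.
    fn to_canonical<'a>(&self, args: &[&'a str]) -> Vec<&'a str> {
        self.perm
            .iter()
            .filter(|&&i| self.keep[i])
            .map(|&i| args[i])
            .collect()
    }
}

fn main() {
    let plan = CallSitePlan { perm: vec![2, 0, 1], keep: vec![true, false, true] };
    assert_eq!(
        plan.to_canonical(&["motive_B", "minor_B", "motive_A"]),
        vec!["motive_A", "motive_B"]
    );
}
```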

Additional correctness and diagnostics:

- Gate slow-block / aux_gen timing diagnostics on `IX_TIMING`
  env var (default off).
- Add `IX_CONGRUENCE_DUMP` env var in the congruence patch-check
  to dump generated vs. original types/values on mismatch,
  optionally filtered by name substring.
- `Named` gains `name_refs: Vec<Vec<Name>>` parallel to
  `Constant.refs`, populated by every compile path. Currently
  reserved — the decompiler resolves Ref names via the arena's
  `name_addr` metadata (content-hashed, so alpha-collapse is
  already disambiguated). Field kept as a schema-stable
  extension point; documented accordingly.
- `decompile_expr` replaces `arena.nodes.get(i).unwrap_or(&Leaf)`
  with `arena_lookup` that accepts `u64::MAX` as the "no
  metadata" sentinel (returns Leaf) but errors with
  `BadConstantFormat` on any other out-of-bounds index. Arena
  corruption now fails loudly instead of silently stripping
  metadata. Add `pop_result` helper for the same reason on
  result-stack underflow.
- Sharing analyzer: `analyze_block` now returns topo_order as a
  third tuple element, threaded through `decide_sharing`,
  `build_sharing_vec`, and `analyze_sharing_stats` instead of
  each re-sorting. `rewrite_expr` checks `hash_to_idx` before
  the cache so stale cache entries don't block sharing
  replacement — removes the need for two `cache.clear()` calls
  in `build_sharing_vec`.
- Fix `decode_const_map` patch validation: `num_indices` was
  hardcoded to 1 and `is_reflexive` to false; both now come from
  the decoded `BelowIndcVal`.
- Clean up dead `_mk_unit_type` / `_mk_unit_val` Prop helpers in
  cases_on.rs (only PUnit path is live).

Tests:

- `ValidateAux.lean`: extend test scope to `State` and `Lean`
  prefixes for wider aux_gen coverage.

  Makes `Benchmarks/Compile/CompileMathlib.lean` pass end-to-end through the
  validator by closing every remaining aux_gen hash divergence and wiring
  up the two diagnostic roundtrips + the `ix validate` CLI they share.

  ### New: `ix validate --path <file>` subcommand

  Runs the full 8-phase aux_gen validation pipeline (compile + decompile +
  roundtrip + alpha-equivalence + nested-detect checks) on any Lean file,
  reusing the same Rust FFI (`rs_compile_validate_aux`) the `validate-aux`
  test runner already calls. Separate from `ix compile` because validation
  is expensive and primarily a correctness gate; separate from `lake test`
  so Mathlib-scale validation doesn't force the test binary to rebuild on
  every Mathlib update. Supports `--ns Aesop,SetTheory.PGame` namespace
  filtering with transitive-dep closure.

  Moves `buildFile` / `fetchMathlibCache` from `Cli/CompileCmd.lean` to
  `Ix/Meta.lean` so both CLI entry points share the same Mathlib-cache
  bootstrap, and drops `test-ffi` gating on `rs_compile_validate_aux` so the
  production `ix` binary can call it.

  ### New: two roundtrip diagnostic tests

  - `kernel-ixon-roundtrip` (replaces the old `kernel-roundtrip`): now runs
    `Lean → compile → ixon_ingress → KEnv → ixon_egress → decompile_env →
    Lean` rather than a second ad-hoc `KEnv → Lean` decompiler. Passing
    through the validated `decompile_env` lets aux_gen auxiliaries
    (`.brecOn*`, `.below`, `.brecOn_N.eq`) regenerate from the
    kernel-canonicalized Ixon form — closing the binder-name / alpha-
    collapse drift the old direct path couldn't see.
  - `kernel-lean-roundtrip` (new): `Lean → lean_ingress → KEnv → lean_egress
    → Lean`, skipping compile + Ixon entirely. Used to bisect between
    compile-pipeline bugs and ingress/egress bugs — if this is clean but
    `kernel-ixon-roundtrip` fails, compile is losing info.

  ### New: `ixon_egress` (`src/ix/kernel/egress.rs`)

  Inverse of `ixon_ingress`: `KEnv<Meta> + original IxonEnv → IxonEnv'`.
  Reuses `apply_sharing_*` from the compile side so the output is ready for
  `decompile_env`, and preserves each `Named.original` so aux_gen entries
  survive the roundtrip. Parallelized over DashMap partitions (muts blocks
  vs. standalone). Meta-only by design — generalizing to `Anon` requires
  address-keyed lookups.

  ### New: `lean_ingress` (`src/ix/kernel/ingress.rs`)

  Direct `LeanEnv → KEnv<Meta>`, bypassing `compile_env` + `ixon_ingress`.
  Uses `lean_name_to_addr` for every `KId.addr`, populates `kenv.intern`
  in-place, and emits `kenv.blocks` only for mutuals with >1 member. Only
  needed by the bisecting `kernel-lean-roundtrip` test.

  ### Aux_gen correctness

  - **Unsafe propagation** — plumbed `is_unsafe` through `.rec`, `.below`
    (Type + Prop), `.brecOn` / `.brecOn.go` / `.brecOn.eq`, `.casesOn`,
    `.recOn`. Matches Lean's `mkDefinitionValInferringUnsafe` +
    `mkThmOrUnsafeDef`: unsafe `.brecOn.eq` flips from `Thm` to unsafe
    `Defn` with `Opaque` hints; all type-level aux get `Unsafe`/`Opaque`
    safety whenever the parent inductive is unsafe. Previously hardcoded
    `Safe`, breaking content hashes on unsafe inductives.
  - **Kernel WHNF in `find_rec_target` / IH builders** — `build_minor_type`,
    `build_ih_type_fvar`, `build_rule_ih_fvar`, and `find_rec_target` now
    delta-unfold field-domain heads via `TcScope::whnf_lean` instead of
    pure `beta_reduce`. Matches Lean's `kernel/inductive.cpp::
    is_rec_argument`; fixes `reduceCtorParam*` regressions where an
    inductive appears under a definition head like `constType (n α) (n α)`.
  - **K-target via WHNF-reduced sort** — `compute_is_large_and_k` uses the
    kernel-derived `is_prop` rather than syntactic `peek_result_sort`, so
    reducible-alias target types like `Presieve X := ∀ Y, (Y ⟶ X) → Prop`
    correctly qualify for K.
  - **`.brecOn.eq` indexed-Eq construction** — per-index sort levels
    computed via `TcScope::get_level` (was hardcoded `Sort 1`); per-index
    `Eq` vs `HEq` decided via `TcScope::is_def_eq` matching Lean's
    `mkEqAndProof`; dependent index types substitute `outer_idx → new_idx`;
    major's sort level threaded explicitly. Fixes `TRBTree.brecOn.eq`,
    `Quiver.Path.brecOn.eq`, `PGame.Relabelling.brecOn.eq`,
    `Monoid.CoprodI.NeWord.brecOn.eq`, `NFA.Path.brecOn.eq`, and similar
    indexed-inductive cases.
  - **`level_normalize`** — ported from Lean's `Level.normalize`
    (`Lean/Level.lean:379-401`), applied by `TcScope::get_level` before
    returning sort levels for forall types. Matches `inferForallType`;
    without it, level trees stayed in our kernel's `mkLevelMax'` local-simp
    form and produced alpha-equivalent-but-not-hash-equal PProd level args
    (e.g., `SetTheory.PGame.brecOn.go` d=9 PProd.mk.lvl[1]).
  - **Nested-inductive false-positive** — `try_detect_nested_fvar` no longer
    matches against `flat`-stored external aux names (`Array`, `Option`,
    …). Matching only against the block's original inductives mirrors the
    kernel's `is_nested_inductive_app`; the old behavior flagged innocent
    occurrences (e.g., `Option (Array Script.LazyStep)` inside
    `Aesop.RappData`) and cascaded into phantom `_nested.Option_N` /
    `.rec_N+1` / `.below_N+1` / `.brecOn_N+1` constants.

  ### Kernel ingress/egress

  - **`RecRule.ctor` metadata field** — added `ctor: M::MField<Name>` to
    kernel `RecRule` so ingress ↔ egress roundtrips the Lean rule's ctor
    name. Unused by the kernel itself (dispatch is positional via `cidx`);
    erased in `Anon` mode. Threads through `ingress_recursor`,
    `egress_constant`, and every recursor-building site in
    `kernel/inductive.rs`, including all tutorial tests.
  - **Mdata preservation** — `lean_expr_to_zexpr_raw` accumulates consecutive
    `Mdata(kv, _)` layers into a single `Vec<MData>` attached via the
    `_mdata` constructors. Previously silently discarded, which lost every
    `_recApp` / `_inaccessible` / `noImplicitLambda` / `borrowed` /
    `sunfoldMatch` / `save_info` annotation. `kernel-lean-roundtrip`
    guards against regressing this.
  - **Binder-name tracking** — `lean_expr_to_zexpr_raw` now threads a
    `binder_names` stack so `ExprData::Var`'s display name is populated by
    de Bruijn lookup (cosmetic for pretty-printing; doesn't affect
    type-checking).
  - **`resolve_all` correctness** — errors on missing Named entries instead
    of synthesizing a name-hash fallback. The old fallback produced ghost
    KConsts (KIds referring to addresses where no KConst was ever stored),
    causing obscure downstream lookup failures and alpha-collapse
    confusion. `ingress_muts_inductive` also requires per-ctor Named
    metadata to avoid synthesizing junk binder names from missing arenas.
  - **Nat blob address fix** — uses `to_le_bytes()` (full BigInt width)
    rather than `to_u64().unwrap_or(0).to_le_bytes()` which truncated values
    ≥ 2⁶⁴ to 0, hash-consing distinct Nats to the same KExpr.
  - **Rename**: `ixon_to_zenv → ixon_ingress`, `egress_env → lean_egress`.
    Disambiguates the four ingress/egress directions now that the Ixon and
    Lean variants coexist.
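The Nat blob truncation bug reproduces in miniature (with `u128` standing in for the BigInt):

```rust
// Old behavior: squeeze the Nat through u64 first, defaulting to 0 on
// overflow — every value >= 2^64 produced the same address bytes, so
// distinct Nats were hash-consed to the same KExpr.
fn nat_blob_truncating(n: u128) -> Vec<u8> {
    let small = u64::try_from(n).unwrap_or(0);
    small.to_le_bytes().to_vec()
}

// Fixed behavior: serialize the full-width little-endian bytes, so
// distinct values always get distinct blobs.
fn nat_blob_full(n: u128) -> Vec<u8> {
    n.to_le_bytes().to_vec()
}

fn main() {
    let big = 1u128 << 64;
    // The old path collapses 2^64 and 0 to the same blob...
    assert_eq!(nat_blob_truncating(big), nat_blob_truncating(0));
    // ...while the full-width path keeps them distinct.
    assert_ne!(nat_blob_full(big), nat_blob_full(0));
}
```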

  ### Mathlib-scale memory + parallelism

  - **`Env::put` streaming** — collect only keys up front, `par_sort_unstable`,
    look up each value via `DashMap::get` in the write loop so only one
    value is live beyond the DashMap's own storage. Saves ~30 GB peak RSS
    on Mathlib (Section 4 `Named` had the clone-into-Vec dominating peak).
    `topological_sort_names` keeps the tuple-clone path: benchmarked 22s
    faster than keys-only DFS because the Arc parent chain avoids 4.7M
    shard lookups.
  - **`validate-aux` parallelism** — Phases 1, 3, 4, 6, 7b, and Phase 2's
    aux_gen+alpha step all run via `rayon::par_iter` with atomic
    pass/fail counters and a 20-entry mutex-guarded failure sample. Phase
    2 is restructured into three passes (serial collect → serial
    pre-ingress → parallel aux_gen) because `p2_kctx` can't be populated
    concurrently with aux_gen's reads.
  - **Phase 7 memory accounting** — `std::mem::take(&mut stt.env)` extracts
    the Ixon env before dropping the rest of stt + dstt in parallel via
    `rayon::join`; serialize's 3 GB buffer dropped immediately after
    deserialize; `fresh_stt` destructor offloaded to a background thread
    so Phase 7b's parallel scan doesn't wait on it.
  - **CLI-path destructor skip** — `rs_compile_env` and
    `rs_compile_validate_aux` `std::mem::forget` their final state by
    default (escape hatch: `IX_SKIP_DROPS=0`). On Mathlib this trims 60–90
    seconds of shard-by-shard `Arc<NameData>` refcount chains at process
    exit, where the OS reclaims the pages immediately anyway.

  ### Removed: `Named.name_refs`

  The `Vec<Vec<Name>>` per-address name table was populated by every
  compile path but never read by the decompiler — the arena's
  `ExprMetaData::Ref { name }` already distinguishes alpha-collapsed Refs
  via its name-content hash. Deletes the field, its `with_name_refs`
  builder, every compile-site population, and the two `name_refs`
  serialization loops. Small serialized-size win; no behavioral change.

…rant scheduler

  This pass (1) closes three P1 soundness gaps in the kernel, (2) splits the
  compile-time kernel context into pristine-originals and canonical halves
  so original-constant verification is fully isolated from the aux_gen
  pipeline, (3) adds native Int reduction parallel to Nat, (4) makes
  compile_env tolerant of per-block failure so one bad inductive doesn't
  stop the batch, and (5) wires a full-environment kernel-check FFI
  (`CheckError = kernelException | compileError`) that lets Lean-side tests
  distinguish compile-side rejections from kernel-side rejections.

  ### P1 soundness gaps — closed

  - **P1-1 recursor rule verification** (`src/ix/kernel/inductive.rs`): the
    "both gen and stored empty → error" guard was spuriously rejecting
    `Empty.rec` / `False.rec` / `PEmpty.rec`. Zero-rule agreement is
    *vacuous equality*, not a generator failure. Replaced with an
    element-wise `is_def_eq(&gen_rule.rhs, &stored_rule.rhs)` gate
    (also checks `fields` count); the one-sided `is_empty` branches
    remain as honest mismatches. Regression test
    `reject_bool_rec_with_swapped_rules` exercises the defeq gate with
    type-correct-but-semantically-swapped minor bodies.

  - **P1-2 mutual peer agreement** (same file): `check_inductive` now
    enforces S3b — all peers must share parameter count AND
    parameter-domain types, verified by a new `check_param_agreement`
    walker that whnf-peels n_params foralls on both sides and
    `is_def_eq`s each domain. Without this, `build_rec_type` would take
    the shared param prefix uniformly from `ind_infos[0]` and produce
    generated recursors whose param binders misalign with a peer's ctor
    arguments — de-Bruijn-shifted iota, ill-typed stored terms. The
    check is memoized on successful block completion in the new
    `KEnv::block_peer_agreement_cache` (DashSet<KId>): peer agreement
    is transitive, so O(N²) naive per-peer becomes O(N) per block.
    References lean4 `check_inductive_types` and lean4lean's `Inductive/Add`.
    Regression tests cover mismatched domains, mismatched counts, and a
    happy-path sanity check.

  - **P1-3 universe substitution out-of-range** (`tc.rs`, `error.rs`):
    `subst_univ` / `instantiate_univ_params` now return `Result<_,TcError>`
    and a new `TcError::UnivParamOutOfRange { idx, bound }`. Previously
    any `Param(i)` with `i >= us.len()` silently produced an orphan
    `Param` node. Arity is validated upstream (at `infer`'s Const node);
    this is defense-in-depth so any code path reaching substitution
    without that check fails loudly. All ~10 internal call sites
    threaded with `?`. Regression test
    `subst_univ_rejects_out_of_range_param`.
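   In miniature (a toy universe type; the real `KUniv` and `TcError` are richer than this sketch):

```rust
#[derive(Debug, Clone, PartialEq)]
enum Univ {
    Zero,
    Succ(Box<Univ>),
    Param(usize),
}

#[derive(Debug, PartialEq)]
enum TcError {
    UnivParamOutOfRange { idx: usize, bound: usize },
}

// Substitute universe parameters; an out-of-range Param index is a hard
// error instead of silently surviving as an orphan Param node.
fn subst_univ(u: &Univ, us: &[Univ]) -> Result<Univ, TcError> {
    match u {
        Univ::Zero => Ok(Univ::Zero),
        Univ::Succ(v) => Ok(Univ::Succ(Box::new(subst_univ(v, us)?))),
        Univ::Param(i) => us.get(*i).cloned().ok_or(TcError::UnivParamOutOfRange {
            idx: *i,
            bound: us.len(),
        }),
    }
}

fn main() {
    let us = [Univ::Zero];
    assert_eq!(subst_univ(&Univ::Param(0), &us), Ok(Univ::Zero));
    assert_eq!(
        subst_univ(&Univ::Succ(Box::new(Univ::Param(2))), &us),
        Err(TcError::UnivParamOutOfRange { idx: 2, bound: 1 })
    );
}
```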

  ### Two-env split: `KernelCtx { kenv, orig_kenv }`

  Add a second, pristine kernel environment populated once via
  `lean_ingress(&lean_env)` at `compile_env` startup and never mutated.
  Holds every Lean-original constant with all type references
  self-consistent — no alpha-collapse, no aux rewriting, no staleness.
  The existing `kctx.kenv` remains the incrementally-populated canonical
  env.

  - `mutual.rs` gains a new Phase 0 `check_originals` that typechecks
    each Lean-stored inductive/ctor/recursor against `orig_kenv` before
    any aux_gen work runs. Aux-recursor probe `<ind>.rec_1..16` to catch
    recursors that live in their own SCC. Failures are recorded in
    `stt.ungrounded` so the scheduler keeps running.
  - Runs even for non-inductive blocks (`MutConst::Recr`-only SCCs from
    `bad_raw_consts` can carry adversarial recursors that otherwise
    wouldn't ever be checked).
  - Old compile-overlay-polluted syntactic-compare false positives
    (Array vs `_nested.Array_1`) are resolved by the split: Phase 0
    runs against `orig_kenv`, the post-compile FFI check runs against
    the aux-restored canonical env. Neither env sees the in-flight
    compile overlay.

  ### LEON content-hash addressing in `orig_kenv`

  `lean_ingress` now addresses every KId by `ConstantInfo::get_hash()`
  (Blake3 over name + level params + type expression + variant-specific
  fields: ctors, rules, `all`, value, hints, etc.) rather than
  `lean_name_to_addr`. Two properties the name-hash scheme lacked:

  - **Content-distinguishing**: a rogue Lean env can't shadow a primitive
    by naming its own declaration `Nat`.
  - **Compatible with `PrimOrigAddrs`**: the new hardcoded-LEON-hash
    primitive table resolves addresses cleanly against `orig_kenv`,
    avoiding the synthetic `@<hex>` KId fallback that otherwise
    cascaded into `AppTypeMismatch` on every Nat literal reduction,
    Bool literal, String coercion, etc.

  Supporting machinery:
  - `build_leon_addr_map` builds the `Name → Address` map in parallel
    via rayon (returned as `DashMap` for signature compatibility with
    `aux_n2a` downstream).
  - Pass-1 ingress parallelized via `par_iter` (thread-safe: DashMap
    intern, DashMap ingress_cache, unique LEON-hashed KIds mean no
    shard contention on the insert).
  - Block seeding (Phase A/B) uses the constant's **declaration order**
    from `all` rather than FxHashMap iteration order, so
    `discover_block_inductives` produces the order Lean's stored
    recursors were generated against. Fixes spurious
    `check_recursor: type mismatch` on `Lean.Xml.Content.rec`,
    `Lean.Compiler.LCNF.Code.rec`, every
    `Grind.Arith.*.*Cnstr*.rec`, etc.
  - `ingress.rs::lean_ingress` pre-caches `Primitives::from_env_orig`
    via new `KEnv::set_prims` so any `TypeChecker::new(orig_kenv)` sees
    LEON-addressed primitives (the default `prims()` lazily initializes
    to the canonical table, which would miss here).

  ### Native Int reduction

  `try_reduce_int` in `src/ix/kernel/whnf.rs` (wired into `whnf`,
  `whnf_no_delta`, and `lazyDeltaReduction` in `def_eq.rs`, parallel to
  `try_reduce_nat`):

  - Handles `Int.{ofNat,negSucc,add,sub,mul,neg,emod,ediv,bmod,bdiv,natAbs}`.
  - Runs BEFORE delta so bodies like `Int.bmod`'s `Decidable.rec (LT.lt Int …)`
    never expose the stuck `Int.decLt = decNonneg (b - a)` cascade.
  - `IntVal = BigInt`; `extract_int_lit` reads canonical
    `Int.ofNat n` / `Int.negSucc n` forms; `intern_int_lit` round-trips
    back to ctor-headed shape; `int_ediv_emod` normalizes
    truncated-div into Euclidean (non-negative remainder); `bmod`/`bdiv`
    follow Lean's `[-(m/2), (m+1)/2)` window and the `Int.bmod x 0 = x`
    / `Int.bdiv x 0 = 0` corner cases.
  - Lean's C++ kernel lacks a parallel `reduce_int` and reduces
    symbolically through `Int.rec` + native Nat ops, which gets stuck
    when any link of the chain is missing. Our kernel short-circuits.

  Extends `Primitives<M>` with 12 Int fields, threads both `PrimAddrs`
  tables with canonical + LEON hashes for each.
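  The Euclidean and balanced-mod conventions can be sketched over `i64` (the kernel operates on BigInt; the zero-divisor branches follow Lean's `x % 0 = x` / `x / 0 = 0` convention):

```rust
// Euclidean division: the quotient rounds so the remainder is always
// non-negative for a nonzero divisor.
fn int_ediv_emod(a: i64, b: i64) -> (i64, i64) {
    if b == 0 {
        return (0, a); // Lean convention: x / 0 = 0, x % 0 = x
    }
    (a.div_euclid(b), a.rem_euclid(b))
}

// Balanced modulus: fold the Euclidean remainder into the window
// [-(m/2), (m+1)/2), with the bmod x 0 = x corner case noted above.
fn int_bmod(x: i64, m: i64) -> i64 {
    if m == 0 {
        return x;
    }
    let r = x.rem_euclid(m);
    if r < (m + 1) / 2 { r } else { r - m }
}

fn main() {
    // -7 = (-3) * 3 + 2: the Euclidean remainder is non-negative.
    assert_eq!(int_ediv_emod(-7, 3), (-3, 2));
    assert_eq!(int_bmod(7, 5), 2);   // 2 lies inside [-2, 3)
    assert_eq!(int_bmod(8, 5), -2);  // remainder 3 folds down to -2
    assert_eq!(int_bmod(42, 0), 42); // bmod x 0 = x
}
```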

  ### `get_major_inductive_id` resilience

  `src/ix/kernel/whnf.rs`: after peeling the stored
  `params + motives + minors + indices` foralls, if the next domain
  head isn't a `Const` resolving to `KConst::Indc`, scan up to
  `MAX_EXTRA_FORALLS = 8` additional foralls for the first whose head
  IS an inductive. Handles nested-inductive recursor shapes where
  Lean's stored counts don't align with the kernel's view of the
  forall structure after WHNF (e.g., extra instance/motive binders
  not captured by `num_params/num_motives/...`).

  ### `is_rec_field` depth handling — corrected

  The old path mixed (a) head-addr match and (b) a structural
  same-head-Const check on the first `own_params` args — which silently
  returned false whenever a spec_param was a bare `Var` (block param),
  dropping the IH for any recursive field whose nested type used the
  block's params directly (e.g. `head : Entry α β (Node α β)` in a
  nested `List (Entry α β …)` scan).

  Now:
  - Caller passes `spec_params_lift_by: u64` explicitly, because stored
    aux spec_params live at `depth = n_rec_params` and the lift required
    depends on context:
    - `build_minor_at_depth` pushes field locals → lift by
      `self.depth() - n_rec_params`.
    - `build_rule_rhs` uses virtual `Var(total_lams - 1 - j)` positions
      without pushing → lift by `total_lams - n_rec_params`.
  - Comparison uses `is_def_eq` on each (arg, lifted spec_param) pair —
    handles alpha, whnf, beta, Var equality in a single shot.
  - Same fix threaded through `build_minor_type` and `build_rule_rhs`
    call sites; new doc header in the function explains the depth
    contract and the interim-fix history.

  ### Tolerant scheduler: `stt.ungrounded` as DashMap

  `compile_env` no longer aborts on per-block failure. Failures
  (ill-formed inductives, cascading `MissingConstant`, …) are recorded
  in `stt.ungrounded` (type changed from `FxHashMap` to `DashMap` for
  concurrent writes); the rest of the env still compiles; dependents
  hit `MissingConstant` and also land in `ungrounded`. Setup-phase
  timing gated on `IX_QUIET` flag. `block_info` / `reverse_deps`
  initialization is now parallel via rayon's `try_for_each`. Log spam
  for cascading failures is gated on `IX_LOG_BLOCKS`.

  ### FFI: `CheckError` two-variant enum + `rs_kernel_check_consts`

  - `CheckError` (Lean-side) gains a `compileError` ctor. Two variants
    needed both (a) to disambiguate compile-side rejections from
    kernel-side rejections in test output, and (b) to prevent Lean's
    LCNF `hasTrivialStructure?` optimization from eliding a
    single-ctor-single-field inductive to `String` — the heap ctor the
    FFI allocates would otherwise be decoded as a string header and
    SIGSEGV.
  - `KERNEL_EXCEPTION_TAG = 0`, new `COMPILE_ERROR_TAG = 1`. New
    `ErrKind` enum + `CheckRes` type on the Rust side.
  - `run_checks_on_large_stack` / `check_consts_loop` now accept an
    `ungrounded: FxHashMap<String, String>` snapshot; any constant
    present there is reported as `compileError` without invoking the
    kernel (matches the ix_old handling and lets `bad_raw_consts`
    tests — e.g. `inductBadNonSort` failing `compute_is_large_and_k`
    — roundtrip correctly).
  - `build_uniform_error` now emits `ErrKind::Compile` (the setup-phase
    failure happened before the kernel was consulted).
  - `format_tc_error` catch-all uses `{other}` (hand-written `Display`)
    instead of `{other:?}` to avoid dumping raw KExpr internals.
  - `Array Bool` decoding now goes through `unbox_usize() >> 1`,
    matching Lean's `lean_box(n) = (n << 1) | 1` tagged-scalar convention.

  ### Equiv-manager hot-path alloc reduction

  `EquivManager::is_equiv` and `find_root_key` take `&EqKey` instead of
  `EqKey`, eliminating Arc-clones on each call. `add_equiv` stays
  by-value (insert requires ownership). `is_def_eq` builds a single
  `a_key`/`b_key` binding up front and reuses it across is_equiv +
  find_root_key + (at most one) add_equiv. The equiv-root second-chance
  branch is the only remaining clone pair, and it's mutually exclusive
  with the main-path add_equiv.
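The borrow-vs-own split can be sketched as follows (a simplified stand-in: `EqKey` here is a pair of `Arc<str>`, and the union-find internals are reduced to a map):

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Simplified key: the real EqKey wraps Arc'd expressions; cloning it bumps
// refcounts, which is exactly the per-call cost the change removes.
type EqKey = (Arc<str>, Arc<str>);

struct EquivManager {
    roots: HashMap<EqKey, usize>,
}

impl EquivManager {
    // Lookup borrows the key: no Arc clone per call.
    fn find_root_key(&self, key: &EqKey) -> Option<usize> {
        self.roots.get(key).copied()
    }

    // Insertion genuinely needs ownership, so add_equiv stays by-value.
    fn add_equiv(&mut self, key: EqKey, root: usize) {
        self.roots.insert(key, root);
    }
}
```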

  ### Infer cache unification

  Drop `infer_only_cache`; keep a single `infer_cache` that only stores
  full-mode results. Infer-only reads happily consume them (validation
  is strictly more than infer-only needs). Removes the
  cache-duplication overhead and the subtle invariant that infer-only
  results weren't supposed to leak into full-mode readers.
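The unified-cache discipline, sketched with illustrative types (the real cache keys kernel expressions, not `u64`): both read modes consult one table, but only full-mode (validated) results are ever written, so the leak invariant holds by construction:

```rust
use std::collections::HashMap;

struct TypeChecker {
    // Single cache; entries are always full-mode (validated) results.
    infer_cache: HashMap<u64, String>,
}

impl TypeChecker {
    // Both modes read the same table: a validated result is strictly
    // stronger than what infer-only needs.
    fn cached_infer(&self, key: u64, _infer_only: bool) -> Option<&String> {
        self.infer_cache.get(&key)
    }

    // Only full-mode inference writes, so unvalidated results can never
    // reach a full-mode reader.
    fn record_full(&mut self, key: u64, ty: String) {
        self.infer_cache.insert(key, ty);
    }
}
```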

  ### `check_recursor_coherence` + coherence gating

  New `check_recursor_coherence(id)`:
  - `check_inductive` on the major (catches strict-positivity, bad ctor
    return shape, field-universe violations — all of which the recursor
    inherits).
  - `compute_k_target(ind_id) == declared k` (K-reduction is sound only
    for a narrow class of inductives; a mismatch is a soundness bug).

  Plus `check_recursor` (the full gen-vs-stored path) now also gates
  with `check_inductive` on the major before comparing rules. Cycle
  invariant documented: `check_inductive` never calls
  `check_recursor[_coherence]`, only `generate_block_recursors`.

  ### Lean-side

  - `Ix/Ixon.lean`: `putConstantMetaIndexed` / `getConstantMetaIndexed`
    always emit/consume three trailing length-prefixed extension tables
    (`meta_sharing` / `meta_refs` / `meta_univs`) as zero-length on the
    Lean side. Matches Rust's always-on wire format; Lean drops any
    payload (it doesn't model call-site surgery data).
  - `Ix/Meta.lean::getCompileEnv`: `loadExts := true` +
    `enableInitializersExecution`, so persistent env extensions (e.g.
    `registerTestCase` state) are hydrated from imported `.olean`s.
    Without this, extensions silently initialize empty — breaks
    extension-state reads via `get_env!`.
  - `Ix/CompileM.lean`: new `rsLeonHashesFFI` opaque for the LEON hash
    dump, consumed by the build-prim-origs test.
  - `src/ix/address.rs`: `Address::to_unique_name` / `from_unique_name`
    synthetic-`Name` codec (`Ix._#.<hex>`). Mirrors Lean-side
    `Ix.Address.toUniqueName`; intended for KId/Named entries at
    synthetic addresses that must not collide with Lean-originated names.
  - FFI:
    - `rs_leon_hashes`: hash every ConstantInfo in place, return
      `Array (Ix.Name × Ix.Address)` — cheap relative to
      `rs_compile_env_to_ixon`; used by test dumps.
    - `Ixon.Named` FFI build/decode now handles the 3-field Lean
      structure (addr, meta, original). Build allocates a 3-slot ctor
      (was 2 — the missing slot caused Lean-side reads of slot 2 to
      walk past the ctor header and SIGSEGV). `original` encodes
      `Option (Address × ConstantMeta)` via the standard boxed
      tagged-union pattern with scalar-optimized `None` handling on
      decode.
    - `rs_eq_env_serialization`: prints section-identifying diagnostics
      under `IX_DEBUG_SERDE` — invaluable for opaque property-test
      counter-examples.

  ### Tests

  - `Tests/Ix/Kernel/BuildPrimitives.lean` (new): dumps the canonical
    `(name, content_addr_hex)` pairs for paste into
    `PrimAddrs::new`. Registered as `rust-kernel-build-primitives`.
  - `Tests/Ix/Kernel/BuildPrimOrigs.lean` (new): dumps LEON-hash pairs
    for `PrimAddrs::new_orig`. Shares `kernelPrimitives` +
    `parseNameToLean` + `collectDeps` with BuildPrimitives as a single
    source of truth. Registered as `rust-kernel-build-prim-origs`.
  - `Tests/Ix/Kernel/CheckEnv.lean` (new): full-env typecheck via
    `rsCheckConstsFFI`. Registered as `kernel-check-env`. Focus-mode
    sister `kernel-check-const` walks a curated `focusConsts` list of
    known-problematic names (Int64/Int32 lemmas stuck in
    AppTypeMismatch; IR-scheme recursors currently rejected at
    compile time; a single suspected WHNF loop commented out).
  - `Tests/Ix/Kernel/Tutorial.lean`: transitive-closure seeding via
    `collectDepsWithExtras` (seeds include both constant names and
    `bad_raw_consts` names; walks refs through `env.constants` with
    fallback to the raw-consts map). Turns a 45s test into a 5s test
    by filtering ~200k unrelated Mathlib blocks. Error reporting
    unpacks `CheckError` by ctor rather than `repr err` — derived
    `Repr` is seconds-slow on multi-line kernel messages.
  - `Tests/FFI/Lifecycle.lean`: `deferIO` + `mkSerdeRoundtripTest` put
    all FFI calls under a lambda so they fire at test-execution time,
    not `TestSeq` construction time. Previous code eagerly called
    `rsSerEnvFFI` during test enumeration.
  - `Tests/Gen/Ixon.lean`: `genNamed` now samples both `none` and
    `some (addr, meta)` for the `original` field (3:1 frequency), so
    the FFI `Named` roundtrip test actually exercises the `original`
    encoding path.
  - `Tests/Main.lean`: register the three new kernel test suites.
  - Regression tests in `src/ix/kernel/inductive.rs` for all three P1
    closures.

  ### Diagnostic env vars (all default off)

  - `IX_TYPE_DIFF`: emit the `[type diff]` / `[rule rhs diff]` walk
    from `check_recursor` mismatch. Default off — every mismatch in an
    alpha-collapse regime produces thousands of lines. Uses
    `KExpr::Display` so the format matches `TcError::AppTypeMismatch`.
  - `IX_APP_DIFF`: print f/a/a_ty/dom and their whnf forms when infer's
    App path rejects.
  - `IX_DEF_EQ_TRACE=<prefix>`: trace every `is_def_eq` where either
    side's head-const display form contains the prefix.
  - `IX_DEBUG_SERDE`: section-level mismatch info for
    `rs_eq_env_serialization`.
  - `IX_QUIET`: suppress `[compile_env]` / `[lean_ingress]` phase
    timings.
  - `IX_LOG_BLOCKS`: gate the verbose dep-status block dump inside
    the scheduler's failure path.
… cross-namespace twins

Ship the "anonymous canonicity" property end-to-end — compile, decompile,
congruence, and surgery now round-trip α-equivalent / cross-namespace /
source-reordered Lean mutuals to byte-equal Ixon. See the new
`docs/ix_canonicity.md` for the authoritative spec.

- `docs/ix_canonicity.md` (new, 1601 lines): theory, block layout,
  metadata sidecar, compile/decompile pipeline, testing plan, and
  open work.
- `Tests/Ix/Compile/Canonicity.lean` (new): 13 cross-namespace twin
  families — simple/nested α-collapse, nested-aux hash ordering,
  parameter rename, Prod binary-arity nesting, self-ref collapse,
  over-merge + partial collapse, HO recursive fields, structures.
- `Tests/Ix/Compile/Mutual.lean`: new `NestedAuxOrdering{,Alpha,Prod}`
  fixtures proving hash-sorted aux order is source-order-independent.
- `Tests/Ix/Compile/ValidateAux.lean`: seeds from the new Canonicity
  namespace in addition to Mutual.

- New `AuxLayout { perm: Vec<usize>, source_ctor_counts: Vec<usize> }`
  in `ix/ixon/env.rs`, where `perm[source_j] = canonical_i` maps Lean
  source-walk aux positions to hash-sorted canonical positions.
- `ConstantMetaInfo::Muts` gains optional `aux_layout`, serialized in
  the Rust Ixon format via `put_aux_layout` / `get_aux_layout`. Not
  carried across the FFI — Lean's `ConstantMeta` has no `muts` variant,
  so it's Rust-internal only.
- `CompileState.aux_perms: DashMap<Name, AuxLayout>` is populated per
  block in `generate_and_compile_aux_recursors`, consumed by
  `compute_call_site_plans` and `compile_aux_block_with_rename`.
- `aux_gen::generate_aux_patches` now returns `AuxPatchesOutput`
  carrying `patches`, `aliases`, the hash-sort `perm`, and class/aux
  counts so callers don't duplicate the canonical-layout logic.
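Applying the permutation is the easy-to-misread part, so here is a minimal sketch of the `perm[source_j] = canonical_i` convention (the helper `to_canonical_order` is illustrative, not a function in the codebase):

```rust
struct AuxLayout {
    perm: Vec<usize>,               // perm[source_j] = canonical_i
    source_ctor_counts: Vec<usize>, // per-source-aux ctor counts (unused in this sketch)
}

// Reorder source-walk items into hash-sorted canonical positions.
fn to_canonical_order<T: Clone>(layout: &AuxLayout, source: &[T]) -> Vec<T> {
    let mut out = source.to_vec();
    for (source_j, &canonical_i) in layout.perm.iter().enumerate() {
        out[canonical_i] = source[source_j].clone();
    }
    out
}
```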

- New module (1841 lines) replaces `aux_gen::canonicalize`. Walks
  Lean-source-order originals and canonical aux_gen output in lockstep
  with a `PermCtx` encoding (a) source→canonical aux positions and
  (b) α-collapse const-name rewrites; FVar correspondence is
  established at outer binder chains and apps are compared via
  `app_spine_alpha_eq_ctx`, which peeks at rec heads to apply the
  arg permutation.
- Fixes three failure modes of the old canonicalize-and-compare
  approach: stale Const references for α-aliased auxes, BVar
  arithmetic in rule-rhs bodies, opaque handling of `.recOn` /
  `.brecOn.go` / `.brecOn.eq` value shapes.
- Wired into validate-aux Phase 2 via `build_perm_ctx_1b` in
  `ffi/lean_env.rs` (also used by `rs_tmp_decode_const_map`).

- `compute_call_site_plans` now takes an `Option<&AuxLayout>` and
  reads `num_motives` / `num_minors` directly from the Lean source
  recursor (Lean's `numMotives = all.size + numNested`). Previously
  deriving these from `original_all.len()` undercounted by
  `numNested`, landing aux motives in the minor slice and producing
  AppTypeMismatches like "Code minor in Array-Alt motive slot" on
  `_sizeOf_N` bodies of nested mutuals (LCNF et al.).
- Source→canonical perm drives reordering of motive/minor args in
  surgered `.rec` / `.below` / `.brecOn` spines.
- New `compile_aux_block_with_rename` maps canonical aux names to
  Lean's source names when registering in the env, so user code
  referencing `X.rec_1` resolves to the canonical aux at the right
  semantic position under non-identity `perm`.

- `generate_recursors_from_expanded` threads
  `source_of_canonical[canonical_i] = source_j` so aux recs emit as
  `<all0>.rec_{source_j + 1}`, matching Lean's `.rec_N`. `below.rs`
  and `brecon.rs` derive `below_N` / `brecOn_N` / `brecOn_N.{go,eq}`
  from the rec's already-source-indexed suffix via
  `aux_rec_suffix_idx` — below/brecon stay in lockstep with rec
  under α-collapse.
- `aux_gen/nested.rs`:
  - Aux `pre_flat` entries carry identity `spec_params` so
    `find_rec_target` correctly identifies nested recursive fields
    (otherwise fields like `List (A α)` missed their nested IH).
  - `type_name_set` mirror of `types` for O(1) membership.
  - `aux_seen` `Vec<(Hash, Name)>` → `FxHashMap` for O(1) dedup.
  - Memoized `replace_all_nested` (per-constructor cache).
  - Members track `source_owner` for downstream discovery.
- Aux ordering now runs `sort_consts` on temporary aux `Indc` values
  instead of a separate name-based tiebreaker.

- New `CompileOptions { check_originals, max_workers }`. FFI
  compile-only callers (`rs_compile_env_full`, `rs_compile_env`,
  `rs_compile_phases`, `rs_compile_env_to_ixon`,
  `rs_tmp_decode_const_map`) opt out of `check_originals` to avoid
  retaining a second kernel-form copy of the full env. Kernel-check
  path (`rs_kernel_check_consts`) enables it only when `expect_pass`
  contains any `false` (the adversarial raw-constant case).
- New `preseed_expr_tables` / `collect_mut_const_exprs` walks block
  exprs up-front to populate `cache.refs` and `cache.univs`
  deterministically before per-constant compile, keeping content
  hashes stable across scheduler orderings.

- Content-hash DAG memoization in `subst` (key `(Addr, depth)`) and
  `inst_univ_inner` (key `Addr`, `us` fixed per call). Mirrors
  lean4lean's `replaceM` / `PtrMap Expr Expr`. Unblocks hot paths
  where the same subtree is walked hundreds of times (`_sizeOf_*`,
  dependent-motive recursors).
- Removed the `2^20` Nat-literal cap on iota. `Nat.rec motive base
  step N` expands ONE level per iota step and only actually recurses
  if `step` forces `ih`; outer `MAX_WHNF_FUEL` correctly bounds
  pathological cases. The old cap rejected legitimate reductions
  inside `Int.Linear.Poly.combine_mul_k'` et al.
- `Int.pow` primitive added to `Primitives<M>` / `PrimAddrs` (anon
  and meta hashes baked in) and to the `kernelPrimitives` dump list.
- `level::norm_add_node` now takes the current succ-accumulator `k`
  — the earlier port silently dropped it, mis-normalizing
  `Succ^n(imax(u, Param v))` for `n > 0`. `normalize_level` is now a
  line-by-line port of lean4lean's `Level.Normalize`; `norm_level_le`
  is a documented soundness-preserving strengthening (lean4lean's
  `NormLevel.subsumption_eval` is `sorry` in `Verify/Level.lean:545`).
- Richer error variants and diagnostics:
  `TcError::AppTypeMismatch { .., depth }`, `UnivParamOutOfRange`,
  `TypeChecker::fuel_used()`.
- Gated diagnostic env vars for perf / reduction debugging:
  `IX_IOTA_STUCK`, `IX_NAT_EXPAND_LOG`, `IX_WHNF_COUNT_LOG`,
  `IX_DEF_EQ_COUNT_LOG`, `IX_INFER_COUNT_LOG`, `IX_SUBST_COUNT_LOG`,
  `IX_DECL_DIFF`, `IX_PHASE_TIMING`.
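The `(Addr, depth)` memoization idea from the first bullet can be sketched on a toy term type, using `Rc` pointer identity as a stand-in for the content address (capture-avoiding lifts under binders are elided; the point is the cache key, not full substitution):

```rust
use std::collections::HashMap;
use std::rc::Rc;

#[derive(Clone, PartialEq, Debug)]
enum Expr {
    Var(u64),
    App(Rc<Expr>, Rc<Expr>),
    Lam(Rc<Expr>),
}

// (subtree address, binder depth): the same subtree substitutes differently
// at different depths, so depth must be part of the key.
type Key = (usize, u64);

fn subst(e: &Rc<Expr>, depth: u64, arg: &Rc<Expr>, memo: &mut HashMap<Key, Rc<Expr>>) -> Rc<Expr> {
    let key = (Rc::as_ptr(e) as usize, depth);
    if let Some(hit) = memo.get(&key) {
        return hit.clone(); // shared DAG subtree: walked once, reused after
    }
    let out = match &**e {
        Expr::Var(i) if *i == depth => arg.clone(), // lift of arg elided in this toy
        Expr::Var(_) => e.clone(),
        Expr::App(f, a) => Rc::new(Expr::App(
            subst(f, depth, arg, memo),
            subst(a, depth, arg, memo),
        )),
        Expr::Lam(b) => Rc::new(Expr::Lam(subst(b, depth + 1, arg, memo))),
    };
    memo.insert(key, out.clone());
    out
}
```

On a term that shares one subtree many times (the `_sizeOf_*` shape), the second and later visits are cache hits.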

- `ingress_expr` gains a `CallSite` arm that walks the IXON App
  telescope and distributes per-arg arena indices from
  `CallSiteEntry` across canonical positions. A plain App descent
  propagated the single CallSite arena to every child and lost
  per-arg binder names / failed the head Ref metadata lookup.

- `BlockCache` splits `sharing` (block-level, target of
  `Expr::Share(idx)`) from `meta_sharing` (per-constant, target of
  `CallSiteEntry::Collapsed.sharing_idx`). Treating them as one
  silently returned the wrong subtree on any mutual block combining
  `apply_sharing` output with surgered call-sites — the root cause
  of the "Binder arena vs Expr::Ref Ixon" mismatch on surgered
  `_sizeOf_N`.
- Projection decompile and recursor-rule decompile now call
  `load_meta_extensions` so `Collapsed.sharing_idx` resolves against
  the right table.
- `BuildTelescope` reverses popped args so App spines are rebuilt
  in source order — fixes asymmetric-arg hash instability under
  surgery.

- New `decode_name_array` decodes `Array Lean.Name` structurally
  via a fresh `GlobalCache` (pointer-identity dedup of shared
  subnames). Replaces the fragile `Name.toString` + `parse_name`
  round-trip: Lean's escaped `Lean.Order.«term_⊑_»` no longer fails
  lookup against the kernel's unescaped `Lean.Order.term_⊑_`.
- `rs_kernel_check_consts` signature change:
    List (Name × CI) → Array Lean.Name → Array Bool → Bool →
      IO (Array (Option CheckError))
  with the trailing `Bool` toggling ephemeral per-constant progress
  (in-place `[i/N] name ...` label; only slow/failing/not-found
  constants get persistent log lines). Results are position-paired
  with input names (no `(name, result)` tuple).
- `Tests/Ix/Kernel/Tutorial.lean` and `Tests/Ix/Kernel/CheckEnv.lean`
  refactored to pass `Lean.Name` structurally throughout and use
  `quiet = true` for full-env runs.

- Kernel modules gain `#[cfg(test)]` suites: `tc.rs` (context,
  subst_univ), `def_eq.rs` (proof-irrelevance tier), `congruence.rs`
  (level / expr / const), `infer.rs` (error paths), `error.rs`,
  `primitive.rs`.
- `aux_gen/cases_on.rs` and `aux_gen/rec_on.rs` gain arity /
  binder-name / structural tests.
First green run of `lake test -- kernel-check-env` end-to-end. The
kernel can now independently revalidate every stored mutual block in a
compiled environment without trusting compile-side metadata, the FFI
drives a parallel batch checker with progress / ETA / in-flight
reporting, and `lake exe ix check` ships the same pipeline as a
production CLI command.

The marquee change is §4.4 of the canonicity story (`docs/ix_canonicity.md`):
the kernel now ports `sort_consts` into `src/ix/kernel/canonical_check.rs`
(~1200 lines) and uses it as an independent oracle. The compiler's claim
that a block is canonical is verified, not trusted.

- New `src/ix/kernel/canonical_check.rs`: faithful Anon-mode port of
  `compare_level`, `compare_expr`, `compare_indc`, `compare_ctor`,
  `compare_recr`, `compare_defn`, plus `KMutCtx` block-local class map.
  Two operating modes:
    1. `validate_canonical_block_single_pass` — fast adjacent-pair
       strong-`Less` validation against the stored Muts partition.
       Rejects `Greater` (wrong order) and `Equal` (uncollapsed
       alpha-equivalence). Falls back to full iterative refinement
       when an adjacent pair is only weak `Less` and accepts only when
       refinement returns the same singleton class order.
    2. `sort_kconsts` / `sort_kconsts_with_seed_key` — iterative
       partition refinement (sort → group → re-sort under updated
       `KMutCtx`) for rediscovered nested auxiliaries.
- `TcError::NonCanonicalBlock { block, pos, ordering }`: rejection
  variant emitted by primary-block validation. Display reports the
  failing pair offset and direction.
- `ingress_muts_block` now calls `validate_canonical_block_single_pass`
  on every stored Indc block.
- `canonical_aux_order` in `src/ix/kernel/inductive.rs`: synthesizes
  `KConst::Indc` views of each rediscovered aux (instantiating ext
  type with `spec_params`, replacing aux ctor result heads with the
  synthetic aux KId) and runs `sort_kconsts_with_seed_key` to compute
  the kernel-canonical aux order. The seed key uses a compiler-shaped
  `<all0>._nested.<Ext>_<N>` name so alpha-equivalent aux remain
  distinct synthetic members until partition refinement collapses them
  — matching compile-side `sort_consts` behaviour.
- `RecursorAuxOrder::{Source, Canonical}` on `KEnv`: Lean's original
  recursors use source/queue aux order, so `lean_ingress` builds the
  orig_kenv with `Source` and skips canonical re-sorting; compiled
  Ixon environments use `Canonical`. `build_flat_block` consumers
  branch on this flag.
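The strong-`Less` single-pass shape can be sketched generically (the weak-`Less` fallback to full refinement is elided; `compare` stands in for `compare_indc` / `compare_defn` under the block-local `KMutCtx`):

```rust
use std::cmp::Ordering;

// A stored block is canonical when every adjacent pair compares strictly
// Less. Greater = wrong order; Equal = uncollapsed alpha-equivalent pair.
// On failure, report the offset of the offending pair and the direction.
fn validate_canonical<T>(
    block: &[T],
    compare: impl Fn(&T, &T) -> Ordering,
) -> Result<(), (usize, Ordering)> {
    for (pos, pair) in block.windows(2).enumerate() {
        match compare(&pair[0], &pair[1]) {
            Ordering::Less => {}
            ord => return Err((pos, ord)),
        }
    }
    Ok(())
}
```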

- Compiled Ixon `Muts(Indc, …)` blocks now contain only user reps —
  positions 0..n-1. Aux inductives are transient compile-time
  entities; the only persistent footprint is the recursor block
  (`<addr>.rec_N`) and downstream aux derivatives (`.below_N`,
  `.brecOn_N`).
- The kernel rediscovers aux from primary ctor walks and recomputes
  the canonical aux order itself via `sort_kconsts`. Stored aux
  recursors validate by position against the kernel-canonical aux
  via `is_def_eq` on the recursor type.
- `docs/ix_canonicity.md` rewritten: §4 is now four operational
  invariants (4.4 = kernel-side canonicity), §6.0 / §6.2 / §10.3
  document the no-aux-in-inductive-block layout and `CallSite`
  metadata alignment, §15 / §17 / §18 update the table-of-properties.
  `sort_aux_by_content_hash` renamed to
  `sort_aux_by_partition_refinement` everywhere.

- `ExprMetaData::CallSite` gains `canon_meta: Vec<u64>` — one arena
  root per canonical-order arg in the IXON App spine. `entries`
  stays in source order for decompile; `canon_meta` is the
  canonical-order metadata sidecar that kernel ingress uses to
  attach binder/reference metadata to each canonical argument.
  Required for split-SCC minor adaptation: source-order minors stored
  as `Collapsed` no longer have a Kept entry from which ingress could
  recover the canonical wrapper's metadata.
- `ingress_expr` CallSite arm walks the IXON spine through
  `IxonExpr::Share` indices when distributing canonical-arg arenas;
  earlier impl crashed with "invalid Share index" on shared subtrees.
- `decompile_expr` mirrors the Share-walk via
  `collect_ixon_telescope_expanding_shares`. The
  Kept-entries-vs-canonical-args invariant relaxes from `==` to
  `<=` because compile-side may synthesize a canonical wrapper arg
  for which there is no source-order Kept entry (split-SCC minor).
- `decompile_inductive` scopes per-constructor `meta_sharing` /
  `meta_refs` / `meta_univs` so aux-generated `.below` constructor
  metadata cannot leak across sibling constructor arenas.

- `KEnv` gains `block_check_results: DashMap<KId, Result<(), TcError>>`,
  `block_checks_in_progress: Mutex<HashSet<KId>>`,
  `block_check_cv: Condvar`, plus `BlockCheckStart::{Cached, Owner}`
  and `BlockCheckToken`. Concurrent checkers cooperate: the first
  caller for a homogeneous block becomes owner; siblings park on the
  condvar until the owner publishes a cached result.
- `check_const` classifies a constant's block as Defn / Inductive /
  Recursor and routes through whole-block coordination when possible.
  Mixed blocks fall back to legacy per-member checking. Ctor / aux
  members of an inductive block also dispatch to the parent block
  check so a single member request validates the whole block.
- `populate_recursor_rules_from_block` replaces `try_late_rule_generation`:
  rule RHSs are populated once from the recursor block's peers using
  full major-premise signature matching (not just inductive address),
  which disambiguates duplicate nested auxiliaries at the right
  recursor positions.
- `block_peer_agreement_cache` collapses O(N²) peer iteration to O(N)
  per block; `infer_only_cache` now isolates infer-only synthesis
  results so unchecked entries never contaminate the validated
  `infer_cache`.

- New BitVec definitional reducer: `BitVec.toNat (BitVec.ofNat w n)`
  → `n % 2^w`, `BitVec.ult` → `Nat.ble (succ x.toNat) y.toNat`,
  `Decidable.decide (x < y)` for BitVec routes through the same path.
- New String primitive reducer: `String.back ""` → `Char.ofNat 65`,
  `String.utf8ByteSize` → byte length, `String.toByteArray ""` →
  `ByteArray.empty`.
- Symbolic Nat reduction: predicate-by-ctor (`Nat.beq` / `Nat.ble`
  on `Nat.succ` chains stay in literal/ctor form), peek through
  symbolic `Nat.add`, lower-bound proofs for `Nat.mod` literal
  numerator vs symbolic denominator, `Nat.sub` / `Nat.add` peeling on
  literal RHS. New `extract_nat_value` recovers numerals from
  `OfNat.ofNat Nat <lit>` and `Nat.succ <lit>` forms exposed by iota.
- Stuck-predicate detection: `Nat.beq` / `Nat.ble` between an unknown
  argument and a recursive Lean model (`Nat.rec`, `Nat.casesOn`,
  `BitVec.toNat`, `Fin.val` projections) stays stuck instead of
  unfolding the model and peeling huge literals.
- Nat-literal iota runaway guard: replaces the old static `2^20`
  literal cap with a per-recursor "consecutive predecessor" detector
  (`nat_iota_run`), bounded by `MAX_CONSECUTIVE_NAT_LITERAL_IOTA`
  (8192) and `MAX_LARGE_NAT_LITERAL_IOTA` (16384). Lean fuel-style
  literals (`hugeFuel := 100_000_000`) still reduce when actual
  recursion is bounded by data structure.
- Projection-definition rewriting: `Subtype.val` and the rest of
  Lean's `Defn { kind = Definition, val = λ … Prj _ _ (Var n) }`
  wrappers are reduced to bare `Prj` so cheap primitive recognizers
  still match the head.
- `Fin.val` over `Decidable.rec`: `(Decidable.rec false_minor true_minor d).val`
  reduces to a `Decidable.rec` whose minors return the projected Nat,
  unlocking cases-on style defs that produce `Fin` values from a
  `Decidable` decision.
- `System.Platform.numBits` reduces directly to `64` — Lean now
  defines it via `Subtype.val (System.Platform.getNumBits ())` whose
  inner `getNumBits` is opaque/extern. `PUnit._sizeOf_1` and
  `Unit._sizeOf_1` reduce to `1` directly, matching the closed-form
  Lean SizeOf instance that's otherwise stuck on an open unit var.
- Multi-arg beta uses `simul_subst` (one substitution pass instead of
  N), and the WHNF main loop adds cycle detection (`seen` history)
  so a reduction that loops without making progress breaks instead
  of fueling out.
- def_eq cheap exits run before `tick`: `ptr_eq`, `hash_eq`, and
  structural `compare_kexpr` all short-circuit before charging
  recursive fuel. Beta/iota/zeta-only app congruence (Tier 1d) runs
  before exposing recursive Lean models; congruence after
  `whnf_no_delta` keeps primitive wrappers stuck when both sides
  share the same head.
- `MAX_REC_FUEL` raised from 200K to 1.5M for BVDecide-style
  generated mutual proofs that genuinely exceed a million kernel
  steps after cache hits stop consuming fuel. `IX_MAX_REC_FUEL`
  env var lets workers override.
- New diagnostic dumps gated by env vars: `IX_DELTA_TRACE`,
  `IX_PROJ_TRACE`, `IX_NAT_TRACE`, `IX_ETA_TRACE`,
  `IX_PROJ_DELTA_TRACE`, `IX_DEF_EQ_MAX_DUMP`,
  `IX_WHNF_FUEL_DUMP`, `IX_RECURSOR_DUMP`. All gated on
  `debug_label_matches_env` so a single label-filtered run can drill
  into one constant without the rest of the env's traffic.
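The first BitVec rule above is pure arithmetic: `BitVec.toNat (BitVec.ofNat w n)` reduces to `n % 2^w`. A sketch for machine-width values (the kernel operates on arbitrary-precision Nat literals; the `w >= 64` branch is correct there because `n < 2^64 <= 2^w`):

```rust
// BitVec.toNat (BitVec.ofNat w n) --> n % 2^w, restricted to u64 inputs.
fn bitvec_ofnat_tonat(w: u32, n: u64) -> u64 {
    if w >= 64 { n } else { n % (1u64 << w) }
}
```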

- New `Ix/Cli/CheckCmd.lean` (production CLI command) and
  `Ix/KernelCheck.lean` (shared FFI binding + `CheckError` enum).
  The single `@[extern "rs_kernel_check_consts"]` declaration moves
  out of `Tests/Ix/Kernel/Tutorial.lean` to `Ix.KernelCheck`, with
  Tutorial.lean / CheckEnv.lean re-exporting for backward compat.
- Flags: `--path <file>` (required), `--ns <prefixes>` (transitive
  closure for filtered runs), `--verbose` (per-constant lines vs
  default ephemeral progress).
- Emits a machine-readable `##check##` summary line for CI.

- `rs_kernel_check_consts` ungated from `feature = "test-ffi"` and
  promoted to a production FFI. Test-only roundtrip helpers
  (`rs_kernel_roundtrip{,_no_compile}`) stay cfg-gated.
- `run_checks_parallel_on_large_stacks` spawns N large-stack workers
  with a cooperative scheduler that batches block-coordinated members
  into a single owner task. `ParallelProgress` reports periodic
  done/total, rate, ETA, and oldest in-flight constants.
- Tunables: `IX_KERNEL_CHECK_PROGRESS_MS`, `IX_KERNEL_CHECK_SLOW_MS`,
  `IX_KERNEL_CHECK_ACTIVE_SLOW_MS`, `IX_KERNEL_CHECK_INFLIGHT`,
  `IX_KERNEL_CHECK_NAME_CHARS`. Slow threshold default raised from
  1s to 7s (Mathlib full-env runs swamp the log otherwise).

- `KernelCtx::orig_kenv` is now populated only when
  `CompileOptions::check_originals` is set. Trusted compile paths
  leave it empty so production builds don't retain a second
  kernel-form copy of the entire env.
- `eager_reduce` is a synthetic kernel-only marker because Lean's
  `eagerReduce` shares a canonical content address with `id`;
  address-only dispatch on the real constant would be unsound.
- `KEnv::insert` panics on insertion at any address in
  `PrimAddrs::reserved_marker_addrs()` so user envs cannot smuggle
  a constant into the synthetic-marker slot.
- `Lean.reduceBool` and `Lean.reduceNat` are real primitives now
  (address-dispatched, not synthetic), and the primitive table grows
  to include `System.Platform.getNumBits`, `Subtype.val`,
  `String.toByteArray`, `ByteArray.empty`, `Decidable.rec`, `Fin`.

- `safe definition references unsafe X` rejection extended from
  unsafe defs to unsafe inductives, ctors, and recursors. The
  `is_unsafe` flag participates in `compare_kindc` so alpha-collapse
  cannot merge a safe inductive with an unsafe one.
- `check_inductive_member` skips strict-positivity (A3) for unsafe
  inductives, matching Lean's behaviour.
- `KernelMode::meta_name`: extracts the underlying Lean `Name` from a
  metadata field in Meta mode. Used by canonicity-validation and
  diagnostic paths that need the source name (e.g. for the
  compiler-shaped seed key in `canonical_aux_order`).

- `src/ix/graph.rs` no longer treats `InductiveVal.all` /
  `RecursorVal.all` as outgoing edges. Mutual-block members that
  don't structurally reference each other now split into minimal
  SCCs as they should. New regression tests
  (`inductive_all_members_are_not_graph_edges`,
  `recursor_all_is_metadata_not_graph_edge`) lock the behaviour in.

- New `BRecOnCallSitePlan` and `brec_on_call_site_plans` /
  `below_call_site_plans` on `CompileState`. `.brecOn`'s telescope
  is `params, motives, indices, major, handlers` (vs `.rec`'s
  `params, motives, minors, indices, major`) and motive permutation
  / drop is shared with the recursor plan. `.below` is the
  motive-only `params, motives, indices, major` variant.
- `rec_name_to_brecon_name` / `rec_name_to_below_name` derive
  matching aux names per source/canonical layout.

- `Tests/Ix/Kernel/CheckEnv.lean`: skips constants from
  `Tests.Ix.Kernel.TutorialDefs` (pure-Lean fixtures that intentionally
  don't roundtrip through Ix). Uses parallel quiet mode by default.
- `Tests/Ix/Kernel/Tutorial.lean` / `BuildPrimitives.lean`: import
  `Ix.KernelCheck`, expose new primitives in `kernelPrimitives`.
- New whnf / def_eq unit tests covering each new reducer:
  `whnf_string_legacy_back_empty_literal`,
  `whnf_string_utf8_byte_size_literal`,
  `def_eq_string_to_byte_array_empty`,
  `whnf_bitvec_ult_zero_rhs_is_false`,
  `whnf_bitvec_to_nat_ofnat_zero_is_zero`,
  `whnf_decide_bitvec_lt_zero_is_false`,
  `whnf_nat_ble_symbolic_succ_stays_stuck`,
  `whnf_nat_predicates_reduce_one_symbolic_ctor_layer`,
  `whnf_nat_mod_literal_by_symbolic_lower_bound`,
  `def_eq_nat_add_literal_lhs_not_succ_chain`, plus block-coordination
  tests `checking_one_definition_checks_sibling_block` and
  `concurrent_definition_block_checks_share_result`.
  - Replace two-pass standalone+muts buffered ingress with a single parallel
    try_for_each that inserts directly into KEnv, bounding peak memory by
    in-flight worker outputs instead of materializing every converted constant
    before assembly.
  - Add ixon_ingress_owned that consumes IxonEnv so it can be dropped before
    the heavy kernel check loop runs. rs_kernel_check_consts adopts it and
    also drops the ungrounded map and rust_env_arc early.
  - Workers resolve Named metadata on demand via lookup_name instead of
    cloning every payload during partition.
  - IX_QUIET-gated phase timings (validate, lookups, partition, stream).
  - Drive-by: bulk let-chain brace reformatting across compile, decompile,
    kernel, and aux_gen to match current rustfmt.
Lays the groundwork for measuring the kernel performance plan
(plans/okay-let-s-write-a-lucky-dolphin.md) per audit §10. Counters live
in a new kernel::perf module; KEnv carries a PerfCounters field that is
dumped from a Drop impl when IX_PERF_COUNTERS=1 is set. Unset is the
production default — every increment short-circuits via a LazyLock<bool>
so the cost is a single cached branch on the hot path.

Wired into the cache get sites the audit identified:

- whnf_cache hits/misses (whnf.rs around 201/211)
- whnf_no_delta_cache hits/misses (whnf.rs around 437/446)
- infer_cache and infer_only_cache hits/misses (infer.rs around 45/51)
- def_eq_cache hits/misses (def_eq.rs around 137/149)
- def_eq_failure set hits/inserts (def_eq.rs around 360)
- per-constant peak/avg MAX_REC_FUEL consumption (recorded in
  TypeChecker::reset before the next constant resets it)

Verified against lake test -- kernel-check-env --ignored: 192156/192156
constants pass in 251s (4% over the 242s baseline; overhead is the
LazyLock branch + atomic load in the disabled path).
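The gating pattern can be sketched as follows (counter and function names are illustrative; the real module keeps its counters on `KEnv` and dumps them from `Drop`):

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::LazyLock;

// Env var read exactly once; afterwards every increment site pays only a
// cached-bool branch when IX_PERF_COUNTERS is unset.
static PERF_ENABLED: LazyLock<bool> =
    LazyLock::new(|| std::env::var("IX_PERF_COUNTERS").as_deref() == Ok("1"));

static WHNF_CACHE_HITS: AtomicU64 = AtomicU64::new(0);

#[inline]
fn count_whnf_hit() {
    if *PERF_ENABLED {
        WHNF_CACHE_HITS.fetch_add(1, Ordering::Relaxed);
    }
}
```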

P1: def_rank_id replaces def_weight_id for hint-priority comparison

Audit Tier 1 #3 (kernel-perf-adversarial-audit-2026-04-26.md, §4.2): the
prior u32 encoding mapped Abbrev to u32::MAX-1 and saturating-added
Regular(h) to h+1, which collide at h ≥ u32::MAX-2. When that happens the
delta-direction logic treats Abbrev and a maximally-heavy Regular as
"same height" and unfolds both, instead of preferring Abbrev as Lean
does (compare(d_t->get_hints(), d_s->get_hints()) at
type_checker.cpp:910).

Replace the u32 weight with a (class: u8, height: u32) tuple compared
lexicographically:
  - Opaque / Theorem / unknown → (0, 0)
  - Regular(h) → (1, h)         (height ordering preserved within class)
  - Abbrev → (2, 0)             (strictly above every Regular)

Update the two call sites (is_def_eq lazy-delta height comparison and
lazy_delta_step). The map_or default for "missing head" is preserved as
(u8::MAX, u32::MAX) — the branch is dead in practice (a_delta && b_delta
imply both heads are present) but kept consistent with the prior u32::MAX
sentinel.
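The encoding can be sketched as follows (`Hint` is an illustrative stand-in for the kernel's hint type; tuples compare lexicographically, which is exactly the property the fix relies on):

```rust
// Hypothetical stand-in for the kernel's definitional hint type.
#[derive(Clone, Copy)]
pub enum Hint {
    Opaque,
    Regular(u32),
    Abbrev,
}

/// Sketch of the (class, height) rank: Abbrev strictly outranks every
/// Regular, while heights still order Regular definitions within class.
pub fn def_rank(h: Option<Hint>) -> (u8, u32) {
    match h {
        None => (u8::MAX, u32::MAX), // "missing head" sentinel, dead in practice
        Some(Hint::Opaque) => (0, 0),
        Some(Hint::Regular(height)) => (1, height),
        Some(Hint::Abbrev) => (2, 0),
    }
}
```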

Two regression tests:
  - def_rank_abbrev_above_saturated_regular: Abbrev outranks
    Regular(u32::MAX) (the previous saturation collision).
  - def_rank_regular_orders_by_height: height monotonically orders
    Regular ranks within the class.

Verified with lake test -- kernel-check-env --ignored: 192156/192156 in
256s (no regression vs the 242s pre-perf-counters baseline; the +14s is
from the IX_PERF_COUNTERS=unset LazyLock branch added in the prior
commit, not this change).

P2: peel_proj_forall fast-paths syntactic Pi in projection inference

Audit Tier 1 #2 (kernel-perf-adversarial-audit-2026-04-26.md, §7.2):
infer_proj's two parameter-consuming loops (param peel and field peel)
called self.whnf(&r)? unconditionally per iteration, on a body mutated
by subst at the previous step. The whnf cache rarely hits between
iterations and each call re-traverses the substituted body.

Extract a peel_proj_forall(&r, err) helper that:

  - tries ExprData::All(..) syntactically first (no WHNF call), and
  - falls back to full self.whnf(e) only when the binder isn't already
    syntactic Pi.

This mirrors Lean's inferProj at type_checker.cpp:582–610. Both
projection-inference loops now call peel_proj_forall instead of
unconditional whnf.
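The fast-path shape can be sketched on a toy expression type (names are illustrative; `whnf` is a stub standing in for `self.whnf(e)`):

```rust
// Toy expression: All is the kernel's Pi node.
#[derive(Clone, Debug, PartialEq)]
pub enum Expr {
    All(Box<Expr>, Box<Expr>),
    Const(&'static str),
}

/// Sketch of peel_proj_forall: return the binder parts directly when the
/// head is already a syntactic Pi; call WHNF only on the fallback path.
pub fn peel_proj_forall<F>(e: Expr, whnf: F) -> Result<(Expr, Expr), &'static str>
where
    F: Fn(Expr) -> Expr,
{
    let e = match e {
        pi @ Expr::All(..) => pi, // fast path: no WHNF call
        other => whnf(other),     // slow path: full reduction
    };
    match e {
        Expr::All(dom, body) => Ok((*dom, *body)),
        _ => Err("projection: expected Pi"),
    }
}
```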

Behaviorally equivalent — same WHNF semantics on miss, no semantic
change otherwise. Verified with lake test -- kernel-check-env --ignored:
192156/192156 in 258s (parity with the post-P1 baseline, which already
includes the perf-counters pre-work; no measurable regression, and the
cache-hit-rate counters will move on tactic-heavy workloads under
IX_PERF_COUNTERS=1).

P3a: WhnfFlags substrate (no behavior change)

Lays the foundation for the Lean4Lean architectural alignment described in
plans/okay-let-s-write-a-lucky-dolphin.md. Phase 3a is substrate-only —
no call site is migrated to cheap mode yet, so behavior is unchanged.

Adds:

- WhnfFlags { cheap_rec, cheap_proj } with FULL and CHEAP consts and
  is_full(). CHEAP is currently equal to FULL until Phase 3c wires it.
- whnf_core_with_flags (private): the existing whnf_core impl, now
  threading flags into recursive calls and try_iota_with_flags.
- whnf_core / whnf_core_cheap (super): FULL/CHEAP wrappers.
- whnf_no_delta_with_flags (private): the existing whnf_no_delta impl
  with the Prj branch gated on cheap_proj — falls back to full whnf
  on the projected value when not cheap.
- whnf_no_delta (pub) / whnf_no_delta_cheap (super): wrappers.
- try_iota_with_flags: gates major-premise WHNF and string-literal
  constructor reduction on cheap_rec.
- try_proj_app_reduce_with_flags: gates projected-value WHNF on
  cheap_proj.

Cache reads/writes (whnf_no_delta_cache, equiv-manager second-chance)
are gated on flags.is_full(): cheap callers neither read nor write the
cache, preserving the invariant that any cached entry is a fully-reduced
normal form.
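The substrate amounts to this small struct (a sketch matching the names above; the real definitions live alongside whnf.rs):

```rust
/// Sketch of the Phase 3a substrate. CHEAP deliberately equals FULL until
/// Phase 3c flips cheap_proj.
#[derive(Clone, Copy, PartialEq, Eq)]
pub struct WhnfFlags {
    pub cheap_rec: bool,
    pub cheap_proj: bool,
}

impl WhnfFlags {
    pub const FULL: WhnfFlags = WhnfFlags { cheap_rec: false, cheap_proj: false };
    pub const CHEAP: WhnfFlags = WhnfFlags { cheap_rec: false, cheap_proj: false };

    /// Cache reads/writes are allowed only for full-mode callers.
    pub fn is_full(self) -> bool {
        self == Self::FULL
    }
}
```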

Phase 3b will inline the projection branch into whnf_core to match
Lean4Lean's two-layer architecture (refs/lean4lean/Lean4Lean/
TypeChecker.lean:266, 297). Phase 3c will flip CHEAP to enable cheap_proj
and migrate specific def-eq sites.

Verified with lake test -- kernel-check-env --ignored: 192156/192156 in
243s (matches the 242s baseline; substrate adds no measurable overhead
when CHEAP == FULL).

P3b: inline projection into whnf_core (Lean4Lean architectural alignment)

Move the Prj branch from whnf_no_delta_with_flags into whnf_core_with_flags
so our whnf_core matches Lean4Lean's whnfCore semantics exactly
(refs/lean4lean/Lean4Lean/TypeChecker.lean:284-292, 337-341).

Before this commit:

  whnf_core      — beta + zeta + iota + cheap projection (recursive
                   whnf_core on val, no delta)
  whnf_no_delta  — whnf_core + FULL projection (full whnf on val) +
                   native primitives + projection_definition + quotient
  whnf           — whnf_no_delta + delta

Lean4Lean's architecture has no whnf_no_delta layer. Their whnfCore
includes projection, with the cheap_proj flag deciding whether the
projected value uses whnfCore (cheap) or whnf (full). After this commit:

  whnf_core (with WhnfFlags) — beta + zeta + iota + projection
                               (cheap_proj controls val reduction)
  whnf_no_delta              — whnf_core(_, FULL) + native primitives
                               + projection_definition + quotient
  whnf                       — whnf_no_delta + delta

The bare-Prj branch in whnf_no_delta_with_flags is removed —
whnf_core now handles it directly. The App-of-Prj branch stays in
whnf_no_delta because whnf_core's loop returns once the outermost Prj
is resolved; try_proj_app_reduce_with_flags gives one more attempt at
the same cheap_proj policy when the outer expression is App(Prj, ...).

Pure refactor, no semantic change with CHEAP == FULL. Verified with
lake test -- kernel-check-env --ignored: 192156/192156 in 270s
(within noise of the 243s pre-refactor baseline). Phase 3c will flip
CHEAP to enable cheap_proj=true and migrate def-eq's lazy-delta sites
surgically per Lean4Lean's pattern.

P3c (postponed): document the HeaderParsedSnapshot regression

P3a (substrate) and P3b (Lean4Lean architectural alignment) are committed.
Phase 3c — flipping CHEAP to enable cheap_proj=true and migrating the def-eq
lazy-delta sites — was attempted but reproduced 5 failures on chained
projections in Lean.Language.Lean.HeaderParsedSnapshot.* even after P3b
inlined the projection branch into whnf_core. The substrate is left in
place; CHEAP stays equal to FULL until the regression's root cause is
understood.

Notes on the regression for the next investigator:

  - Failures: HeaderParsedSnapshot.{stx,result?,metaSnap,toSnapshot,ictx},
    all with 'projection: type mismatch with declared struct'.
  - The struct `extends` a parent, so each projection is a chained Prj
    whose val is itself a Prj into the parent.
  - The error comes from infer.rs's infer_proj at the head-vs-struct_id
    address compare, after FULL whnf on val_ty. That whnf is FULL, but
    val_ty was inferred via paths that may have consulted a def-eq cache
    populated under cheap mode. Possible cache-poisoning suspect:
    def_eq_cache writes a `false` result for inputs whose lazy-delta
    loop bottomed out under cheap projections. The cache key uses raw
    a/b hashes, not cheap-reduced shapes, so a stored `false` is
    indistinguishable from a FULL `false` by future readers.
  - Lean4Lean does not have an analogous wide def_eq_cache; their failure
    cache is keyed only on same-spine pairs in lazyDeltaReductionStep.

Future P3c iterations should either prove the cache poisoning theory
incorrect or restrict def_eq_cache writes to FULL-derived results.

  Port the cheapProj path for def-eq WHNF without letting cheap reduction
  results pollute full-mode caches.

  Def-eq now uses a private `WhnfFlags::DEF_EQ_CORE` path for structural and
  no-delta WHNF, matching Lean/lean4lean's cheap projection behavior while
  leaving public WHNF semantics full. Cheap def-eq calls are tracked with a
  depth counter so cheap `false` results are cached only in a cheap-only cache;
  cheap `true` remains safe to promote to the full def-eq cache.

  WHNF caches no longer look up or mirror through def-eq equivalence roots. That
  relation is too broad for WHNF cache sharing because proof irrelevance, eta,
  and structure eta establish def-eq without guaranteeing identical WHNF.
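  The promotion policy can be sketched as (hypothetical names; real cache keys are content hashes, shown here as u64 pairs):

```rust
use std::collections::HashMap;

/// Sketch: a cheap-mode `false` may be a false negative (cheap projection
/// can fail where full WHNF would succeed), so it is quarantined in a
/// cheap-only cache; `true` is always safe to promote to the full cache.
pub struct DefEqCaches {
    pub full: HashMap<(u64, u64), bool>,
    pub cheap_only: HashMap<(u64, u64), bool>,
}

pub fn record_def_eq(
    caches: &mut DefEqCaches,
    key: (u64, u64),
    result: bool,
    cheap_depth: u32, // > 0 while inside a cheap def-eq call
) {
    if result || cheap_depth == 0 {
        caches.full.insert(key, result);
    } else {
        caches.cheap_only.insert(key, result);
    }
}
```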

  Add focused regression coverage for `HeaderParsedSnapshot` extended-structure
  projections.

  Add detailed phase timing for constant and block checks, including validation
  and safety sub-phases, gated by IX_PHASE_TIMING and the debug constant filter.

  Memoize expression, universe, and constant visits during well-scoped validation
  and unsafe-reference checks so shared expression DAGs are walked once. Also skip
  duplicate block prevalidation for definition blocks, since each member is
  validated by the normal definition checker path.

  Use a context-suffix key for inference caching so open terms can share cached
  types across contexts when their reachable local suffix is identical.
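  The key idea can be sketched in a few lines (a toy model: the local context as a slice, `lbr` as the loose-bound-var count; the real key hashes this suffix):

```rust
/// Sketch: an open term with `lbr` loose bound variables can only reach the
/// last `lbr` entries of the local context, so the cache key uses just that
/// suffix. Two different outer contexts sharing the suffix share entries.
pub fn ctx_suffix_key(ctx: &[u64], lbr: usize) -> &[u64] {
    &ctx[ctx.len().saturating_sub(lbr)..]
}
```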

  Three independent improvements landed together in the staged tree:

  1. New `lake exe ix ingress --path <file>` CLI for Lean → Ixon → KEnv
     ingress-only performance analysis (skips the typecheck loop). Lean
     side: `Ix/Cli/IngressCmd.lean`, registered in `Main.lean`. Rust side:
     `rs_kernel_ingress` FFI in `src/ffi/kernel.rs` (mirrors
     `rs_kernel_check_consts`'s read-env / compile / ingress timing
     breakdown; opt out of teardown via `IX_SKIP_DROPS=1`).

  2. Kernel caching:
     - Cache FULL-mode `whnf_core` results in `KEnv::whnf_core_cache`,
       mirroring lean4lean's `whnfCoreCache` and lean4 C++'s `m_whnf_core`.
       Cheap-mode bypasses the cache (cheap-projection results aren't safe
       to share with full callers).
     - Suffix-aware `whnf_key`: switch from "closed → empty / open → ctx_id"
       to `ctx_addr_for_lbr(e.lbr())`, matching `infer_key`. Open-term WHNF
       now hits cache across outer contexts that share the relevant suffix.
       Soundness argument inline; new test exercises the cross-outer hit.
     - New `unfold_cache` wrapping `instantiate_univ_params` for delta
       unfolding, keyed by the `Const(id, us)` head's content hash (mirrors
       lean4 C++ `m_unfold`). Eliminates O(body) walks on repeated unfolds.
     - `IX_PERF_COUNTERS` reports the new whnf-core and unfold hit rates.

  3. Faster post-ingress teardown:
     - Switch global allocator to `mimalloc` (`Cargo.toml`, `src/lib.rs`).
       glibc's per-arena lock dominates concurrent `free` past ~16 threads;
       mimalloc's thread-local free lists scale linearly.
     - Parallel-shard drop for `IxonEnv` and `KEnv`. Replace single-threaded
       `drop(map)` with `map.into_par_iter().for_each(drop)` so rayon
       work-steals across DashMap's ~128 shards. Each map is parallelised
       internally and dropped sequentially with respect to its siblings to
       keep per-map timing clean. Opt out via `IX_SEQ_IXON_DROP=1` /
       `IX_SEQ_KENV_DROP=1`.
     - End-to-end on `lake exe ix ingress Benchmarks/Compile/CompileMathlib.lean`
       (32 cores): drop_ixon_env 438.82s → 4.98s (88×), drop_lookups 15.58s
       → 0.60s (26×), kenv destructors 210.4s → 14.5s (14.5×); total
       ingress wall-clock 462s → 237s (1.95×).
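  The parallel-shard drop can be approximated with std scoped threads in place of rayon's `into_par_iter().for_each(drop)` (a sketch under that substitution; the allocator — mimalloc in the real build — does the concurrent freeing):

```rust
use std::collections::HashMap;
use std::thread;

/// Split the map into buckets and free each bucket on its own thread so the
/// allocator's thread-local free lists run concurrently. Returns the number
/// of entries freed, for observability.
pub fn parallel_drop(map: HashMap<u64, Vec<u8>>, shards: usize) -> usize {
    let len = map.len();
    let mut buckets: Vec<Vec<(u64, Vec<u8>)>> =
        (0..shards.max(1)).map(|_| Vec::new()).collect();
    for (i, kv) in map.into_iter().enumerate() {
        let n = buckets.len();
        buckets[i % n].push(kv);
    }
    thread::scope(|s| {
        for bucket in buckets {
            s.spawn(move || drop(bucket));
        }
    });
    len
}
```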
End-to-end on CompileMathlib (707k consts, 32 workers) the parallel
ingress convert step drops from ~4452s worker-sum / 140s wall to ~747s
worker-sum / 24s wall — a 42% wall-time reduction on stream ingress and
17% on total ingress (incl. drops).

The original Mathlib profile pointed at `convert` (99% of worker-sum
inside ingress) but didn't break it down. Adding fine-grained
ConvertStats timing surfaced two real hotspots: (a) ~45% of convert was
KExpr construction (blake3 hash + intern_addr + Arc<ExprData> alloc),
of which ~62% was wasted because the intern table already had the same
canonical value; (b) ~38% was DashMap-shard contention in InternTable
under 32-way concurrency. Fixing both — without changing any
content-hash semantics or public APIs — is what this commit does.

  - ConvertStats gains per-stage *_ns/_calls fields gated on
    IX_INGRESS_CONVERT_STATS=1: resolve_kvmap_ns, arena_walk_ns,
    intern_expr_ns/_calls/_get_hits, intern_univ_*, expr_cache lookup/
    insert ns, get_blob_*, kexpr_construct_*, plus whole-arm
    process_arm_ns and continuation_arms_ns wrappers (~97% coverage of
    convert). All wraps short-circuit on `if !stats.enabled` so prod
    runs pay nothing. Adds a top-level "convert timing (worker-sum)"
    print line in ixon_ingress_inner for quick diagnosis.

  - Hash-first interning. Each KExpr::*_mdata constructor splits into
    KExpr::*_hash (compute the blake3 content hash from inputs without
    allocating) + KExpr::*_mdata_with_addr (build given a precomputed
    canonical Addr, skipping intern_addr) + KExpr::*_mdata (now a thin
    wrapper that does hash + intern_addr + with_addr). New
    timed_intern_or_build helper threads the dance: precompute hash,
    try_get_expr; on hit return the existing canonical Arc, on miss
    intern_addr + build + intern_expr. All 13 hot-path constructor call
    sites in ingress_expr / ingress_univ go through it. Cuts
    kexpr_construct_calls 313M → 127M (-59%, matches the 60% intern
    hit rate) — that's ~186M Arc<ExprData> allocations + matching drops
    we no longer perform.

  - InternTable<M>::univs/exprs and ADDR_INTERN now use
    DashMap::with_shard_amount(2048) instead of the default
    4*num_cpus()=128 on a 32-thread box. At 32 workers the per-op
    shard-collision rate drops from ~25% to ~1.5% with negligible
    memory overhead (~32 KB extra for shard headers). Tested 1024,
    2048, 4096 in clean runs: 2048 is the sweet spot; 4096 measurably
    regresses from cache effects on the larger shard array.
    InternTable::intern_univ/intern_expr and intern_addr also get a
    get-first-then-entry pattern so steady-state hits avoid the
    write-side lock entirely; new try_get_expr/try_get_univ helpers
    expose the read-fast-path for callers that want hit/miss stats.

  The kernel-side `canonical_aux_order` (used by
  `populate_recursor_rules_from_block`) and the compile-side
  `sort_aux_by_partition_refinement` were producing divergent canonical
  orders for nested-inductive auxiliaries, surfacing as recursor
  alignment failures (e.g. `Lean.Json.rec_1`: kernel placed TreeMap.Raw
  at canonical position 0, compile placed Array there).

  Three independent divergences were folded together:

  * **Universe-level normalization.** `Level::max` was a raw constructor
    while the kernel's `KUniv::max` is a smart constructor. Substituting
    `u, v := 0, 0` into `Max(Succ Param u, Succ Param v)` left compile
    with `Sort (max 1 1)` while the kernel saw `Sort 1`. `compare_level`
    treats `Succ` and `Max` as different variants (strong inequality),
    so structurally-identical auxes were split into different classes
    in the very first refinement pass. Fix: add `Level::max_smart` /
    `Level::imax_smart` mirroring `KUniv::max` (zero absorption,
    `max(a,a)=a`, same-base offset, Max absorption) and route
    `subst_level` through them. The original raw constructors are kept
    for the few callers that depend on un-normalized form.

  * **Recursor block storage order.** `compile_aux_block_with_rename`
    now accepts an optional `class_order_key` and the aux-recursor
    emit path threads in a `name_to_pos` map so the recursor block is
    laid out in the same canonical order as the inductive block. This
    lets `populate_recursor_rules_from_block` align `rec_ids[i]` with
    `flat[i]` positionally instead of searching by major-domain
    signature. The kernel side now wraps synthetic aux types with
    block-param Pis (`extract_block_param_binders` /
    `wrap_with_block_param_foralls`) to match compile's
    `mk_forall(body, &block_param_decls)` exactly.

  * **Within-class seed-key tiebreak.** Both `sort_consts` and
    `sort_kconsts_with_seed_key` were re-sorting each class by name /
    seed key after every refinement iteration. That promotes a
    "tentatively equal" relationship into a name-derived structural
    fact, producing different canonical orders for identical content
    depending on Meta/Anon mode and discovery numbering. Removed in
    both sides; rationale comments cross-reference each other.
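  The universe-normalization fix from the first bullet can be sketched on a toy level type (only zero absorption and `max(a,a)=a` are shown; the real `KUniv::max` also handles same-base offsets and Max absorption):

```rust
// Toy universe levels; the real types are Level (compile) and KUniv (kernel).
#[derive(Clone, PartialEq, Debug)]
pub enum Level {
    Zero,
    Succ(Box<Level>),
    Max(Box<Level>, Box<Level>),
}

/// Sketch of the smart constructor routed through subst_level.
pub fn max_smart(a: Level, b: Level) -> Level {
    match (a, b) {
        (Level::Zero, b) => b,          // zero absorption
        (a, Level::Zero) => a,
        (a, b) if a == b => a,          // max(a, a) = a
        (a, b) => Level::Max(Box::new(a), Box::new(b)),
    }
}
```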

  Also adds:

  * `IX_RECURSOR_DUMP=<prefix>` debug output in both
    `canonical_aux_order` and the compile-side aux-sort path, plus
    per-iteration class dumps in `sort_consts`. Triggers a structured
    per-peer breakdown when alignment fails
    (`dump_recursor_alignment_failure`).

  * `ix check --consts <names>`, `--consts-file <path>`, and
    `--fail-out <path>` for bisecting individual failing constants out
    of a full-env run without re-paying the compile + ingress cost.

  * `Lean.Json.rec_1` added to the focus-consts regression set.

  `populate_recursor_rules_from_block` was failing with `rec_ids/flat
  count mismatch: rec_ids=13 flat=20` on Aesop's three-way mutual block
  (`GoalUnsafe`/`MVarClusterUnsafe`/`RappUnsafe`). The compile side
  generated 10 nested auxiliaries; the kernel's reconstruction in
  `build_flat_block` discovered 17.

  Two over-discovery bugs in `try_detect_nested`:

  * **WHNF unfolded definitions.** The forall-peeling loop called
    `self.whnf(&cur)`, which unfolded def heads. `IO.Ref α` became
    `ST.Ref IO.RealWorld α`, and the kernel then synthesised
    `_nested.ST_Ref_*` and `_nested.Nonempty_*` auxiliaries that the
    compile side never generates. Lean's C++ kernel
    (`inductive.cpp:is_nested_inductive_app`) and our compile-side
    `replace_if_nested` both check the head literally. Replaced the
    `whnf` loop with a structural `while let ExprData::All(..)` peel.

  * **`flat`-address membership over-matched.** The "param mentions a
    block member" check used `combined_addrs = block_addrs ∪
    flat_addrs`. For an aux `flat[i].id.addr` is the EXTERNAL
    inductive's address (e.g. `Array@567893fa`), so an unrelated
    occurrence like `Option (Array Aesop.Script.LazyStep)` matched on
    `Array`'s addr — `Array_4` shares it. Lean / compile-side check
    internal aux names (e.g. `_nested.Array_4`), which can never collide
    with a regular Const reference. Restricted the check to
    `block_addrs` only.
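  The structural peel from the first fix can be sketched like this (a toy expression type; the point is that no WHNF call appears in the loop, so definition heads like `IO.Ref` stay folded):

```rust
// Toy expression; All mirrors ExprData::All, Const a definition head.
pub enum Expr {
    All(Box<Expr>, Box<Expr>),
    Const(&'static str),
}

/// Peel foralls purely structurally, never unfolding the body's head.
pub fn peel_foralls(mut e: Expr) -> (usize, Expr) {
    let mut depth = 0;
    while let Expr::All(_dom, body) = e {
        depth += 1;
        e = *body;
    }
    (depth, e)
}
```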

  After the fix the Aesop block reconstructs to flat=13 (3 originals +
  10 auxes), matching the stored recursor block. Full `kernel-check-env`
  still passes 192156/192156.

This is an architectural rework of the kernel runtime to bound memory
on full-mathlib type-check runs. The previous design held a single
process-shared KEnv behind Arc<DashMap>, plus a global Arc<blake3::Hash>
intern table; on mathlib-scale environments those data structures grew
past 8+ GiB even though the per-block working set was modest. This
commit moves parallelism above the kernel state boundary: each worker
owns its KEnv and clears it between scheduled blocks.

* `Addr` is now `blake3::Hash` (Copy, 32 bytes), no longer
  `Arc<blake3::Hash>` behind a process-global `ADDR_INTERN`. Identity
  comparison falls back from `Arc::ptr_eq` to a 32-byte memcmp (one
  AVX2 cycle), but eliminates the global state and one allocation per
  KExpr/KUniv construction.
* `KEnv` swaps every `DashMap`/`DashSet` for `FxHashMap`/`FxHashSet`,
  `OnceLock` for `OnceCell`, and `set_prims`/`insert`/etc. take `&mut
  self`. The `block_checks_in_progress` Mutex+Condvar coordinator and
  `BlockCheckStart`/`BlockCheckToken` are gone — block-check results
  are a plain map lookup.
* New `KEnv::clear()` and `KEnv::clear_releasing_memory()` so workers
  can recycle a single env across many scheduled blocks. The latter
  drops backing allocations to keep one heavy block from permanently
  ratcheting a worker's retained capacity.
* Removed the old parallel `Drop` (rayon-driven shard teardown) — no
  longer needed without DashMap, and serial drop is fine for empty
  worker envs at end of run.
* `InternTable` gains reusable `subst_scratch`/`lift_scratch` buffers
  so `subst`/`simul_subst`/`lift` no longer allocate a fresh
  `FxHashMap` per call (a hot path on heavy beta/zeta blocks).
* New `KEnvCacheSizes` snapshot for diagnostics — surfaces per-worker
  cache growth under `IX_KERNEL_CHECK_DIAG=1`.

`TypeChecker<'a, M>` now holds a mutable borrow of one worker's env
instead of `Arc<KEnv>`. All the kernel passes (`check.rs`,
`def_eq.rs`, `inductive.rs`, `infer.rs`, `whnf.rs`) propagate `&mut
self` through what were previously `&self` paths reading from
DashMap. The mutating helpers `try_get_const`/`get_const`/
`try_get_block`/`has_const` replace direct `env.get(&id)` calls so the
lazy-ingress hook can fault constants in on miss.

New `LazyIxonIngress` (`tc.rs`) plus
`TypeChecker::new_with_lazy_ixon`. Workers start with an empty KEnv
and an `Arc<IxonEnv>` + `Arc<IxonIngressLookups>`; whenever the
checker hits an unknown address, `lazy_ingress_addr` calls
`ingress_addr_shallow_into_kenv_with_lookups` (new in `ingress.rs`) to
materialise that constant (or its containing Muts block / projection
block) on demand. `Primitives::from_addr_names` (new in
`primitive.rs`) seeds the prim table from the lookups before any
constant is faulted in, so prims resolve to real Lean names rather
than synthetic `@<hex>` fallbacks.

`IxonIngressLookups` (new public struct) carries the address↔name
maps, namespace `names_by_addr`, and `muts_by_addr` for projection
aliasing. `build_ixon_ingress_lookups` is the one-shot builder; the
top-level `ixon_ingress_owned` no longer parallelises across blocks
(rayon is gone from `lean_ingress` and `ixon_ingress` — both serial
now since `KEnv` is single-threaded).

`rs_kernel_check_consts` / new `rs_kernel_check_ixon` build
`CheckWorkItem`s that collapse alias requests onto the same scheduled
block (projection-aware via `check_schedule_block_addr`). The pool
spawns N worker threads, each with its own `KEnv` and a 256 MiB stack;
work is drained from a shared `AtomicUsize` counter; results are
fan-in via `OnceLock<CheckRes>` slots. `IX_KERNEL_CHECK_CLEAR_EVERY=N`
controls how often a worker calls `clear_releasing_memory` (default 1
— per block).

* New per-block diagnostic mode (`IX_KERNEL_CHECK_DIAG=1`,
  `IX_KERNEL_CHECK_DIAG_THRESHOLD=N`) emits
  `[diag-peak]/[diag-big]` lines when a single block's caches push
  past a threshold or set a new per-worker peak.
* New RSS telemetry in `ParallelProgress`
  (`IX_KERNEL_CHECK_MEM_STATS=0` to disable): rolling `peak_rss_mib`
  from `/proc/self/status`, summary line printed at end of run.
* `format_tc_error` resolves `UnknownConst(addr)` against the lookups
  to print the Lean name + 12-char hex, instead of just hex.
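The RSS probe amounts to a small parse of `/proc/self/status` (a sketch; `VmHWM` is the peak-resident-set field per proc(5), Linux-only):

```rust
/// Parse the VmHWM (peak RSS) line out of a /proc/self/status snapshot,
/// converting the kB value to MiB.
pub fn peak_rss_mib(status: &str) -> Option<u64> {
    let line = status.lines().find(|l| l.starts_with("VmHWM:"))?;
    let kib: u64 = line.split_whitespace().nth(1)?.parse().ok()?;
    Some(kib / 1024)
}
```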

* `Ix/Cli/CompileCmd.lean` gains `--out <path>` to persist the
  serialized `Ixon.Env` (`Env::put` bytes) to disk after compile.
* New `Ix/Cli/CheckIxonCmd.lean` registers `ix check-ixon`, which
  loads a serialized env from `--env <path>` and runs the kernel
  worker pool over it. Filters: `--ns` (comma-separated prefixes),
  `--consts` (exact names), `--consts-file` (one name per line, `#`
  comments), plus `--fail-out` for a structured failure dump and
  `--verbose` to log every constant. Refuses to run on an empty
  filter selection.
* New FFI: `rs_kernel_check_ixon` reads `Ixon.Env::get` bytes from
  disk, builds `IxonIngressLookups`, then drives the same
  `run_checks_on_large_stack` path as `rs_kernel_check_consts`.
  `rs_kernel_ixon_names` enumerates checkable named constants from a
  serialized env (used by `--ns` filtering without rebuilding Lean).

`compile_env_with_options` no longer maintains a separate `orig_kenv`
populated up-front via `lean_ingress` — this snapshot was only used by
the Phase-0 `check_originals` pass and roughly doubled retained
expression memory. `CompileOptions::check_originals` and
`CompileState::kctx`/`check_originals` are deleted; each worker
constructs its own `KernelCtx` (`kctx.kenv: KEnv<Meta>`, no Arc) and
threads it as `&mut` through the aux_gen pipeline. The aux_gen
prereq, mutual-block, decompile, and validate-aux paths all flow
local kctxs explicitly.

* `MaxRecFuel` is a distinct `TcError` variant from `MaxRecDepth`;
  fuel-exhaustion now reports the right cause. `MAX_REC_FUEL` default
  raised from 1.5M → 10M (mathlib category/algebra proofs need it),
  `IX_MAX_REC_FUEL` override unchanged.
* New `is_prop_cache`: memoizes `is_prop_type` (the answer to "does
  this type have sort `Prop`?") so repeat `try_proof_irrel` probes
  skip the `infer ∘ infer ∘ whnf` chain. Empirically the dominant
  cost on mathlib proof-heavy blocks. Perf counters
  (`is_prop_cache_hits/misses`) added to `PerfCounters`.
* `ctx_addr_for_lbr` memoized on `(ctx_id, lbr)` per-TypeChecker —
  hot path called from every `whnf_key`/`infer_key`/`def_eq_ctx_key`.
* `def_eq_ctx_key(a, b)` replaces the old `eq_ctx = if closed then
  empty else self.ctx_id` heuristic — context key is now driven by
  `a.lbr().max(b.lbr())`, matching the WHNF/infer cache shape.
* `whnf_core` short-circuits leaf node kinds (Sort/All/Lam/Nat/Str/
  Const, plus closed Var) before the cache probe so the
  `whnf_core_cache` doesn't fill with trivial `e → e` entries.
* `collect_app_spine` pre-counts arity to allocate the args Vec
  exactly once.

* Tutorial tests (`basic.rs`, `defeq.rs`, `inductive.rs`,
  `reduction.rs`) and `testing.rs` updated for the `&mut KEnv`
  constructor signature.
* `egress.rs`, `equiv.rs` updated for `Addr = Hash` (no Arc).
* Many `intern_addr(...)` removals across `expr.rs`, `level.rs`,
  `ingress.rs`, `tc.rs`, `subst.rs` — addresses are now plain hashes
  passed by value or built directly from `blake3::Hasher::finalize()`.
…r rework

  Three large independent threads landed together. They share files (notably
  whnf.rs, infer.rs, inductive.rs) so splitting the diff post-hoc is awkward;
  each is described separately below.

  == 1. Free-variable binder opening ==

  New `src/ix/kernel/lctx.rs` introduces `LocalContext` (insertion-ordered,
  FxHashMap-indexed by `FVarId`) and `NameGenerator`. New `ExprData::FVar(id,
  name, info)` leaf carries an opaque `FVarId(u64)` whose value participates in
  the blake3 content hash, so distinct fvars hash distinctly. The fresh-id
  counter lives on `KEnv` (`next_fvar_id`) so per-worker checkers cannot mint
  colliding ids and pollute shared infer/whnf caches.

  `infer.rs` opens `Lam` / `All` / `Let` binders into the lctx via
  `open_binder` / `open_binder_anon[_with_fv]` helpers on `TypeChecker`. The
  inferred body is closed back via `abstract_fvars` and wrapped in `All` with
  anonymous name + default binder info, matching the legacy de-Bruijn shape
  (recursor coherence relies on this exact shape). `def_eq.rs` swaps the
  `lbr()==0` Bool-eager-reduction guard for `!has_fvars()`, since lctx-opened
  expressions are no longer closed under `lbr`.
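  The open/close round trip can be sketched on a binder-free toy term language (so only depth 0 matters; the real `open_binder`/`abstract_fvars` also thread the lctx, binder metadata, and per-call memo tables):

```rust
// Toy term language: Var is de Bruijn, FVar carries the minted id.
#[derive(Clone, PartialEq, Debug)]
pub enum E {
    Var(u32),
    FVar(u64),
    App(Box<E>, Box<E>),
}

/// Open the outermost binder: Var(0) becomes the fresh fvar, the rest shift down.
pub fn instantiate_fv(e: &E, fv: u64) -> E {
    match e {
        E::Var(0) => E::FVar(fv),
        E::Var(i) => E::Var(i - 1),
        E::FVar(x) => E::FVar(*x),
        E::App(f, a) => E::App(Box::new(instantiate_fv(f, fv)), Box::new(instantiate_fv(a, fv))),
    }
}

/// Close back over the fvar, restoring the legacy de-Bruijn shape.
pub fn abstract_fv(e: &E, fv: u64) -> E {
    match e {
        E::FVar(x) if *x == fv => E::Var(0),
        E::FVar(x) => E::FVar(*x),
        E::Var(i) => E::Var(i + 1),
        E::App(f, a) => E::App(Box::new(abstract_fv(f, fv)), Box::new(abstract_fv(a, fv))),
    }
}
```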

  A new `ExprInfo::has_fvars` flag is ORed through composite constructors, so
  "does this subtree mention any fvar" is O(1). Substitution gains
  `instantiate_rev`, `abstract_fvars` (both memoized via `FxHashMap` per
  call), and `cheap_beta_reduce` for App(λ, …) peephole reduction inside
  inferred Pi types. `subst_no_intern` / `lift_no_intern` produce non-interned
  results for short-lived WHNF intermediates that would otherwise pin
  gigabytes of dead predecessor literals in the global intern table.

  `inductive.rs` migrates its validation walks from `push_local`/`pop_local`
  (de-Bruijn `ctx` stack) to `lctx.push` / `lctx.truncate`. Constructor field
  traversal now records the fvar `KExpr` and uses `hash_eq` against it
  instead of computing de-Bruijn offsets — the dead `lower_vars` /
  `lower_vars_inner` helpers and the old `build_motive_type` are removed.
  `tc.rs::depth()` is now the sum of the legacy ctx and the lctx because some
  validation paths still mix both during the partial transition.

  Soundness fences:
    - `egress.rs` panics on `FVar(_)` in both `egress_expr` and
      `kexpr_to_ixon` — egressed terms must be abstracted back to de Bruijn.
    - `canonical_check.rs::compare_kexpr` returns
      `Err(TcError::UnexpectedFVarInComparator)` (new variant) when either
      side has fvars, so canonicalization never sees an open expression.
    - `aux_gen/expr_utils.rs::kexpr_to_lean` maps an unexpected FVar to a
      synthetic `_kernel_fvar_<id>` name so diagnostics can surface the leak
      instead of silently emitting nonsense.

  `is_let_var` replaces the old `num_let_bindings > 0` quick exit in WHNF, so
  the per-Var lookup decides whether to consult `let_vals`.

  == 2. Alpha-invariant content hashing ==

  `expr.rs::var_hash` / `sort_hash` and `level.rs::param`'s hasher no longer
  fold the binder/param `Name` into the blake3 hash. Same change is reflected
  in the binder-aware variants (Lam/All/Let) — names and `BinderInfo` are
  display-only metadata. Result: in both `Anon` and `Meta` modes, two
  expressions that differ only in binder display names hash identically. The
  `def_eq.rs` Tier-1 hash-equality fast-path is now sufficient by itself; the
  older "fall through to `compare_kexpr` for alpha-equivalence" branch is
  removed (commented in place explaining why).

  `mode.rs` retains `MetaHash` for callers that explicitly want metadata
  ordering for diagnostics, but the trait is no longer wired into the
  expression/universe hash builders.

  `level.rs` tests are flipped to assert that anon-vs-meta param hashes
  match, and that meta params with different display names produce equal
  addresses.
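  The name-dropping change can be illustrated on a toy hasher (SipHash here purely for a runnable sketch; the real builders feed blake3):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy lambda with a display name on its binder.
pub enum E {
    Var(u32),
    Lam(&'static str, Box<E>),
}

/// Fold only structure into the hash, never the binder's display name,
/// so alpha-twins hash identically.
pub fn content_hash(e: &E) -> u64 {
    fn go(e: &E, h: &mut DefaultHasher) {
        match e {
            E::Var(i) => {
                0u8.hash(h);
                i.hash(h);
            }
            E::Lam(_name, body) => {
                1u8.hash(h); // display name is metadata: skipped
                go(body, h);
            }
        }
    }
    let mut h = DefaultHasher::new();
    go(e, &mut h);
    h.finish()
}
```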

  == 3. Nat reducer rework ==

  A from-scratch rebuild of WHNF's Nat handling, audited against
  `refs/lean4/src/kernel/type_checker.cpp` and
  `refs/lean4lean/Lean4Lean/TypeChecker.lean`. The audit and per-primitive
  expected behaviour are documented in `docs/nat-reduction-audit.md`.

  Major shape changes in `whnf.rs`:
    - `whnf_with_nat_succ_mode(NatSuccMode::Collapse | Stuck)` separates the
      full-WHNF caller from inner Nat-recursor literal peeling, where caching
      each predecessor would balloon RSS.
    - `cleanup_nat_offset_major` exposes one constructor layer of an
      offset major (`base + k`) as `Nat.succ (base + (k-1))`, so
      `Nat.rec ... (x + huge)` doesn't allocate one literal per
      predecessor while still letting closed arithmetic compute compactly.
    - `try_reduce_nat_succ_iter` / `try_reduce_nat_succ_linear_rec` give
      bounded linear shortcuts for `Nat.rec` step cases that immediately
      force `ih` (the runaway pattern that the previous
      `nat_iota_should_stick` heuristic was trying — and failing — to catch
      on omega-style proofs).
    - The Tier-4 lazy-delta reduction order in `def_eq.rs` is realigned
      to lean4's reference: `is_def_eq_offset` → `try_reduce_nat` (gated on
      `!has_fvars` || `eager_reduce`) → `try_reduce_native` → ix-specific
      `try_reduce_decidable`. The old special-cased `try_reduce_int` is
      deleted; Int operations now reduce by ordinary delta/iota plus native
      Nat reduction, matching Lean.
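  The offset-major cleanup can be sketched on a toy Nat shape (hypothetical names; `Add(base, k)` models `base + k` with a stuck base):

```rust
#[derive(Clone, PartialEq, Debug)]
pub enum Nat {
    Lit(u64),
    Add(Box<Nat>, u64),
    Succ(Box<Nat>),
}

/// Expose exactly one constructor layer of the major so Nat.rec can take a
/// single iota step without materialising one literal per predecessor.
pub fn expose_one_succ(n: Nat) -> Nat {
    match n {
        Nat::Lit(k) if k > 0 => Nat::Succ(Box::new(Nat::Lit(k - 1))),
        Nat::Add(base, k) if k > 0 => Nat::Succ(Box::new(Nat::Add(base, k - 1))),
        other => other, // zero or stuck: nothing to expose
    }
}
```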

  Address-based primitive recognition in `primitive.rs`. Names previously
  matched via `is_const_named` (string compare on `KId::name`) are now
  resolved by hardcoded blake3 address — alpha-twin display names (e.g.
  `Lean.RBColor.rec` for `Bool.rec`) silently bypassed the old check under
  canonical hashing. New `Primitives` fields cover `Nat.rec`, `Nat.casesOn`,
  `BitVec`/`BitVec.toNat`/`BitVec.ofNat`/`BitVec.ult`, `Decidable.decide`,
  `LT.lt`, `OfNat.ofNat`, `Unit`, `PUnit._sizeOf_1`, `SizeOf.sizeOf`,
  `String.back` / `String.Legacy.back` / `String.utf8ByteSize`, plus
  `Int.decEq` / `Int.decLe` / `Int.decLt`. The deleted helpers
  (`find_const_id_named`, `dotted_name`, `synthetic_named_id`,
  `name_components_eq_dotted`, `is_const_named`) were the last name-based
  pattern in WHNF.

  `Tests/Ix/Kernel/BuildPrimitives.lean` extends the dump set so
  `PrimAddrs::EXPECTED` stays in sync.

  == Other kernel wiring ==

    - `KEnv` gains `whnf_no_delta_cheap_cache` and `whnf_core_cheap_cache`,
      keyed identically to the full caches but read/written only by cheap
      callers. Without them the def-eq lazy-delta loop redoes
      `whnf_no_delta_for_def_eq` from scratch every iteration on the same
      operand (mathlib hot path was O(N²)).
    - `whnf` cycle detection switches from `Vec` linear scan to
      `FxHashSet<Addr>`. mathlib delta chains hit hundreds of distinct
      intermediates; addr equality is one 32-byte blake3 compare.
    - New diagnostic env vars: `IX_HOT_MISSES` / `IX_HOT_MISS_CTX`,
      `IX_NAT_IOTA_TRACE`, `IX_NAT_LINEAR_REC_TRACE`, `IX_FVAR_TRACE`,
      `IX_APP_DIFF_DEPTH`, plus a `def_eq_trace_depth` field that lets
      inner tier-4 traces fire only when an outer trace is active.
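  The cycle-detection switch can be pictured with a minimal sketch
  (std `HashSet` standing in for `FxHashSet`; `detect_cycle` and the
  bare `Addr` alias are illustrative, not the kernel's API):

```rust
use std::collections::HashSet; // stand-in for rustc_hash::FxHashSet

type Addr = [u8; 32]; // a blake3 content address

// Each step of a delta chain does one O(1) membership probe on the
// 32-byte address instead of a linear scan over a Vec of intermediates.
fn detect_cycle(chain: impl Iterator<Item = Addr>) -> bool {
    let mut seen: HashSet<Addr> = HashSet::new();
    for addr in chain {
        if !seen.insert(addr) {
            return true; // revisited an intermediate: cycle
        }
    }
    false
}
```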

    - `Ix/Cli/CompileCmd.lean`: `--out` now defaults to the lowercase input
      file stem with `.ixe` (e.g. `CompileMathlib.lean → compilemathlib.ixe`)
      instead of being optional-only.
    - `Ix/Cli/CheckIxonCmd.lean` + `src/ffi/kernel.rs`: `--fail-out` is now
      streamed from Rust under a `Mutex<File>`. The file gets a header
      immediately, one record per failure as it's detected (flushed each
      write), and a `# total failures: N` footer. `tail -f` works during a
      long full-env run instead of seeing nothing until the very end. The
      Lean-side `rsCheckIxonFFI` opaque takes an extra `String` argument
      (empty == no streaming).
    - `check-ixon` skips the full-env name preflight when the filter is
      exact-only (no prefix patterns), saving a redundant FFI scan.
    - New test-only FFI `rs_kernel_check_malformed_rec_rule_ixon` (gated on
      `feature = "test-ffi"`) compiles a fixture, deliberately corrupts a
      recursor rule in the compiled Ixon payload (swap second rule's body
      to return the first minor), and runs the kernel against that exact
      malformed payload. The production aux-gen path would otherwise
      sanitize the bad rule before the kernel sees it; this hook lets
      `Tests/Ix/Kernel/Tutorial.lean::AdvNat.rec` exercise the rejection.
    - New `Tests/Ix/Kernel/NatReduction.lean` (290 lines) builds raw
      `Lean.Declaration` values with `.lit (.natVal _)` to bypass the
      elaborator's `OfNat.ofNat` wrapping and exercise `try_reduce_nat`
      directly. Sections A–H cover per-primitive literal-on-literal
      parity, `Nat.zero` literal-extension recognition, succ/zero chains,
      mixed literal/constructor def-eq, negative tests guarding
      over-reduction, the `Nat.pow` cap, the linear `Nat.rec` shortcut,
      and `Nat.pred`.
    - `Tests/Ix/Kernel/CheckEnv.lean` now skips both the `TutorialDefs` and
      `NatReduction` modules; the focus list adds the residue from the
      2026-04-30 follow-up runs (Fin/BitVec/Lean.Grind/`_sparseCasesOn_`
      cases).
    - `Tests/Ix/Kernel/Tutorial.lean` wires the malformed-rec-rule FFI to
      the `AdvNat.rec` test case, with a per-name override on `expectPass`.
    - `aux_gen/recursor.rs::ingress_aux_gen_dep` now walks ctor types
      referenced by `InductInfo` (so dependencies of the inductive's
      constructors are pulled into the kernel context alongside the
      inductive itself), and `collect_const_refs` records the projected
      constant on `Proj` instead of recursing only into the projected
      value.

  Three coupled threads land together. They all sit on top of the
  alpha-invariant `cnst_hash` from commit 8f15dc0 (kernel: add FVar
  binder opening, alpha-invariant hashing, Nat reducer rework):
  name-erased Const hashing exposes WHNF cache aliasing where aux_gen
  had assumed source-name fidelity, and where the validate-aux alias
  plumbing had assumed source-shape regeneration.

  == 1. find_rec_target: syntactic-first peel ==

  `src/ix/compile/aux_gen/recursor.rs::find_rec_target` is rewritten as
  a two-phase walk:

    Phase 1 (new): peel `ForallE` binders syntactically — no kernel
      call. Match the head against `class.all_names` at every depth
      via the new helper `match_classes_against_app`. The source name
      is preserved exactly because no WHNF runs.
    Phase 2 (legacy): full kernel-WHNF peel as a fallback. Only fires
      when Phase 1 doesn't find a class-member head, i.e. the head is
      a reducible alias not in `classes` (`Set σ := σ → Prop`,
      `constType := λ α. α → α`, …). WHNF then delta-unfolds it.

  Why: the kernel's `cnst_hash` includes only `id.addr`, not `id.name`,
  so the WHNF cache returns *whichever* display name was inserted first
  when two source names share a canonical address (alpha-collapse:
  `A` ≅ `B`, or synthetic `_nested.List_1` ≅ `_nested.List_2` once their
  type args collapse). Source-shape singleton-class aux_gen needs the
  original source name to dispatch each recursive field to the right
  motive; cache aliasing made it dispatch to the twin's class, which
  showed up in validate-aux as IH binders pointing at the wrong motive
  across `AlphaCollapse.{A,B,A2,B2,C2,A'}` (direct + synthetic auxes),
  `NestedAlphaCollapse.{TreeA,TreeB}` (nested aux types collapsed),
  and `HigherOrderRec.{FA,FB}` (higher-order field `Nat → FA` whose
  inner head was cache-aliased after peeling).

  The syntactic phase handles all three categories — direct `A`,
  parameterized `List A`, and higher-order `Nat → A` / `(α → β) → A`
  fields — without ever touching the kernel cache. Phase 2 stays in
  place for genuinely reducible alias heads.

  `build_minor_type` and `build_rec_rules` thread `stt` through to the
  new signature.
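  The Phase 1 shape can be sketched with a toy expression type (names
  are plain strings standing in for `KId`s; `find_rec_target_syntactic`
  is a hypothetical mini-model, not the real signature):

```rust
enum Expr {
    Const(String),
    App(Box<Expr>, Box<Expr>),
    ForallE(Box<Expr>, Box<Expr>), // (domain, body)
}

// Spine walk: the head constant of `f a b …`, if any.
fn head_name(mut e: &Expr) -> Option<&str> {
    loop {
        match e {
            Expr::App(f, _) => e = f.as_ref(),
            Expr::Const(n) => return Some(n.as_str()),
            _ => return None,
        }
    }
}

// Phase 1: peel ForallE binders syntactically, matching the head
// against the class members at every depth. No WHNF runs, so the
// source spelling is preserved exactly.
fn find_rec_target_syntactic<'a>(mut ty: &'a Expr, classes: &[&str]) -> Option<&'a str> {
    loop {
        if let Some(h) = head_name(ty) {
            if classes.contains(&h) {
                return Some(h);
            }
        }
        match ty {
            Expr::ForallE(_, body) => ty = body.as_ref(),
            _ => return None, // Phase 2 (kernel WHNF fallback) would take over here
        }
    }
}
```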

  Three `#[cfg(test)]` regression tests in `recursor.rs` cover the
  AlphaCollapse case end-to-end:
    - `test_alpha_collapse_sort_consts_groups_inductives` — `sort_consts`
      produces one class for the alpha-equivalent pair.
    - `test_alpha_collapse_compile_env_addresses_inductives_and_ctors` —
      `compile_env` resolves both members and both ctors to one address.
    - `test_alpha_collapse_aux_gen_aliases_primary_aux_to_rep` —
      `generate_aux_patches` emits one patch per representative and
      routes the alias names through `aux_out.aliases` (see Section 3).

  == 2. validate-aux: alpha-collapse-aware aux congruence ==

  `src/ffi/lean_env.rs` grows a tiered comparator that knows how to
  relate three different views of an aux:

    - decompiled — reconstructed from canonical (collapsed) Ixon
    - generated  — singleton-class aux_gen baseline (source shape)
    - original   — Lean's source-walk export

  New helpers:
    - `primary_addresses_collapse(all, stt)` — does some pair of the
      block's primaries share a canonical address?
    - `build_aux_perm_ctx(all, env, stt, perm)` — assembles a `PermCtx`
      with `aux_perm`, `rec_heads`, `const_addr`, and (when collapse
      is in effect) a collapse-driven `B → A` `const_map`. Same shape
      as the existing `build_perm_ctx_1b` / `build_perm_ctx`; the
      callers now also build a `PermCtx` with empty `aux_perm`
      whenever `primary_addresses_collapse` is true so the comparator
      can apply the const_map even on blocks with no nested auxes.
    - `aux_patch_to_lean_ci(patch)` — converts a `PatchedConstant`
      into a `LeanCI` so it can flow through `const_alpha_eq*`.
    - `aux_congruence_result(name, decompiled, original, entry)` —
      Tier 1 (decompiled vs original, perm-aware), Tier 2 (generated
      vs original with collapse-stripped ctx since both sides are
      source-shape), Tier 3 (decompiled vs generated, perm-aware).
      Only fails when every tier fails.
    - `build_aux_compare_contexts(env, stt)` — builds an
      `AuxCompareEntry { generated, ctx }` per aux name, walked once
      per block keyed on the sorted name list.

  Phase 6 (post-compile aux congruence) and Phase 7b (per-constant
  roundtrip fidelity) now route every aux-suffixed name through
  `aux_congruence_result`, falling back to the old hash-based check
  when a name has no entry. This covers blocks that get intentionally
  canonicalized by primary alpha-collapse or nested-aux permutation,
  which previously failed Phase 6/7b on source-shape comparison even
  though the canonical layout was correct.

  Phase 1b ingress congruence keeps `const_map` empty: it compares two
  source-shape views (singleton-class aux_gen output vs original Lean),
  so a collapse-driven `B → A` rewrite would *break* the comparison
  rather than help. The comments at the call sites are updated to
  make the regime split explicit.
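  The "only fails when every tier fails" fallthrough amounts to a lazy
  short-circuit over the three comparisons; a minimal sketch (the
  closure-based signature and numeric tier tag are illustrative, not
  the real `aux_congruence_result` interface):

```rust
fn aux_congruence(
    tier1: impl FnOnce() -> bool, // decompiled vs original, perm-aware
    tier2: impl FnOnce() -> bool, // generated vs original, collapse-stripped
    tier3: impl FnOnce() -> bool, // decompiled vs generated, perm-aware
) -> Result<u8, ()> {
    if tier1() { return Ok(1); }
    if tier2() { return Ok(2); }
    if tier3() { return Ok(3); }
    Err(()) // every tier disagreed: genuine congruence failure
}
```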

  == 3. aux_gen: alias-only patching for collapsed primaries ==

  `src/ix/compile/aux_gen.rs` simplifies the alias path. Previously,
  each non-representative member of an alpha-collapsed class got a
  deep-renamed `B.casesOn` / `B.below` / `B.recOn` / `B.brecOn{,.go,.eq}`
  patch produced via `rename_patch` + `below::rename_below_indc`. That
  deep-renaming created source-shaped auxiliaries instead of the class
  canonical ones, which then disagreed with the canonical block's
  content addresses.

  The collapse path now keeps **one** patch per representative and
  inserts the alias names into `aux_out.aliases` so every Lean-exported
  non-rep name resolves to the rep's canonical patch. Prop-level
  `.below` is itself an inductive, so the aliasing also covers its
  positional constructors (`B.below.b → A.below.a`).

  Removed (now dead): `aux_gen.rs::build_alias_name_map`,
  `aux_gen.rs::rename_patch`, `aux_gen.rs::_name_parent`,
  `below.rs::rename_below_indc` (and the `replace_const_names` import
  on the `below.rs` side). About 220 lines of rename plumbing become
  five lines of `aliases.insert`.
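  The surviving alias path is essentially a name-to-name map; a hedged
  sketch of the shape (`alias_collapsed_class` and the string-keyed map
  are illustrative, the real code keys on compiled names/addresses):

```rust
use std::collections::BTreeMap;

// Instead of deep-renaming a patch per non-representative member,
// record alias → representative so every Lean-exported non-rep name
// resolves to the rep's single canonical patch.
fn alias_collapsed_class(
    aliases: &mut BTreeMap<String, String>,
    rep: &str,
    others: &[&str],
    suffixes: &[&str], // ".casesOn", ".below", ".recOn", ".brecOn", …
) {
    for other in others {
        for suf in suffixes {
            aliases.insert(format!("{other}{suf}"), format!("{rep}{suf}"));
        }
    }
}
```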

  == 4. level_alpha_eq: smart-constructor normalization ==

  `src/ix/congruence.rs::level_alpha_eq` now normalizes both sides
  through `Level::max_smart` / `Level::imax_smart` before structural
  compare. Internal helpers `normalize_level` (idempotent, bottom-up)
  and `level_alpha_eq_struct` (strict structural eq on already-
  normalized levels) keep recursion from re-normalizing at every step.

  Why: `aux_gen::expr_utils::subst_level` routes through the smart
  constructors (commit ec95312 "Align nested-aux canonical order"),
  while Lean's `Level.instantiateParams` keeps the un-simplified
  factored form. `Sort (max u u)` from Lean and `Sort u` from aux_gen
  are semantically equal but structurally distinct under the old
  strict comparator; that flagged every nested-inductive call site
  as a congruence failure. Smart-constructor normalization closes
  the gap without weakening the comparator (only semantically-valid
  simplifications fire: `max(a,a) = a`, zero absorption, same-base
  offset, `Max` absorption, and the analogous `imax` rules). `Succ`
  is intentionally left raw — both sides preserve the factored form.
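  A mini-model of the normalize-then-compare scheme (only the
  `max(a, a) = a` and zero-absorption rules are shown; the real
  normalizer also handles same-base offsets and the `imax` rules):

```rust
#[derive(Clone, PartialEq, Debug)]
enum Level {
    Zero,
    Param(String),
    Max(Box<Level>, Box<Level>),
}

// Smart constructor: apply semantically-valid simplifications only.
fn max_smart(a: Level, b: Level) -> Level {
    if a == b { return a; }           // max(a, a) = a
    if a == Level::Zero { return b; } // zero absorption
    if b == Level::Zero { return a; }
    Level::Max(Box::new(a), Box::new(b))
}

// Bottom-up, idempotent normalization.
fn normalize(l: &Level) -> Level {
    match l {
        Level::Max(a, b) => max_smart(normalize(a), normalize(b)),
        other => other.clone(),
    }
}

// Structural equality on already-normalized levels.
fn level_alpha_eq(a: &Level, b: &Level) -> bool {
    normalize(a) == normalize(b)
}
```

  Under this scheme `Sort (max u u)` from Lean and `Sort u` from aux_gen
  normalize to the same level and compare equal.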

  A `#[cfg(test)] mod tests` block adds eight regression tests pinning
  each simplification rule and an idempotency check.

  == 5. ConstantMeta::muts now crosses the FFI ==

  `src/lean.rs` extends `LeanIxonNamed` from 2 obj fields to 3 and adds
  tag 7 (`muts`) to the `LeanIxonConstantMetaInfo` shape spec. The
  matching FFI build/decode in `src/ffi/ixon/meta.rs` carries the
  alpha-equivalence classes (`Array (Array Address)`) through Lean,
  defaulting the Rust-only `aux_layout` sidecar to `None` on decode
  (the sidecar still survives Rust → Rust through `put_indexed` /
  `get_indexed`).

  Lean side: `Tests/Gen/Ixon.lean` adds a `muts` arm to
  `genConstantMeta`. `Tests/Ix/Compile.lean::testCrossImpl` adds a
  `muts` case to the meta dump and the variant-tag/field comparison
  so the cross-impl roundtrip test actually exercises the new variant.

  == 6. validate-aux scope expansion ==

  `Tests/Ix/Compile/ValidateAux.lean` keeps `Tests.Ix.Compile.Mutual`
  as the active prefix and comments out the rest (Init, Lean, State,
  TutorialDefs). This is the new default scope for the iterative
  debugging loop; uncomment locally to re-enable the larger sets.

  == 7. Cosmetic / debug ==

    - `src/ix/kernel/infer.rs`: the App-`ensure_forall` failure dump
      is now gated on `IX_INFER_APP_FORALL_DUMP`, with an optional
      `IX_KERNEL_DEBUG_CONST=<substring>` filter via
      `debug_label_matches_env`. Off by default — the old
      unconditional dump produced megabytes of `f`/`f_ty`/`a` output
      on mathlib-scale failures and hid the constant-level signal.
    - `src/ix/kernel/{def_eq,equiv,inductive}.rs`: rustfmt fallout
      from the gated infer changes (no behavioral diff).
    - `src/ix/kernel/primitive.rs`: `string_back` /
      `string_legacy_back` blake3 addresses re-pinned after upstream
      Lean churn; `BuildPrimitives` will keep this in sync going
      forward.

  == Verification ==

  `lake test -- validate-aux --ignored` (Mutual + dependencies, 2462
  constants, 67 unique aux blocks):
    Phase 1 Compilation                 2462 pass, 0 fail
    Phase 2 Aux_gen congruence           863 pass, 0 fail
    Phase 3 No ephemeral leaks           993 pass, 0 fail
    Phase 4 Alpha-equivalence canonicity  97 pass, 0 fail
    Phase 5 Decompile (with debug)      1469 pass, 0 fail
    Phase 6 Aux congruence (roundtrip)   993 pass, 0 fail
    Phase 7 Decompile (without debug)   1469 pass, 0 fail
    Phase 7b Roundtrip fidelity         2462 pass, 0 fail
    Phase 8 Nested detection               7 pass, 0 fail

  `cargo test --lib --release aux_gen`: 87 pass, 0 fail, 1 ignored.
johnchandlerburnham and others added 8 commits May 1, 2026 17:28
  Three changes that all trace back to alpha-invariant `cnst_hash`
  (8f15dc0): the kernel WHNF cache is keyed by content address only, so
  display names alias across alpha-twin pairs (`Paths` vs `Symmetrify`,
  `_nested.List_1` vs `_nested.List_2`, etc.). Aux generation is
  source-shape-sensitive — the regenerated recursor must keep the caller's
  Lean spelling so motives, IH telescopes, and class dispatch line up
  with what Lean's exporter emitted.

  decompose_inductive_type: syntactic-first peeling.
  The stored inductive type is already a forall telescope and Lean's
  exporter doesn't WHNF its index domains either, so eager WHNF drifts
  binders away from source shape. We now WHNF only when the current head
  isn't already a forall — both before the param loop and between
  substitutions — which still exposes reducible-alias targets like
  `Set σ := σ → Prop` while leaving the common case untouched.
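  The guard can be sketched against a toy expression type (`peel_domain`
  and the injected `whnf` closure are hypothetical stand-ins for the
  real decompose loop):

```rust
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Const(String),
    ForallE(Box<Expr>, Box<Expr>), // (domain, body)
}

// Reduce only when the current type is not already a ForallE: source-
// shape telescopes pass through untouched, while reducible alias heads
// still get unfolded by WHNF.
fn peel_domain(ty: Expr, whnf: impl Fn(Expr) -> Expr) -> Option<(Expr, Expr)> {
    let ty = match ty {
        f @ Expr::ForallE(..) => f, // already a forall: skip WHNF
        other => whnf(other),       // alias head: let WHNF expose it
    };
    match ty {
        Expr::ForallE(dom, body) => Some((*dom, *body)),
        _ => None,
    }
}
```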

  TcScope::whnf_lean: source-name overlay on egress.
  Callers that genuinely need WHNF can still get the cached twin's name
  back. After egressing the kernel result we overlay the caller's source
  names structurally:

    * if the kernel content hash is unchanged, walk the generated/source
      pair in lockstep and copy `Const`, `Proj`, and binder names verbatim
      while keeping the reduced levels and subterms (resolved via
      `same_resolved_name_addr` so display-name aliases match);
    * if WHNF actually reduced, collect source-shaped subterm hints
      (`App`, `Proj`, no BVars — de Bruijn indices are unstable under
      freshly-exposed binders) keyed by kernel content hash, then
      re-spell matching subterms inside the reduct.

  This keeps the structural reduct that callers like
  `build_ih_type_fvar` / `build_rule_ih_fvar` and singleton-class
  dispatch see, but with the caller's display names restored.

  find_rec_target: Phase 1/2 split + WHNF cache prewarming.
  Phase 1 still peels `ForallE` syntactically and matches against
  `classes` without touching the kernel cache, so source-shape singleton
  generation dispatches correctly even when the head is one half of an
  alpha-collapsed pair. The change: even on a Phase 1 hit we now run a
  single `scope.whnf_lean(dom)` before returning, so the per-worker WHNF
  cache is warm for the downstream `build_ih_type_fvar` /
  `build_rule_ih_fvar` calls that re-WHNF the same `field_dom`. Without
  this prewarming, every recursive field's downstream WHNF is cold and
  the cumulative cost dominates wall-clock time on mathlib-scale runs
  (hundreds of seconds in Phase 5 Pass 2). Phase 2's
  `IX_FIND_REC_TARGET_DUMP` debug instrumentation is removed.
Co-authored-by: Samuel Burnham <45365069+samuelburnham@users.noreply.github.com>
* Fix lean-ffi

* Fmt

* More lean-ffi fixes

* More lean-ffi fixes

* Fix `kernel-ixon-roundtrip` test

* fix Valgrind test

* comment out `rust-decompile`

* Prep for review
@johnchandlerburnham johnchandlerburnham merged commit a9e4317 into main May 7, 2026
14 checks passed
@johnchandlerburnham johnchandlerburnham deleted the jcb/kernel branch May 7, 2026 12:48