CTable: full feature build-out (persistency, aggregates, mutations, QoL)

Jacc4224 · Jacc4224 · commit 0472b3fbcc36 · 2026-04-06T14:35:09.000+02:00
  Persistency:
    - FileTableStorage backend: disk layout _meta.b2frame / _valid_rows.b2nd / _cols/<name>.b2nd
    - CTable(Row, urlpath=..., mode="w"/"a"/"r"), CTable.open(), CTable.save(), CTable.load()
    - Read-only mode blocks all writes; save() always writes compacted rows

  Column aggregates: sum, min, max, mean, std, any, all (chunk-aware via iter_chunks)
  Column utilities: unique(), value_counts(), assign(), boolean mask __getitem__/__setitem__
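
  A minimal sketch of the idea behind chunk-aware aggregates, in plain Python
  rather than the CTable code itself. The iter_chunks function below is a
  hypothetical stand-in for Column.iter_chunks(size), and the use of a
  population standard deviation (ddof=0) is an assumption, not something this
  commit states.

```python
import math


def iter_chunks(values, size):
    """Hypothetical stand-in for Column.iter_chunks(size):
    yield consecutive slices of at most `size` items."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def chunked_stats(values, size):
    """Compute sum/min/max/mean/std while holding one chunk at a time."""
    n = 0
    total = 0.0
    total_sq = 0.0
    lo = math.inf
    hi = -math.inf
    for chunk in iter_chunks(values, size):
        # Only running accumulators survive between chunks.
        n += len(chunk)
        total += sum(chunk)
        total_sq += sum(v * v for v in chunk)
        lo = min(lo, min(chunk))
        hi = max(hi, max(chunk))
    if n == 0:
        raise ValueError("cannot aggregate an empty column")
    mean = total / n
    # Population variance (assumed ddof=0); clamp tiny negative rounding errors.
    var = max(total_sq / n - mean * mean, 0.0)
    return {"sum": total, "min": lo, "max": hi, "mean": mean, "std": math.sqrt(var)}


stats = chunked_stats(list(range(10)), size=4)
```

  The sum-of-squares form keeps the example short; a numerically sturdier
  variant would merge per-chunk means and variances instead.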

  Schema mutations: add_column (fills default for existing rows), drop_column, rename_column
    - All three update schema, handle disk files, and block on views

  View mutability model fix:
    - Views allow value writes (assign, __setitem__) — only structural mutations are blocked
    - _read_only=True reserved for mode="r" disk tables; base is not None guards structural ops
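
  The two guards described above can be sketched with a toy class. ToyTable and
  its method names are hypothetical and only mirror the rule in this message:
  _read_only blocks every write, while a non-None base blocks structural
  changes only.

```python
class ToyTable:
    """Hypothetical illustration of the view mutability rules; not CTable."""

    def __init__(self, columns, base=None, read_only=False):
        self.columns = columns        # dict: column name -> list of values
        self.base = base              # parent table if this object is a view
        self._read_only = read_only   # reserved for tables opened read-only

    def view(self):
        # A view shares column storage with its parent.
        return ToyTable(self.columns, base=self, read_only=self._read_only)

    def _check_writable(self):
        if self._read_only:
            raise ValueError("table is read-only")

    def _check_structural(self):
        self._check_writable()
        if self.base is not None:
            raise ValueError("structural mutations are not allowed on views")

    def set_value(self, name, index, value):
        # Value write: allowed on views, blocked only in read-only mode.
        self._check_writable()
        self.columns[name][index] = value

    def add_column(self, name, default):
        # Structural mutation: blocked on views and on read-only tables.
        self._check_structural()
        nrows = len(next(iter(self.columns.values()), []))
        self.columns[name] = [default] * nrows  # fill default for existing rows


table = ToyTable({"x": [1, 2, 3]})
view = table.view()
view.set_value("x", 0, 10)      # value write through the view reaches the parent
table.add_column("y", 0)        # structural change on the base table is fine
```

  Calling view.add_column(...) in this sketch raises ValueError, as does any
  write on a ToyTable created with read_only=True.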

  QoL: __str__ pandas-style, __repr__, cbytes/nbytes, sample(n), Column.iter_chunks(size)

  Tests: 258 tests, ~5s — new test_persistency.py (33), test_schema_mutations.py (41),
    expanded test_column.py; optimized helpers to use to_numpy() instead of row[i]
diff --git a/bench/ctable/bench_append_regression.py b/bench/ctable/bench_append_regression.py
@@ -20,7 +20,7 @@
 
 import blosc2
 from blosc2.schema_compiler import compile_schema
-from blosc2.schema_validation import validate_row, build_validator_model
+from blosc2.schema_validation import build_validator_model, validate_row
 
 
 @dataclass
@@ -113,5 +113,5 @@ class Row:
 print(f"{'Per-row Pydantic cost (isolated)':<40} {(t_validate/N)*1e6:.2f} µs/row")
 print()
 print(f"Note: append() is dominated by blosc2 I/O ({t_append_off/t_raw:.0f}x raw numpy),")
-print(f"      not by the validation pipeline.")
-print(f"      The main bottleneck is the last_true_pos backward scan per row.")
+print("      not by the validation pipeline.")
+print("      The main bottleneck is the last_true_pos backward scan per row.")