Commit 0472b3f

CTable: full feature build-out (persistency, aggregates, mutations, QoL)
Persistency:
- FileTableStorage backend: disk layout _meta.b2frame / _valid_rows.b2nd / _cols/<name>.b2nd
- CTable(Row, urlpath=..., mode="w"/"a"/"r"), CTable.open(), CTable.save(), CTable.load()
- Read-only mode blocks all writes; save() always writes compacted rows

Column aggregates: sum, min, max, mean, std, any, all (chunk-aware via iter_chunks)

Column utilities: unique(), value_counts(), assign(), boolean mask __getitem__/__setitem__

Schema mutations: add_column (fills default for existing rows), drop_column, rename_column
- All three update schema, handle disk files, and block on views

View mutability model fix:
- Views allow value writes (assign, __setitem__); only structural mutations are blocked
- _read_only=True reserved for mode="r" disk tables; base is not None guards structural ops

QoL: __str__ pandas-style, __repr__, cbytes/nbytes, sample(n), Column.iter_chunks(size)

Tests: 258 tests, ~5s
- New test_persistency.py (33), test_schema_mutations.py (41), expanded test_column.py
- Optimized helpers to use to_numpy() instead of row[i]
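
The commit message describes the column aggregates as chunk-aware via iter_chunks. As a rough illustration of what that can mean, the sketch below combines per-chunk partial results into sum, min, max, mean, and std without holding the whole column at once. It does not use blosc2 or reproduce the CTable code: the `iter_chunks` helper here is a hypothetical stand-in that slices a plain list, `chunked_stats` is an invented name, and the choice of population std (ddof=0) is an assumption.

```python
import math
from typing import Iterable, Iterator, Sequence


def iter_chunks(values: Sequence[float], size: int) -> Iterator[Sequence[float]]:
    """Hypothetical stand-in for Column.iter_chunks(size): yield consecutive slices."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def chunked_stats(chunks: Iterable[Sequence[float]]) -> dict:
    """Merge per-chunk partial results into sum, min, max, mean, and population std."""
    count = 0
    total = 0.0
    lo = math.inf
    hi = -math.inf
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the running mean
    for chunk in chunks:
        n = len(chunk)
        if n == 0:
            continue
        c_sum = math.fsum(chunk)
        c_mean = c_sum / n
        c_m2 = math.fsum((x - c_mean) ** 2 for x in chunk)
        # Pairwise merge of (count, mean, m2) with the chunk's own statistics.
        delta = c_mean - mean
        new_count = count + n
        mean += delta * n / new_count
        m2 += c_m2 + delta * delta * count * n / new_count
        count = new_count
        total += c_sum
        lo = min(lo, min(chunk))
        hi = max(hi, max(chunk))
    if count == 0:
        raise ValueError("no values to aggregate")
    return {"sum": total, "min": lo, "max": hi, "mean": mean,
            "std": math.sqrt(m2 / count)}  # ddof=0 is an assumption


data = [float(i) for i in range(10)]
stats = chunked_stats(iter_chunks(data, size=4))
```

Only one chunk is materialized at a time, which is the property a chunk-aware aggregate over compressed storage would want; any and all can be folded in the same loop with short-circuiting.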
1 parent a422d72 commit 0472b3f

1 file changed

Lines changed: 3 additions & 3 deletions

bench/ctable/bench_append_regression.py

@@ -20,7 +20,7 @@
 
 import blosc2
 from blosc2.schema_compiler import compile_schema
-from blosc2.schema_validation import validate_row, build_validator_model
+from blosc2.schema_validation import build_validator_model, validate_row
 
 
 @dataclass
@@ -113,5 +113,5 @@ class Row:
 print(f"{'Per-row Pydantic cost (isolated)':<40} {(t_validate/N)*1e6:.2f} µs/row")
 print()
 print(f"Note: append() is dominated by blosc2 I/O ({t_append_off/t_raw:.0f}x raw numpy),")
-print(f" not by the validation pipeline.")
-print(f" The main bottleneck is the last_true_pos backward scan per row.")
+print(" not by the validation pipeline.")
+print(" The main bottleneck is the last_true_pos backward scan per row.")
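
Both hunks look like behavior-preserving lint cleanups (an assumption; the commit message does not say so): the first sorts the imported names alphabetically, and the second drops the f prefix from string literals that contain no placeholders. A small check of the second point, with made-up timing values standing in for the benchmark's `t_append_off` and `t_raw`:

```python
# Hypothetical timings; the real benchmark measures these.
t_append_off, t_raw = 12.0, 3.0

# This literal has a placeholder, so the f prefix is needed and is kept in the diff.
with_placeholder = f"Note: append() is dominated by blosc2 I/O ({t_append_off/t_raw:.0f}x raw numpy),"

# These two literals have no placeholders, so the prefix changes nothing.
prefixed = f" not by the validation pipeline."
plain = " not by the validation pipeline."

print(with_placeholder)
print(plain)
```

Linters such as Pyflakes report the prefixed form as an f-string missing placeholders, which is the likely motivation for the change.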
