
Provide indexing for the CTable object #620

Merged
FrancescAlted merged 10 commits into ctable4 from ctable-indexing on Apr 15, 2026

@FrancescAlted (Member)

This PR implements the CTable indexing work planned in plans/ctable-indexing.md and rounds it out with follow-up fixes, docs, and usability improvements.

What it adds

  • Persistent and in-memory indexes for CTable columns, with support for bucket, partial, and full index kinds.
  • Indexed where() execution with graceful scan fallback when an index cannot be used.
  • Persistent index lifecycle management:
    • survive reopen
    • rebuild / compact
    • stale tracking on data mutation
    • correct cleanup/rebuild on column drop and rename
  • Better multi-column indexed planning for conjunctive filters.
  • Packed .b2z support for reopening indexed tables and querying them directly, without unpacking first.
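The index-or-scan decision behind indexed where() can be illustrated with a small plain-Python sketch. ToyFullIndex and where_range are hypothetical names, not the CTable API, and keeping sorted keys next to their row positions is only a stand-in for a "full" index, not the on-disk format used by this PR:

```python
import bisect


# Hypothetical toy model; NOT the blosc2 CTable API.
class ToyFullIndex:
    """Column values kept sorted together with their original row positions."""

    def __init__(self, values):
        pairs = sorted((v, i) for i, v in enumerate(values))
        self.keys = [v for v, _ in pairs]
        self.rows = [i for _, i in pairs]
        self.stale = False  # would be flipped by a data mutation

    def range_rows(self, lo, hi):
        """Row positions whose value v satisfies lo <= v < hi, via binary search."""
        start = bisect.bisect_left(self.keys, lo)
        stop = bisect.bisect_left(self.keys, hi)
        return sorted(self.rows[start:stop])


def where_range(values, lo, hi, index=None):
    """Use the index when it exists and is fresh; otherwise fall back to a full scan."""
    if index is not None and not index.stale:
        return index.range_rows(lo, hi), "index"
    return [i for i, v in enumerate(values) if lo <= v < hi], "scan"
```

Both paths must return the same rows; the index only changes how they are found. Marking the index stale switches the query to the scan path without changing the result.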

Supporting work

  • New example: a richer examples/ctable/indexing.py covering mixed dtypes, persistence, packing, reopening, and direct .b2z queries.
  • The CTable indexing tutorial is now wired into the docs.
  • Malformed index metadata now raises a clear error instead of silently falling back to a scan.
  • Default-open-mode migration groundwork:
    • blosc2.open() now warns when mode= is omitted
    • internal indexing/mmap paths were updated to use explicit modes

Usability improvements

  • CTable.info was upgraded substantially:
    • cleaner schema display
    • persistent open_mode
    • compact index size summaries
    • less exposure of storage-capacity internals
  • CTable.Column has a compact preview-style repr.
  • Boolean columns now compose naturally in where() expressions.

Validation

  • Added regression coverage across CTable indexing, schema mutation/index lifecycle, info rendering, and example-oriented behavior.
  • Focused test suites and Ruff checks are passing.

 - New CTableIndex handle with col_name, kind, name, stale properties
 - create_index(), drop_index(), rebuild_index(), compact_index() methods
 - index() lookup and indexes property on CTable
 - _CTableIndexProxy duck-type shim routes sidecar files to
   <table.b2d>/_indexes/<col_name>/ for persistent tables
 - Index catalog stored in /_meta vlmeta; survives table close/reopen
 - where() automatically uses a fresh index; falls back to scan when stale
 - Epoch tracking: mutations (append, extend, setitem, assign, sort_by,
   compact) mark all indexes stale; delete() bumps visibility_epoch only
 - Views raise ValueError for all index management methods
 - Add _indexes to reserved column names in schema_compiler
 - 32 new tests in tests/ctable/test_ctable_indexing.py
 - New example examples/ctable/indexing.py
 - New tutorial doc/getting_started/tutorials/15.indexing-ctables.ipynb
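The epoch rule listed above can be modeled roughly as follows. This is a hypothetical sketch: ToyTable, data_epoch, and index_epochs are invented names, and deriving staleness by comparing each index's build-time epoch with the current one is an assumption, not necessarily how the CTable implementation does it. Only visibility_epoch and the split between mutations and delete() come from the PR text; the sketch also assumes delete() merely changes which rows are visible, which is why it leaves indexes fresh:

```python
from dataclasses import dataclass, field


# Hypothetical sketch of epoch-based staleness; not the actual CTable code.
@dataclass
class ToyTable:
    data_epoch: int = 0        # bumped by value/order-changing mutations
    visibility_epoch: int = 0  # bumped by delete() only (name from the PR)
    index_epochs: dict = field(default_factory=dict)  # col_name -> data_epoch at build

    def create_index(self, col_name):
        self.index_epochs[col_name] = self.data_epoch

    def rebuild_index(self, col_name):
        self.create_index(col_name)

    def is_stale(self, col_name):
        # One counter bump makes every existing index stale at once.
        return self.index_epochs[col_name] != self.data_epoch

    def mutate(self):
        """Stands in for append/extend/setitem/assign/sort_by/compact."""
        self.data_epoch += 1

    def delete(self):
        """Assumed to only hide rows, so indexes stay usable."""
        self.visibility_epoch += 1
```

A single shared counter keeps mutation cheap: nothing has to walk the index catalog, and staleness is only checked when where() considers an index.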

 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

  Fix CTable index lifecycle for schema mutations by removing index catalog
  entries and sidecars when indexed columns are dropped, and rebuilding
  indexes under the new name after column renames.
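The drop/rename behavior described above can be sketched against a toy catalog. Everything here is hypothetical (the dict layout, sidecar_dir, drop_column, rename_column) except the sidecar directory layout <table.b2d>/_indexes/<col_name>/, which is taken from the PR description; the real catalog lives in /_meta vlmeta and a real rebuild re-indexes the column data:

```python
import posixpath


# Hypothetical catalog model: col_name -> {"kind": ..., "path": ...}.
def sidecar_dir(table_path, col_name):
    # Layout from the PR text: <table.b2d>/_indexes/<col_name>/
    return posixpath.join(table_path, "_indexes", col_name) + "/"


def drop_column(catalog, col_name):
    """Remove the column's catalog entry; return sidecar dirs to delete."""
    entry = catalog.pop(col_name, None)
    return [] if entry is None else [entry["path"]]


def rename_column(catalog, table_path, old, new):
    """Re-register the index under the new name; return old sidecar dirs to delete."""
    entry = catalog.pop(old, None)
    if entry is None:
        return []
    # A real rebuild would re-index the data; here only the catalog entry moves.
    catalog[new] = {"kind": entry["kind"], "path": sidecar_dir(table_path, new)}
    return [entry["path"]]
```

Rebuilding under the new name, rather than moving sidecar files, keeps the on-disk layout derivable from the column name alone.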

  Improve indexed CTable filtering so where() can expose multiple usable
  column indexes to the planner for conjunctive predicates, and raise a
  clear error for malformed table-owned index metadata instead of silently
  falling back to scans.
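A conjunctive filter can use several column indexes at once by intersecting their candidate rows and scanning only the leftovers. The sketch below is hypothetical: where_all_equal and the {"stale", "postings"} entry layout are invented for illustration, the value-to-rows mapping is a simplification that does not reproduce the bucket/partial/full formats, and only equality predicates are handled. It also shows the "fail loudly" rule for malformed metadata:

```python
# Hypothetical planner sketch; not the blosc2 query planner.
# indexes maps col_name -> {"stale": bool, "postings": {value: set_of_rows}}.
def where_all_equal(columns, predicates, indexes):
    """Return sorted rows where columns[col][row] == value for every (col, value)."""
    nrows = len(next(iter(columns.values())))
    candidates = None  # None means no usable index has been applied yet
    residual = []      # predicates that still need a scan over the candidates
    for col, value in predicates:
        entry = indexes.get(col)
        if entry is None:
            residual.append((col, value))
            continue
        if not isinstance(entry, dict) or "stale" not in entry or "postings" not in entry:
            # Fail loudly rather than silently degrading to a scan.
            raise ValueError(f"malformed index metadata for column {col!r}")
        if entry["stale"]:
            residual.append((col, value))
            continue
        rows = entry["postings"].get(value, set())
        # Conjunction: intersect the row sets from every usable index.
        candidates = set(rows) if candidates is None else candidates & rows
    if candidates is None:
        candidates = range(nrows)
    return sorted(r for r in candidates if all(columns[c][r] == v for c, v in residual))
```

Stale or missing indexes simply turn their predicates into residual checks, so the result never depends on which indexes happened to be usable.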

  Add regression coverage for indexed column rename/drop behavior,
  multi-column indexed conjunctions, and malformed catalog entries.
  Wire the CTable indexing tutorial into the docs toctree.

  Implement the first phase of plans/changing-default-open-mode.md by
  tracking omitted mode= with a sentinel and emitting a FutureWarning when
  blosc2.open() relies on the current implicit "a" behavior.

  Update mmap-related tests, examples, and docstrings to pass explicit
  mode="r" so they keep exercising their intended paths without tripping
  the migration warning.
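The sentinel technique mentioned above separates "mode= omitted" from any explicitly passed value, including mode="a". A minimal sketch, assuming a hypothetical open_store() stand-in rather than the real blosc2.open() (which also opens the file); only the implicit "a" default and the FutureWarning category come from the PR text, and the warning message is invented:

```python
import warnings

_MODE_UNSET = object()  # sentinel: no caller can pass this object by accident


def open_store(urlpath, mode=_MODE_UNSET):
    """Hypothetical stand-in for blosc2.open(); returns the effective mode only."""
    if mode is _MODE_UNSET:
        warnings.warn(
            "mode= was not given; the implicit default 'a' may change to 'r' "
            "in a future release. Pass mode explicitly.",
            FutureWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        mode = "a"  # current implicit behavior
    return mode
```

Using None as the default would not work here if None could ever be a meaningful value, and it would hide whether the caller actually chose a mode; a private object() cannot collide with real arguments.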
@FrancescAlted (Member, Author)

NOTE: once this is merged, we need to remember that there is an ongoing transition toward making read-only ('r') the default open mode in blosc2.open(). This will probably need to be completed around the 4.3 release (the next one is going to be 4.2).

FrancescAlted merged commit 6e75a47 into ctable4 on Apr 15, 2026
6 checks passed