feat(tableau): word-fused cz_block and faster Python MSD path #164

Open
david-pl wants to merge 3 commits into main from feat/tableau-cz-block

@david-pl

Collaborator

Summary

Speeds up the Python GeneralizedTableau MSD path with a new word-fused CZ primitive and lower-overhead gate dispatch, and adds benchmarks for both languages.

The 85-qubit MSD circuit build drops from ~124µs to ~62µs. The full shot (build + readout) takes ~92µs versus ~56µs for the Rust fused bench, so Python is still ~1.6× slower, not at parity. The remaining gap comes from per-call Python dispatch, the scattered encode CZs (which use cz_many in Rust too), and the measurement loop (real work both languages pay).

Changes

New cz_block fused CZ (Rust + Python)

  • GeneralizedTableau::cz_block(control_base, target_base, count) in ppvm-tableau: applies CZ to constant-offset pairs (control_base+i, target_base+i) and splits the run at u64-word boundaries internally, dispatching to the existing cz_block_pairs / cz_block_pairs_cross_word kernels. Callers use plain qubit indices — no word/bit arithmetic.
  • Exposed through the PyO3 interface and the Python GeneralizedTableau wrapper (+ _core.pyi stub).
  • This is the lever for the build speedup: the contiguous cross-block CZ layers go from per-pair cz_many (O(n) per pair) to word-parallel block ops (O(n/64) per block). The scattered encode CZs stay on cz_many (no contiguous structure to fuse — same as the Rust bench).
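
To make the word-boundary splitting concrete, here is a minimal Python sketch. It is not the Rust implementation: `split_run` and `WORD_BITS` are hypothetical names, and it assumes each chunk must keep both the control side and the target side inside a single 64-bit word.

```python
WORD_BITS = 64  # assumed packing width (u64 words)


def split_run(control_base, target_base, count, word_bits=WORD_BITS):
    """Split the pairs (control_base + i, target_base + i), i in range(count),
    into chunks (control_start, target_start, length) such that neither side
    of a chunk crosses a word boundary. Hypothetical helper for illustration.
    """
    chunks = []
    i = 0
    while i < count:
        c = control_base + i
        t = target_base + i
        # Bits left before each side hits the end of its current word.
        c_room = word_bits - (c % word_bits)
        t_room = word_bits - (t % word_bits)
        n = min(c_room, t_room, count - i)
        chunks.append((c, t, n))
        i += n
    return chunks


def expand(chunks):
    """Flatten chunks back into individual (control, target) pairs."""
    return [(c + k, t + k) for c, t, n in chunks for k in range(n)]
```

Under these assumptions a run such as `split_run(60, 130, 10)` is cut once, where the control side reaches qubit 64, and the chunks expand back to exactly the pairs that `zip(range(60, 70), range(130, 140))` yields.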

Faster Python gate dispatch

  • mixins: pass gate targets straight to the native layer — PyO3 extracts Vec<usize> directly from lists/tuples/ranges/ndarrays — instead of rebuilding a list with a per-element int() on every call. Concrete-type fast paths in _is_sequence avoid the slow ABC isinstance(obj, Iterable).
  • Measurement: convert raw outcome codes through a shared _BY_VALUE tuple lookup rather than the per-element MeasurementResult() IntEnum constructor (deduplicated with GeneralizedTableauSum).
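
The concrete-type fast path mentioned for `_is_sequence` can be sketched as follows. This is an illustration only: `is_sequence` is a hypothetical stand-in, and the set of types it short-circuits on is an assumption, not the list used in mixins.

```python
from collections.abc import Iterable


def is_sequence(obj):
    """Hypothetical stand-in for a fast sequence check."""
    t = type(obj)
    # Exact-type checks are plain identity comparisons and skip the
    # ABC machinery for the most common argument types.
    if t is list or t is tuple or t is range:
        return True
    if t is int:
        return False
    # Anything else (ndarray, set, generator, ...) takes the slower ABC path.
    return isinstance(obj, Iterable)
```

The ordering is the design choice here: the common cases return before the `isinstance(obj, Iterable)` check runs, and less common types still get the general answer.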

Benchmarks

  • New pytest-benchmark test/benchmarks/test_msd.py: the 85-qubit MSD circuit with splatted gates, build-only and build+measure arms (smoke-tests by default; --benchmark-enable to time).
  • tableau-msd-fused.rs now calls cz_block with qubit indices instead of the hand-split word/bit calls — runtime unchanged (verified with a criterion A/B baseline: within noise).

Verification

  • ppvm-tableau lib tests pass, including a new cz_block cross-word + reversed-bases equivalence test.
  • 204 Python tests pass (covers list/tuple/range/ndarray/np.int64 target forms).
  • cz_block proven equivalent to per-pair cz on coefficients.
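
The shape of such an equivalence check can be illustrated on a toy model. The sketch below is an assumption-laden simplification, not the ppvm test: it tracks a single Pauli row as Python-int bitmasks `x`, `z` plus a sign bit `r`, uses the usual stabilizer-tableau CZ update, ignores the coefficients that GeneralizedTableau carries, and assumes the control and target ranges do not overlap. `cz_pair` and `cz_block_toy` are hypothetical names.

```python
import random


def cz_pair(x, z, r, c, t):
    """Apply CZ(c, t) to one Pauli row (bitmasks x, z; sign bit r)."""
    xc, xt = (x >> c) & 1, (x >> t) & 1
    zc, zt = (z >> c) & 1, (z >> t) & 1
    r ^= xc & xt & (zc ^ zt)        # sign flip, computed before z changes
    z ^= (xt << c) | (xc << t)      # z_c ^= x_t and z_t ^= x_c
    return x, z, r


def cz_block_toy(x, z, r, control_base, target_base, count):
    """Apply CZ to (control_base + i, target_base + i) for all i at once.
    Assumes the two ranges are disjoint."""
    mask = (1 << count) - 1
    xc = (x >> control_base) & mask
    xt = (x >> target_base) & mask
    zc = (z >> control_base) & mask
    zt = (z >> target_base) & mask
    flips = xc & xt & (zc ^ zt)
    r ^= bin(flips).count("1") & 1  # parity of all per-pair sign flips
    z ^= (xt << control_base) | (xc << target_base)
    return x, z, r


def check_equivalence(trials=200, n_bits=200, seed=0):
    """Compare the block update with the per-pair loop on random rows."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, z, r = rng.getrandbits(n_bits), rng.getrandbits(n_bits), rng.getrandbits(1)
        expected = (x, z, r)
        for i in range(10):
            expected = cz_pair(*expected, 60 + i, 130 + i)
        if cz_block_toy(x, z, r, 60, 130, 10) != expected:
            return False
    return True
```

The bases 60 and 130 are chosen so the control range crosses a 64-bit boundary, mirroring the cross-word case the new test is described as covering.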

🤖 Generated with Claude Code

Add `GeneralizedTableau::cz_block(control_base, target_base, count)`: a
high-level fused CZ over constant-offset pairs that splits a run at u64-word
boundaries internally and dispatches to the existing `cz_block_pairs` /
`cz_block_pairs_cross_word` kernels, so callers use plain qubit indices and
never reason about the word packing. Exposed through the PyO3 interface and the
Python `GeneralizedTableau` wrapper.

Speed up the Python gate path:
- mixins: pass gate targets straight to the native layer (PyO3 extracts
  `Vec<usize>` from lists/tuples/ranges/ndarrays directly) instead of rebuilding
  a list with a per-element `int()` on every call; concrete-type fast paths in
  `_is_sequence` avoid the slow ABC `isinstance`.
- measurement: convert raw outcome codes via a shared `_BY_VALUE` tuple lookup
  instead of the per-element `MeasurementResult()` IntEnum constructor (shared
  with `GeneralizedTableauSum`).

Benchmarks:
- add a pytest-benchmark MSD benchmark (test/benchmarks/test_msd.py) with
  splatted gates and build-only / build+measure arms.
- tableau-msd-fused.rs now calls `cz_block` with qubit indices (runtime
  unchanged vs the hand-split kernels, verified by an A/B baseline).

Net: the 85-qubit MSD build drops ~124us -> ~62us. The full shot (build +
readout) is ~92us versus the Rust fused bench's ~56us, i.e. still ~1.6x; the
remaining gap is per-call Python dispatch plus the scattered encode CZs (shared
with Rust) and the measurement loop (real work both languages pay).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 30, 2026 08:52
@david-pl

Collaborator Author

FYI, @Roger-luo this removes some regressions introduced when matching STIM parity (_normalize_targets was slow).

@github-actions

github-actions Bot commented Jun 30, 2026

PR Preview Action v1.8.1

🚀 View preview at
https://QuEraComputing.github.io/ppvm/pr-preview/pr-164/

Built to branch gh-pages at 2026-07-01 07:37 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Copilot AI left a comment


Pull request overview

This PR improves GeneralizedTableau performance across Rust and Python by adding a new fused constant-offset CZ primitive (cz_block), reducing Python-side dispatch overhead for “splatted” gate calls, and adding benchmarks to track MSD circuit performance.

Changes:

  • Added GeneralizedTableau::cz_block(control_base, target_base, count) in Rust, exposed through the PyO3 interface and Python wrapper.
  • Reduced Python gate/measurement overhead by avoiding per-element int() conversions for gate targets and using a cached enum lookup for measurement decoding.
  • Added a pytest-benchmark MSD benchmark mirroring the Rust fused MSD bench.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| ppvm-python/test/benchmarks/test_msd.py | Adds a pytest-benchmark version of the 85-qubit MSD fused circuit (build-only and build+measure). |
| ppvm-python/src/ppvm/mixins.py | Avoids rebuilding target lists (lets PyO3 extract Vec<usize> directly); adds faster _is_sequence checks. |
| ppvm-python/src/ppvm/generalized_tableau.py | Adds cz_block wrapper and speeds up measurement decoding via _BY_VALUE. |
| ppvm-python/src/ppvm/generalized_tableau_sum.py | Deduplicates measurement decoding lookup by importing _BY_VALUE. |
| ppvm-python/src/ppvm/_core.pyi | Updates type stub to include _GeneralizedTableauBase.cz_block(...). |
| crates/ppvm-tableau/src/data.rs | Implements the fused cz_block entry point and adds an across-word-boundary equivalence test. |
| crates/ppvm-tableau/benches/tableau-msd-fused.rs | Simplifies the fused MSD bench to call cz_block with qubit indices. |
| crates/ppvm-python-native/src/interface_tableau.rs | Exposes cz_block through the PyO3 interface. |

return isinstance(obj, Iterable)


def _normalize_targets(args: tuple[Any, ...]) -> Sequence[int]:
Comment on lines 71 to +75
 def _split_targets_parameter(
     args: tuple[Any, ...],
     value: Any | None,
     name: str,
-) -> tuple[list[int], Any]:
+) -> tuple[Sequence[int], Any]:

     name: str,
     truncate: bool,
-) -> tuple[list[int], Any, bool]:
+) -> tuple[Sequence[int], Any, bool]:

Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Comment on lines +161 to +163
Applies CZ to ``(control_base + i, target_base + i)`` for ``i`` in
``range(count)`` -- i.e. the gates ``zip(range(control_base, ...),
range(target_base, ...))`` would produce. This uses a word-level kernel
"""
self._interface.t_dag(_normalize_targets(targets))

def cz_block(self, control_base: int, target_base: int, count: int) -> None: