feat(tableau): word-fused cz_block and faster Python MSD path by david-pl · Pull Request #164 · QuEraComputing/ppvm

david-pl · 2026-06-30T08:52:44Z

Summary

Speeds up the Python GeneralizedTableau MSD path with a new word-fused CZ primitive and lower-overhead gate dispatch, and adds benchmarks for both languages.

The 85-qubit MSD circuit build drops from ~124µs to ~62µs. The full shot (build + readout) is ~92µs versus the Rust fused bench's ~56µs — still ~1.6×, not parity. The remaining gap is per-call Python dispatch, the scattered encode CZs (which use cz_many in Rust too), and the measurement loop (real work both languages pay).

Changes

New `cz_block` fused CZ (Rust + Python)

GeneralizedTableau::cz_block(control_base, target_base, count) in ppvm-tableau: applies CZ to constant-offset pairs (control_base+i, target_base+i) and splits the run at u64-word boundaries internally, dispatching to the existing cz_block_pairs / cz_block_pairs_cross_word kernels. Callers use plain qubit indices — no word/bit arithmetic.
Exposed through the PyO3 interface and the Python GeneralizedTableau wrapper (+ _core.pyi stub).
This is the lever for the build speedup: the contiguous cross-block CZ layers go from per-pair cz_many (O(n) per pair) to word-parallel block ops (O(n/64) per block). The scattered encode CZs stay on cz_many (no contiguous structure to fuse — same as the Rust bench).
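The word-boundary splitting described above can be illustrated with a small pure-Python model. This is only a sketch of the chunking idea, not the Rust implementation: the helper names `split_at_word_boundaries` and `expand_pairs` are hypothetical, and the rule that a chunk ends wherever either the control or the target run reaches a 64-bit word boundary is an assumption based on the description.

```python
WORD_BITS = 64  # qubits packed per u64 word, per the description above


def split_at_word_boundaries(control_base, target_base, count):
    """Yield (control_start, target_start, length) chunks.

    Hypothetical model: inside one chunk, all controls live in a single
    word and all targets live in a single word, so a word-parallel kernel
    could process the chunk without crossing a boundary.
    """
    done = 0
    while done < count:
        c = control_base + done
        t = target_base + done
        # Qubits left before each side crosses into its next word.
        room_c = WORD_BITS - (c % WORD_BITS)
        room_t = WORD_BITS - (t % WORD_BITS)
        length = min(count - done, room_c, room_t)
        yield c, t, length
        done += length


def expand_pairs(chunks):
    """Flatten chunks back into explicit (control, target) pairs."""
    return [(c + i, t + i) for c, t, n in chunks for i in range(n)]
```

For example, `split_at_word_boundaries(60, 130, 10)` splits the run once, where the control side crosses from word 0 into word 1, and expanding the chunks gives back the same pairs as `zip(range(60, 70), range(130, 140))`.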

Faster Python gate dispatch

mixins: pass gate targets straight to the native layer — PyO3 extracts Vec<usize> directly from lists/tuples/ranges/ndarrays — instead of rebuilding a list with a per-element int() on every call. Concrete-type fast paths in _is_sequence avoid the slow ABC isinstance(obj, Iterable).
Measurement: convert raw outcome codes through a shared _BY_VALUE tuple lookup rather than the per-element MeasurementResult() IntEnum constructor (deduplicated with GeneralizedTableauSum).

Benchmarks

New pytest-benchmark test/benchmarks/test_msd.py: the 85-qubit MSD circuit with splatted gates, build-only and build+measure arms (smoke-tests by default; --benchmark-enable to time).
tableau-msd-fused.rs now calls cz_block with qubit indices instead of the hand-split word/bit calls — runtime unchanged (verified with a criterion A/B baseline: within noise).

Verification

ppvm-tableau lib tests pass, including a new cz_block cross-word + reversed-bases equivalence test.
204 Python tests pass (covers list/tuple/range/ndarray/np.int64 target forms).
cz_block proven equivalent to per-pair cz on coefficients.

🤖 Generated with Claude Code

Add `GeneralizedTableau::cz_block(control_base, target_base, count)`: a high-level fused CZ over constant-offset pairs that splits a run at u64-word boundaries internally and dispatches to the existing `cz_block_pairs` / `cz_block_pairs_cross_word` kernels, so callers use plain qubit indices and never reason about the word packing. Exposed through the PyO3 interface and the Python `GeneralizedTableau` wrapper.

Speed up the Python gate path:
- mixins: pass gate targets straight to the native layer (PyO3 extracts `Vec<usize>` from lists/tuples/ranges/ndarrays directly) instead of rebuilding a list with a per-element `int()` on every call; concrete-type fast paths in `_is_sequence` avoid the slow ABC `isinstance`.
- measurement: convert raw outcome codes via a shared `_BY_VALUE` tuple lookup instead of the per-element `MeasurementResult()` IntEnum constructor (shared with `GeneralizedTableauSum`).

Benchmarks:
- add a pytest-benchmark MSD benchmark (test/benchmarks/test_msd.py) with splatted gates and build-only / build+measure arms.
- tableau-msd-fused.rs now calls `cz_block` with qubit indices (runtime unchanged vs the hand-split kernels, verified by an A/B baseline).

Net: the 85-qubit MSD build drops ~124us -> ~62us. The full shot (build + readout) is ~92us versus the Rust fused bench's ~56us, i.e. still ~1.6x; the remaining gap is per-call Python dispatch plus the scattered encode CZs (shared with Rust) and the measurement loop (real work both languages pay).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

david-pl · 2026-06-30T08:55:08Z

FYI, @Roger-luo this removes some regressions introduced when matching STIM parity (_normalize_targets was slow).

github-actions · 2026-06-30T08:56:40Z

PR Preview Action v1.8.1
🚀 View preview at https://QuEraComputing.github.io/ppvm/pr-preview/pr-164/
Built to branch `gh-pages` at 2026-07-01 07:37 UTC. Preview will be ready when the GitHub Pages deployment is complete.

Copilot

Pull request overview

This PR improves GeneralizedTableau performance across Rust and Python by adding a new fused constant-offset CZ primitive (cz_block), reducing Python-side dispatch overhead for “splatted” gate calls, and adding benchmarks to track MSD circuit performance.

Changes:

Added GeneralizedTableau::cz_block(control_base, target_base, count) in Rust, exposed through the PyO3 interface and Python wrapper.
Reduced Python gate/measurement overhead by avoiding per-element int() conversions for gate targets and using a cached enum lookup for measurement decoding.
Added a pytest-benchmark MSD benchmark mirroring the Rust fused MSD bench.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `ppvm-python/test/benchmarks/test_msd.py` | Adds a pytest-benchmark version of the 85-qubit MSD fused circuit (build-only and build+measure). |
| `ppvm-python/src/ppvm/mixins.py` | Avoids rebuilding target lists (lets PyO3 extract `Vec<usize>` directly); adds faster `_is_sequence` checks. |
| `ppvm-python/src/ppvm/generalized_tableau.py` | Adds `cz_block` wrapper and speeds up measurement decoding via `_BY_VALUE`. |
| `ppvm-python/src/ppvm/generalized_tableau_sum.py` | Deduplicates measurement decoding lookup by importing `_BY_VALUE`. |
| `ppvm-python/src/ppvm/_core.pyi` | Updates type stub to include `_GeneralizedTableauBase.cz_block(...)`. |
| `crates/ppvm-tableau/src/data.rs` | Implements the fused `cz_block` entry point and adds an across-word-boundary equivalence test. |
| `crates/ppvm-tableau/benches/tableau-msd-fused.rs` | Simplifies the fused MSD bench to call `cz_block` with qubit indices. |
| `crates/ppvm-python-native/src/interface_tableau.rs` | Exposes `cz_block` through the PyO3 interface. |


+    return isinstance(obj, Iterable)
+
+
+def _normalize_targets(args: tuple[Any, ...]) -> Sequence[int]:


 def _split_targets_parameter(
    args: tuple[Any, ...],
    value: Any | None,
    name: str,
-) -> tuple[list[int], Any]:
+) -> tuple[Sequence[int], Any]:


    name: str,
    truncate: bool,
-) -> tuple[list[int], Any, bool]:
+) -> tuple[Sequence[int], Any, bool]:


Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

+        Applies CZ to ``(control_base + i, target_base + i)`` for ``i`` in
+        ``range(count)`` -- i.e. the gates ``zip(range(control_base, ...),
+        range(target_base, ...))`` would produce. This uses a word-level kernel


        """
        self._interface.t_dag(_normalize_targets(targets))

+    def cz_block(self, control_base: int, target_base: int, count: int) -> None:


Copilot AI review requested due to automatic review settings June 30, 2026 08:52

Copilot started reviewing on behalf of david-pl June 30, 2026 08:53 View session

Copilot AI reviewed Jun 30, 2026


david-pl and others added 2 commits June 30, 2026 11:22

Update skill

9be75e4

Merge branch 'main' into feat/tableau-cz-block

ec4b414

david-pl requested review from Roger-luo and Copilot July 1, 2026 07:33

Copilot started reviewing on behalf of david-pl July 1, 2026 07:33 View session

Copilot AI reviewed Jul 1, 2026



david-pl wants to merge 3 commits into main from feat/tableau-cz-block
