feat(tableau): word-fused cz_block and faster Python MSD path #164
Open
david-pl wants to merge 3 commits into
Add `GeneralizedTableau::cz_block(control_base, target_base, count)`: a high-level fused CZ over constant-offset pairs that splits a run at u64-word boundaries internally and dispatches to the existing `cz_block_pairs` / `cz_block_pairs_cross_word` kernels, so callers use plain qubit indices and never reason about the word packing. Exposed through the PyO3 interface and the Python `GeneralizedTableau` wrapper.

Speed up the Python gate path:
- mixins: pass gate targets straight to the native layer (PyO3 extracts `Vec<usize>` from lists/tuples/ranges/ndarrays directly) instead of rebuilding a list with a per-element `int()` on every call; concrete-type fast paths in `_is_sequence` avoid the slow ABC `isinstance`.
- measurement: convert raw outcome codes via a shared `_BY_VALUE` tuple lookup instead of the per-element `MeasurementResult()` IntEnum constructor (shared with `GeneralizedTableauSum`).

Benchmarks:
- add a pytest-benchmark MSD benchmark (test/benchmarks/test_msd.py) with splatted gates and build-only / build+measure arms.
- tableau-msd-fused.rs now calls `cz_block` with qubit indices (runtime unchanged vs the hand-split kernels, verified by an A/B baseline).

Net: the 85-qubit MSD build drops ~124us -> ~62us. The full shot (build + readout) is ~92us versus the Rust fused bench's ~56us, i.e. still ~1.6x; the remaining gap is per-call Python dispatch plus the scattered encode CZs (shared with Rust) and the measurement loop (real work both languages pay).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
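The word-boundary split described in the commit message can be sketched in pure Python. This is only an illustration, not the crate's code: it assumes qubit `q` is stored in word `q // 64` at bit `q % 64`, and `split_cz_run` is a hypothetical helper name. How the real implementation chooses between `cz_block_pairs` and `cz_block_pairs_cross_word` for each chunk is not shown here.

```python
WORD_BITS = 64  # assumption: qubit q lives in word q // 64, bit q % 64


def split_cz_run(control_base: int, target_base: int, count: int):
    """Yield (control_start, target_start, length) chunks of a CZ run.

    Hypothetical helper: each chunk is cut so that neither the control
    side nor the target side crosses a 64-bit word boundary.
    """
    done = 0
    while done < count:
        c = control_base + done
        t = target_base + done
        # Bits left before each side would spill into its next word.
        room_c = WORD_BITS - (c % WORD_BITS)
        room_t = WORD_BITS - (t % WORD_BITS)
        length = min(count - done, room_c, room_t)
        yield c, t, length
        done += length


# Example: a run of 10 pairs starting at qubits 60 and 130 is cut once,
# where the control side reaches the end of word 0.
chunks = list(split_cz_run(60, 130, 10))
```

Every chunk then touches exactly one control word and one target word, which is the property a word-level kernel would need; the caller still only supplies plain qubit indices.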
FYI, @Roger-luo this removes some regressions introduced when matching STIM parity ( |
Pull request overview
This PR improves GeneralizedTableau performance across Rust and Python by adding a new fused constant-offset CZ primitive (cz_block), reducing Python-side dispatch overhead for “splatted” gate calls, and adding benchmarks to track MSD circuit performance.
Changes:
- Added `GeneralizedTableau::cz_block(control_base, target_base, count)` in Rust, exposed through the PyO3 interface and Python wrapper.
- Reduced Python gate/measurement overhead by avoiding per-element `int()` conversions for gate targets and using a cached enum lookup for measurement decoding.
- Added a pytest-benchmark MSD benchmark mirroring the Rust fused MSD bench.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `ppvm-python/test/benchmarks/test_msd.py` | Adds a pytest-benchmark version of the 85-qubit MSD fused circuit (build-only and build+measure). |
| `ppvm-python/src/ppvm/mixins.py` | Avoids rebuilding target lists (lets PyO3 extract `Vec<usize>` directly); adds faster `_is_sequence` checks. |
| `ppvm-python/src/ppvm/generalized_tableau.py` | Adds `cz_block` wrapper and speeds up measurement decoding via `_BY_VALUE`. |
| `ppvm-python/src/ppvm/generalized_tableau_sum.py` | Deduplicates measurement decoding lookup by importing `_BY_VALUE`. |
| `ppvm-python/src/ppvm/_core.pyi` | Updates type stub to include `_GeneralizedTableauBase.cz_block(...)`. |
| `crates/ppvm-tableau/src/data.rs` | Implements the fused `cz_block` entry point and adds an across-word-boundary equivalence test. |
| `crates/ppvm-tableau/benches/tableau-msd-fused.rs` | Simplifies the fused MSD bench to call `cz_block` with qubit indices. |
| `crates/ppvm-python-native/src/interface_tableau.rs` | Exposes `cz_block` through the PyO3 interface. |
         return isinstance(obj, Iterable)


     def _normalize_targets(args: tuple[Any, ...]) -> Sequence[int]:

Comment on lines 71 to +75

     def _split_targets_parameter(
         args: tuple[Any, ...],
         value: Any | None,
         name: str,
    - ) -> tuple[list[int], Any]:
    + ) -> tuple[Sequence[int], Any]:

         name: str,
         truncate: bool,
    - ) -> tuple[list[int], Any, bool]:
    + ) -> tuple[Sequence[int], Any, bool]:
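The concrete-type fast path that the PR describes for `_is_sequence` might look roughly like the following. This is a guess at the shape, not the code in `mixins.py`: the function name `is_sequence`, the set of fast-path types, and the `int` shortcut are all assumptions; only the final `isinstance(obj, Iterable)` fallback appears in the diff above.

```python
from collections.abc import Iterable


def is_sequence(obj) -> bool:
    """Hypothetical stand-in for `_is_sequence` with exact-type fast paths."""
    t = type(obj)
    # Exact-type identity checks are cheap and cover the common call shapes.
    if t is list or t is tuple or t is range:
        return True
    # Splatted calls pass plain ints; reject them without touching the ABC.
    if t is int:
        return False
    # Everything else (ndarrays, generators, subclasses) takes the slow path.
    return isinstance(obj, Iterable)


results = [is_sequence(x) for x in ([0, 1], (0, 1), range(2), 7)]
```

The ABC check goes through `__instancecheck__` and subclass hooks, which is why short-circuiting on the exact type helps when the function runs on every gate call.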
Comment on lines +161 to +163

    +        Applies CZ to ``(control_base + i, target_base + i)`` for ``i`` in
    +        ``range(count)`` -- i.e. the gates ``zip(range(control_base, ...),
    +        range(target_base, ...))`` would produce. This uses a word-level kernel

             """
             self._interface.t_dag(_normalize_targets(targets))

    +    def cz_block(self, control_base: int, target_base: int, count: int) -> None:
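As a reading aid for the docstring fragment above, the gate pairs it describes can be enumerated in plain Python. The helper names are hypothetical and nothing here calls into ppvm; it only spells out the `(control_base + i, target_base + i)` rule and the fact that CZ is symmetric in its two qubits, which is presumably why swapping the two bases yields the same set of gates.

```python
def cz_block_reference(control_base: int, target_base: int, count: int):
    """List the (control, target) pairs a fused block is documented to cover."""
    return [(control_base + i, target_base + i) for i in range(count)]


def as_unordered(pairs):
    """CZ is symmetric, so compare gates as unordered qubit pairs."""
    return {frozenset(p) for p in pairs}


forward = cz_block_reference(3, 70, 4)
reverse = cz_block_reference(70, 3, 4)
```

A per-pair loop over `forward` is the slow reference that a fused implementation should match, which is the kind of equivalence the PR's tests are described as checking.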
Summary

Speeds up the Python `GeneralizedTableau` MSD path with a new word-fused CZ primitive and lower-overhead gate dispatch, and adds benchmarks for both languages.

The 85-qubit MSD circuit build drops from ~124µs to ~62µs. The full shot (build + readout) is ~92µs versus the Rust fused bench's ~56µs: still ~1.6×, not parity. The remaining gap is per-call Python dispatch, the scattered encode CZs (which use `cz_many` in Rust too), and the measurement loop (real work both languages pay).

Changes

New `cz_block` fused CZ (Rust + Python)

- `GeneralizedTableau::cz_block(control_base, target_base, count)` in `ppvm-tableau`: applies CZ to constant-offset pairs `(control_base+i, target_base+i)` and splits the run at `u64`-word boundaries internally, dispatching to the existing `cz_block_pairs` / `cz_block_pairs_cross_word` kernels. Callers use plain qubit indices, with no word/bit arithmetic.
- Exposed through the PyO3 interface and the Python `GeneralizedTableau` wrapper (+ `_core.pyi` stub).
- Moves contiguous CZ runs from `cz_many` (O(n) per pair) to word-parallel block ops (O(n/64) per block). The scattered encode CZs stay on `cz_many` (no contiguous structure to fuse, same as the Rust bench).

Faster Python gate dispatch

- `mixins`: pass gate targets straight to the native layer (PyO3 extracts `Vec<usize>` directly from lists/tuples/ranges/ndarrays) instead of rebuilding a list with a per-element `int()` on every call. Concrete-type fast paths in `_is_sequence` avoid the slow ABC `isinstance(obj, Iterable)`.
- measurement: convert raw outcome codes via a shared `_BY_VALUE` tuple lookup rather than the per-element `MeasurementResult()` IntEnum constructor (deduplicated with `GeneralizedTableauSum`).

Benchmarks

- `test/benchmarks/test_msd.py`: the 85-qubit MSD circuit with splatted gates, build-only and build+measure arms (smoke-tests by default; `--benchmark-enable` to time).
- `tableau-msd-fused.rs` now calls `cz_block` with qubit indices instead of the hand-split word/bit calls. Runtime is unchanged (verified with a criterion A/B baseline: within noise).

Verification

- `ppvm-tableau` lib tests pass, including a new `cz_block` cross-word + reversed-bases equivalence test.
- […] `np.int64` target forms).
- `cz_block` proven equivalent to per-pair `cz` on coefficients.

🤖 Generated with Claude Code