Optimize statevector simulator preprocessing by Kenny-Heitritter · Pull Request #316 · qBraid/pyqasm

Kenny-Heitritter · 2026-05-26T14:36:48Z

Summary

This PR targets main from the current sv branch work plus one additional commit that ports the honest, non-cache AlphaEvolve Run 06 statevector improvements.

It deliberately does not include the evolved _PROGRAM_INSTRUCTION_CACHE / repeated-QASM memoization path, since that overfit the AlphaEvolve repeated-timing harness. The simulator still preprocesses each input on every Simulator.run(...) call.

Included from `sv`

Statevector simulator package surface
Cython sv_sim kernels
simulator benchmarks and graph assets
statevector tests/perf tests

Additional evolved changes

Optional numba-jitted helpers for small rotation/diagonal utilities
Module-scope matrix/phase constants
Hybrid pending-gate representation that keeps diagonal gates as phase pairs
Direct diagonal-kernel flush for diagonal fused gates
Preallocated packed instruction arrays instead of Python list growth
Conservative instruction capacity bound for swap expansion + pending flushes
SWAP decomposition through the optimized controlled-X path
Preserve diagonal pending gates across CZ/CRZ when they commute
simulation = ["numba>=0.59"] optional extra; fallback shim keeps numba optional

Benchmark artifacts updated

The existing default benchmark plots were regenerated from this PR branch with the improved simulator:

benchmarks/bench_random.png
benchmarks/bench_qft.png

The benchmarks folder also includes explicit no-cache before/after comparison artifacts:

benchmarks/bench_evolved_no_cache.json
benchmarks/bench_random_evolved_no_cache.png
benchmarks/bench_qft_evolved_no_cache.png

The comparison artifacts use pre-unrolled QasmModule inputs and intentionally do not include the repeated-QASM cache path.

Validation

Performed locally against the installed sv branch kernels from the AlphaEvolve workspace:

random circuit correctness vs current PyQASM statevector simulator
QFT circuit correctness vs current PyQASM statevector simulator
explicit CRZ correctness check
raw QASM string path and QasmModule path
shots > 0 smoke test
50 all-active random fuzz circuits at atol=1e-10

Regenerated default benchmark plots with:

NUMBA_DISABLE_CACHE=1 PYQASM_NUM_THREADS=1 .venv-bench/bin/python benchmarks/plot_benchmarks.py

No-cache comparison artifacts (PYQASM_NUM_THREADS=1, median of 5):

random configs geomean: ~1.15x
QFT configs geomean: ~1.31x
all benchmarked circuits: correct at atol=1e-10

AlphaEvolve full overlay results from the experiment showed the graph-compatible non-cache path at ~1.23x random and ~1.36x QFT geomean; this PR's committed benchmark artifacts were regenerated from the PR branch implementation.

…g, OpenMP support Key changes: - Remove unroll() from Simulator.run() for QasmModule input (caller responsibility) - Single-pass _preprocess() with inline gate fusion (replaces separate _fuse_gates pass) - Native gate handling in _preprocess() without requiring full AST unrolling - AST expression evaluator for raw QASM parameter expressions (pi, tau, arithmetic) - Cache unroll() results in QasmModule (skip if already unrolled) - New Cython apply_circuit() entry point with 5 specialized kernels - Optional OpenMP parallelism via PYQASM_NUM_THREADS env var (default=1/serial) - OpenMP build detection in setup.py (macOS Homebrew libomp + Linux) - Benchmark suite comparing PyQASM vs Qiskit Aer, Cirq, PennyLane Lightning - Performance regression tests with pytest benchmark marker - Profiling script for detailed per-phase timing breakdown Performance (serial, median of 5): - Random circuits: PyQASM fastest at all sizes (4-22q), beating Qiskit/Cirq/Lightning - QFT circuits: PyQASM fastest at ≤12q; kernel-bound at ≥16q (memory bandwidth limited) - With PYQASM_NUM_THREADS=8: 4.1x speedup at 22q, 2.7x at 20q Known limitation: macOS libomp conflicts with Qiskit Aer's OpenMP in same process, so parallelism is opt-in. Future work: SIMD/AVX kernels for single-threaded perf. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Port the non-cache Run 06 AlphaEvolve statevector improvements into the sv branch implementation while deliberately excluding the QASM-string instruction memoization that overfit the AE repeated-timing harness. The simulator still preprocesses every input on each run call. The speedup comes from reducing Python overhead in that preprocessing path: - add optional numba-jitted helpers for small rotation/diagonal utilities - hoist matrix constants and phase constants to module scope - keep pending diagonal one-qubit gates as phase pairs during fusion - flush diagonal gates directly to the diagonal Cython kernel - preallocate packed instruction arrays instead of growing Python lists - decompose SWAP through the optimized controlled-X kernel path - preserve diagonal pending gates across CZ/CRZ because they commute - keep the conservative capacity bound needed for swap expansion and pending-gate flushes This is the honest non-cache improvement: no _PROGRAM_INSTRUCTION_CACHE, no cross-call memoization, and no reliance on repeated identical QASM strings. Add a simulation extra for installing numba; the code falls back to pure Python helpers when numba is absent.

coderabbitai · 2026-05-26T14:37:07Z

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ca2ac12f-f68c-447b-b7a7-38ea8360e94c

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

📝 Walkthrough

Walkthrough

This PR introduces a complete statevector simulator for PyQASM with Cython/OpenMP-accelerated gate kernels, a high-level NumPy-based driver, correctness tests against Qiskit Aer, performance regression tests, and comprehensive benchmarking/profiling tools for cross-simulator comparison.

Changes

Statevector Simulator

Layer / File(s)	Summary
Build configuration and extension setup `setup.py`, `pyproject.toml`, `.gitignore`	Configures Cython extension building with OpenMP detection; defines a `simulation` optional dependency; adds gitignore for generated C files.
Accelerated statevector kernels `src/pyqasm/accelerate/sv_sim.pyx`	Low-level Cython/OpenMP gate application routines for single-qubit, controlled, diagonal phase, and two-qubit gates with optional parallel execution via `prange` and batch circuit dispatcher.
Python simulator layer with preprocessing `src/pyqasm/simulator/statevector.py`, `src/pyqasm/modules/base.py`	Gate matrix definitions, AST expression evaluation, circuit preprocessing with single-qubit gate fusion and SWAP decomposition, `SimulatorResult` dataclass, and `Simulator` class. Adds unroll caching optimization.
Simulator package exports `src/pyqasm/simulator/__init__.py`	Public re-export of `Simulator` and `SimulatorResult` from statevector module.
Correctness validation tests `tests/test_sv_sim.py`	Parameterized tests comparing PyQASM statevector results against Qiskit Aer with global phase normalization and numerical tolerance handling.
Performance regression tests `tests/test_perf_regression.py`	Benchmarks measuring median runtime for random and QFT circuits at various qubit/depth configurations; asserts performance stays within budgets and scaling is reasonable.
Benchmarking and profiling tools `benchmarks/bench_simulator.py`, `benchmarks/plot_benchmarks.py`, `benchmarks/profile_simulator.py`	Cross-simulator comparison suite (PyQASM vs Qiskit Aer, Cirq, PennyLane Lightning) with endian-aware correctness checks and matplotlib visualization. Profiling tools support cProfile, wall-clock section timing, and per-opcode kernel timing via CLI.

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title "Optimize statevector simulator preprocessing" directly reflects the main objective stated in the PR description: porting preprocessing improvements into the statevector simulator implementation.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch kenny/evolved-sv-preprocess-no-cache

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@benchmarks/bench_simulator.py`:
- Around line 344-348: The equality checks in the checks dict compare raw
statevectors (sv_pyqasm vs sv_qiskit, sv_cirq_le, sv_pl_le) and will fail when
candidates differ only by a global phase; normalize each candidate to
sv_pyqasm's global phase before calling np.allclose. For each candidate vector
(sv_qiskit, sv_cirq_le, sv_pl_le) compute a phase factor against sv_pyqasm
(e.g., using the normalized inner product np.vdot(sv_pyqasm,
sv_candidate)/abs(...), or by matching a nonzero component) and multiply/divide
the candidate by the conjugate of that phase so its global phase matches
sv_pyqasm, then use np.allclose on the aligned vectors when constructing the
checks dict.

In `@setup.py`:
- Around line 13-14: BASE_COMPILE_ARGS and BASE_LINK_ARGS are currently
hardcoded for GCC/Clang and include host-specific ISA flags; update setup.py to
make flags toolchain-aware by removing "-march=native" (and likely
"-ffast-math") from unconditional defaults and set compile/link flags based on
detected compiler type: inspect the build compiler via the distutils/setuptools
compiler instance or compiler.compiler_type and, for 'msvc', use
MSVC-appropriate flags (e.g. '/O2' and '/openmp' when OpenMP is available) and
for 'unix' use '-O3' and '-fopenmp' only when detected; change _check_openmp()
to probe OpenMP support using the project compiler (invoke
compiler.compile/compiler.link methods or use a temporary Extension building
attempt) rather than hardcoding cmd = ["cc", ...], return correct compile and
link flags to apply to extra_compile_args and extra_link_args for each Extension
(refer to BASE_COMPILE_ARGS, BASE_LINK_ARGS, extra_compile_args, and
_check_openmp in the diff).

In `@src/pyqasm/accelerate/sv_sim.pyx`:
- Around line 351-369: In apply_circuit, avoid dereferencing &array[0] on
possibly-empty Cython buffers; first check the lengths (e.g., ensure sv,
gate_params, diag_phases, two_qubit_gates have >0 elements or that
n_instructions>0 where appropriate) and only set sv_ptr, gp_ptr, dp_ptr, tq_ptr
to &array[0] when the buffer is non-empty, otherwise set the pointer to NULL (or
skip pointer usage) to prevent invalid memory access; update uses of
sv_ptr/gp_ptr/dp_ptr/tq_ptr later in the function to handle NULL/empty cases
safely.
- Around line 22-23: sv_sim.pyx currently cimports omp_get_max_threads and
omp_set_num_threads and calls omp_set_num_threads at module import time which
forces an OpenMP dependency even when setup.py disables it; modify the file to
guard all OpenMP cimports and calls behind a compile-time macro (e.g. `#IFDEF`
HAVE_OPENMP) driven by the same check used in setup.py so the Cython module
builds without OpenMP: wrap the cimports omp_get_max_threads/omp_set_num_threads
and any omp_* calls in conditional compilation blocks and provide safe no-op
fallbacks when the macro is not defined; additionally harden apply_circuit by
adding precondition checks (or early returns) for empty/mismatched memoryviews
(the function apply_circuit and its caller _preprocess/statevector code paths)
so it never takes &sv[0], &gate_params[0], &diag_phases[0], or
&two_qubit_gates[0] when the buffers are empty.

In `@tests/test_sv_sim.py`:
- Around line 309-314: The current comparison in tests/test_sv_sim.py only
normalizes global phase when the string "global phase" appears in description,
which is brittle; change the logic in the block that references description,
sv_expected, sv_actual, idx, and phase so that the global-phase normalization is
applied unconditionally (or replace it with a phase-invariant comparator helper)
by removing the conditional check on description and always computing idx =
np.argmax(np.abs(sv_expected) > 1e-10), phase = sv_actual[idx] /
sv_expected[idx], and sv_actual = sv_actual / phase before comparing vectors.
- Around line 44-45: Change the pytest.mark.parametrize argnames from a
comma-separated string to an explicit tuple so it satisfies PT006; locate the
pytest.mark.parametrize call in tests/test_sv_sim.py that currently uses "qasm,
description" and replace the argnames with a tuple form like ("qasm",
"description") (or equivalent single-quoted tuple) while keeping the existing
parameter values and test name intact.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a06ded4b-859c-4557-9d06-5128c623731b

📥 Commits

Reviewing files that changed from the base of the PR and between abefaf3 and b7bbe46.

⛔ Files ignored due to path filters (2)

benchmarks/bench_qft.png is excluded by !**/*.png
benchmarks/bench_random.png is excluded by !**/*.png

📒 Files selected for processing (12)

.gitignore
benchmarks/bench_simulator.py
benchmarks/plot_benchmarks.py
benchmarks/profile_simulator.py
pyproject.toml
setup.py
src/pyqasm/accelerate/sv_sim.pyx
src/pyqasm/modules/base.py
src/pyqasm/simulator/__init__.py
src/pyqasm/simulator/statevector.py
tests/test_perf_regression.py
tests/test_sv_sim.py

coderabbitai · 2026-05-26T14:48:02Z

+        checks = {
+            "qiskit": np.allclose(sv_pyqasm, sv_qiskit, atol=1e-10),
+            "cirq": np.allclose(sv_pyqasm, sv_cirq_le, atol=1e-10),
+            "lightning": np.allclose(sv_pyqasm, sv_pl_le, atol=1e-10),
+        }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Normalize global phase before statevector equality checks.

Line 345–Line 347 compare raw vectors directly; equivalent states with different global phase will be reported as FAIL. Align each candidate vector to the PyQASM reference phase before np.allclose.

🔧 Proposed fix

+def _align_global_phase(reference, candidate): + """Phase-align candidate statevector to reference.""" + idx = int(np.argmax(np.abs(reference))) + if np.abs(reference[idx]) == 0 or np.abs(candidate[idx]) == 0: + return candidate + phase = np.angle(candidate[idx]) - np.angle(reference[idx]) + return candidate * np.exp(-1j * phase) + @@ - checks = { - "qiskit": np.allclose(sv_pyqasm, sv_qiskit, atol=1e-10), - "cirq": np.allclose(sv_pyqasm, sv_cirq_le, atol=1e-10), - "lightning": np.allclose(sv_pyqasm, sv_pl_le, atol=1e-10), - } + sv_qiskit_aligned = _align_global_phase(sv_pyqasm, sv_qiskit) + sv_cirq_aligned = _align_global_phase(sv_pyqasm, sv_cirq_le) + sv_pl_aligned = _align_global_phase(sv_pyqasm, sv_pl_le) + checks = { + "qiskit": np.allclose(sv_pyqasm, sv_qiskit_aligned, atol=1e-10), + "cirq": np.allclose(sv_pyqasm, sv_cirq_aligned, atol=1e-10), + "lightning": np.allclose(sv_pyqasm, sv_pl_aligned, atol=1e-10), + }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@benchmarks/bench_simulator.py` around lines 344 - 348, The equality checks in the checks dict compare raw statevectors (sv_pyqasm vs sv_qiskit, sv_cirq_le, sv_pl_le) and will fail when candidates differ only by a global phase; normalize each candidate to sv_pyqasm's global phase before calling np.allclose. For each candidate vector (sv_qiskit, sv_cirq_le, sv_pl_le) compute a phase factor against sv_pyqasm (e.g., using the normalized inner product np.vdot(sv_pyqasm, sv_candidate)/abs(...), or by matching a nonzero component) and multiply/divide the candidate by the conjugate of that phase so its global phase matches sv_pyqasm, then use np.allclose on the aligned vectors when constructing the checks dict.

coderabbitai · 2026-05-26T14:48:03Z

+BASE_COMPILE_ARGS = ["-O3", "-ffast-math", "-march=native"]
+BASE_LINK_ARGS = []


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Verify declared OS support and compiler assumptions in build script. rg -n 'Operating System :: Microsoft :: Windows|Operating System :: MacOS|Operating System :: POSIX' pyproject.toml rg -n 'BASE_COMPILE_ARGS|cc"|fopenmp|march=native|ffast-math' setup.py

Repository: qBraid/pyqasm

Length of output: 732

🏁 Script executed:

#!/bin/bash set -euo pipefail # Show relevant parts of setup.py around BASE_COMPILE_ARGS and the compile invocation sed -n '1,140p' setup.py | nl -ba | sed -n '1,140p' # Also inspect further if any later use of march/native or cc exists rg -n 'march=native|ffast-math|\bcc\b|fopenmp|extra_compile_args|extra_link_args|setup\(' setup.py

Repository: qBraid/pyqasm

Length of output: 100

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect surrounding code in setup.py where BASE_COMPILE_ARGS and the compiler invocation are used sed -n '1,160p' setup.py | nl -ba # Locate all occurrences related to the flagged portability concerns rg -n 'march=native|ffast-math|\bcc\b|fopenmp|extra_compile_args|extra_link_args|setup\(' setup.py

Repository: qBraid/pyqasm

Length of output: 100

🏁 Script executed:

#!/bin/bash set -euo pipefail # Show relevant parts with line numbers (avoid nl) sed -n '1,200p' setup.py | awk '{printf "%4d:%s\n", NR, $0}' echo "---- occurrences ----" rg -n 'march=native|ffast-math|\bcc\b|fopenmp|extra_compile_args|extra_link_args|setup\(' setup.py

Repository: qBraid/pyqasm

Length of output: 3836

🏁 Script executed:

#!/bin/bash set -euo pipefail # Print relevant sections of pyproject.toml (classifiers + build-system) rg -n '^\[project\]|\[build-system\]|Operating System ::|name\s*=|requires-python' pyproject.toml echo "---- pyproject.toml excerpt ----" sed -n '1,220p' pyproject.toml

Repository: qBraid/pyqasm

Length of output: 3420

Make extension compile flags toolchain-aware and remove host-specific ISA defaults.

setup.py unconditionally uses GCC/Clang-specific flags: BASE_COMPILE_ARGS = ["-O3", "-ffast-math", "-march=native"], and applies them to both Cython extensions via extra_compile_args (may fail on MSVC and makes wheels host-ISA dependent).

setup.py OpenMP detection hardcodes cmd = ["cc"] ... -fopenmp in _check_openmp(); it’s not MSVC-aware, so Windows toolchains won’t reliably detect/link OpenMP.

Proposed direction

-BASE_COMPILE_ARGS = ["-O3", "-ffast-math", "-march=native"] +BASE_COMPILE_ARGS = ["-O3"] +# Consider opt-in env toggle for local native tuning: +# if os.environ.get("PYQASM_NATIVE_OPT", "0") == "1": add "-march=native" (non-wheel/local builds only) - cmd = ["cc"] + cflags + ldflags + [f.name, "-o", f.name + ".out"] + # Resolve compiler via sysconfig for active toolchain instead of hardcoded 'cc' + # and branch flags by compiler family (MSVC vs GCC/Clang).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@setup.py` around lines 13 - 14, BASE_COMPILE_ARGS and BASE_LINK_ARGS are currently hardcoded for GCC/Clang and include host-specific ISA flags; update setup.py to make flags toolchain-aware by removing "-march=native" (and likely "-ffast-math") from unconditional defaults and set compile/link flags based on detected compiler type: inspect the build compiler via the distutils/setuptools compiler instance or compiler.compiler_type and, for 'msvc', use MSVC-appropriate flags (e.g. '/O2' and '/openmp' when OpenMP is available) and for 'unix' use '-O3' and '-fopenmp' only when detected; change _check_openmp() to probe OpenMP support using the project compiler (invoke compiler.compile/compiler.link methods or use a temporary Extension building attempt) rather than hardcoding cmd = ["cc", ...], return correct compile and link flags to apply to extra_compile_args and extra_link_args for each Extension (refer to BASE_COMPILE_ARGS, BASE_LINK_ARGS, extra_compile_args, and _check_openmp in the diff).

coderabbitai · 2026-05-26T14:48:03Z

+from openmp cimport omp_get_max_threads
+


⚠️ Potential issue | 🔴 Critical | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Verify mismatch between setup fallback and unconditional OpenMP usage. rg -n 'openmp cimport|omp_get_max_threads|omp_set_num_threads' src/pyqasm/accelerate/sv_sim.pyx rg -n '_check_openmp|return \[\], \[\]|extra_compile_args=sv_sim_compile_args|extra_link_args=sv_sim_link_args' setup.py

Repository: qBraid/pyqasm

Length of output: 491

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "=== sv_sim.pyx: OpenMP-related code (top + functions around _check_openmp) ===" python3 - <<'PY' from pathlib import Path p = Path("src/pyqasm/accelerate/sv_sim.pyx") lines = p.read_text().splitlines() for i in range(1, 130): # 1-based line numbers if i <= len(lines): print(f"{i:4d}:{lines[i-1]}") PY echo "=== sv_sim.pyx: locate apply_circuit and show around pointer usage ===" rg -n "def apply_circuit|apply_circuit\(" -S src/pyqasm/accelerate/sv_sim.pyx python3 - <<'PY' from pathlib import Path p = Path("src/pyqasm/accelerate/sv_sim.pyx") text = p.read_text().splitlines() # Find first occurrence near suspected lines 330-430 start, end = 330, 430 for i in range(start, end+1): if 1 <= i <= len(text): print(f"{i:4d}:{text[i-1]}") PY echo "=== sv_sim.pyx: search for boundscheck/wraparound directives ===" rg -n "boundscheck|wraparound|initializedcheck|nonecheck|cdivision|language_level|cython:" src/pyqasm/accelerate/sv_sim.pyx echo "=== setup.py: OpenMP probe and sv_sim compile/link args ===" rg -n "_check_openmp|sv_sim_compile_args|sv_sim_link_args|extra_compile_args=sv_sim_compile_args|extra_link_args=sv_sim_link_args|omp_cflags|omp_ldflags" setup.py python3 - <<'PY' from pathlib import Path p = Path("setup.py") lines = p.read_text().splitlines() # Print a focused region around the openmp probe and sv_sim Extension config if present # Use a heuristic: print lines 1..220 and also 220..420 if file is longer. for a,b in [(1,220),(221,420),(421,700)]: if a-1 < len(lines): b = min(b, len(lines)) print(f"\n--- setup.py lines {a}-{b} ---") for i in range(a,b+1): print(f"{i:4d}:{lines[i-1]}") PY

Repository: qBraid/pyqasm

Length of output: 12472

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "=== Find call sites for sv_sim.apply_circuit ===" rg -n "apply_circuit\s*\(" -S src | head -n 200 echo "=== Show likely wrapper/dispatcher that prepares arguments ===" rg -n "n_instructions|opcodes|targets|controls|two_qubit_offsets|two_qubit_gates|gate_params|diag_phases" -S src/pyqasm | head -n 200

Repository: qBraid/pyqasm

Length of output: 8543

🏁 Script executed:

#!/bin/bash set -euo pipefail python3 - <<'PY' from pathlib import Path p = Path("src/pyqasm/simulator/statevector.py") lines = p.read_text().splitlines() # Print around the apply_circuit call (based on rg line 579 from earlier output) for i in range(520, 640): # 1-based if 1 <= i <= len(lines): print(f"{i:4d}:{lines[i-1]}") PY echo "=== Also show where n_instructions is computed / passed (near tq_offsets variable) ===" rg -n "n\s*=|n_instructions|tq_offsets|tq_gates|n, opcodes|opcodes," -n src/pyqasm/simulator/statevector.py | head -n 80

Repository: qBraid/pyqasm

Length of output: 3453

🏁 Script executed:

#!/bin/bash set -euo pipefail python3 - <<'PY' from pathlib import Path p = Path("src/pyqasm/simulator/statevector.py") lines = p.read_text().splitlines() # Show around preprocess return logic for i in range(440, 540): # 1-based if 1 <= i <= len(lines): print(f"{i:4d}:{lines[i-1]}") PY echo "=== Also search for 'if n == 0' in _preprocess area ===" rg -n "def _preprocess|if n == 0|instruction_idx\s*=" src/pyqasm/simulator/statevector.py

Repository: qBraid/pyqasm

Length of output: 4498

🏁 Script executed:

#!/bin/bash set -euo pipefail python3 - <<'PY' from pathlib import Path p = Path("src/pyqasm/simulator/statevector.py") lines = p.read_text().splitlines() # Print the preprocess allocation + write logic region for i in range(250, 560): # 1-based if 1 <= i <= len(lines): print(f"{i:4d}:{lines[i-1]}") PY

Repository: qBraid/pyqasm

Length of output: 13725

Fix OpenMP build contract: sv_sim.pyx unconditionally depends on OpenMP even when setup.py disables it

from openmp cimport omp_get_max_threads

Applies to src/pyqasm/accelerate/sv_sim.pyx (also 32-38): omp_set_num_threads(...) is also imported and called at module import time. Meanwhile, setup.py conditionally sets sv_sim_compile_args / sv_sim_link_args from _check_openmp(), and falls back to ([], []) when OpenMP isn’t available—so the extension can fail to compile/link on toolchains lacking OpenMP headers/libs. Fix by guarding the openmp cimports/usage behind a compile-time macro driven by _check_openmp().

apply_circuit initializes pointers via &sv[0], &gate_params[0], &diag_phases[0], &two_qubit_gates[0] with bounds checks disabled, but the current call path only invokes apply_circuit(...) when n > 0 (statevector.py), and _preprocess returns None buffers when n == 0. It also guarantees two_qubit_gates_trimmed is length ≥ 1 when there are no two-qubit ops. So the specific empty-memoryview UB doesn’t occur through this dispatcher, but apply_circuit remains unsafe if called with empty/mismatched buffers.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/pyqasm/accelerate/sv_sim.pyx` around lines 22 - 23, sv_sim.pyx currently cimports omp_get_max_threads and omp_set_num_threads and calls omp_set_num_threads at module import time which forces an OpenMP dependency even when setup.py disables it; modify the file to guard all OpenMP cimports and calls behind a compile-time macro (e.g. `#IFDEF` HAVE_OPENMP) driven by the same check used in setup.py so the Cython module builds without OpenMP: wrap the cimports omp_get_max_threads/omp_set_num_threads and any omp_* calls in conditional compilation blocks and provide safe no-op fallbacks when the macro is not defined; additionally harden apply_circuit by adding precondition checks (or early returns) for empty/mismatched memoryviews (the function apply_circuit and its caller _preprocess/statevector code paths) so it never takes &sv[0], &gate_params[0], &diag_phases[0], or &two_qubit_gates[0] when the buffers are empty.

coderabbitai · 2026-05-26T14:48:03Z

+cpdef void apply_circuit(
+    double complex[::1] sv,
+    Py_ssize_t num_qubits,
+    int[::1] opcodes,
+    int[::1] targets,
+    int[::1] controls,
+    double complex[::1] gate_params,
+    double complex[::1] diag_phases,
+    int[::1] two_qubit_offsets,
+    double complex[::1] two_qubit_gates,
+    Py_ssize_t n_instructions,
+) noexcept:
+    """Execute an entire circuit. Each gate releases the GIL internally."""
+    cdef Py_ssize_t i, gp_offset, dp_offset, tq_offset
+    cdef int op, tgt, ctrl
+    cdef double complex* sv_ptr = &sv[0]
+    cdef double complex* gp_ptr = &gate_params[0]
+    cdef double complex* dp_ptr = &diag_phases[0]
+    cdef double complex* tq_ptr = &two_qubit_gates[0]


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Guard array sizes before taking raw pointers in apply_circuit.

Lines 366–369 dereference &... [0] before validating lengths; with empty buffers (including n_instructions == 0) this is unsafe and can crash due to disabled bounds checks.

Suggested fix

cpdef void apply_circuit( @@ ) noexcept: @@ - cdef Py_ssize_t i, gp_offset, dp_offset, tq_offset + cdef Py_ssize_t i, gp_offset, dp_offset, tq_offset cdef int op, tgt, ctrl - cdef double complex* sv_ptr = &sv[0] - cdef double complex* gp_ptr = &gate_params[0] - cdef double complex* dp_ptr = &diag_phases[0] - cdef double complex* tq_ptr = &two_qubit_gates[0] + cdef double complex* sv_ptr + cdef double complex* gp_ptr + cdef double complex* dp_ptr + cdef double complex* tq_ptr + + if n_instructions <= 0: + return + if opcodes.shape[0] < n_instructions or targets.shape[0] < n_instructions or controls.shape[0] < n_instructions: + raise ValueError("instruction arrays shorter than n_instructions") + if gate_params.shape[0] < n_instructions * 4 or diag_phases.shape[0] < n_instructions * 2: + raise ValueError("parameter arrays too short for n_instructions") + if two_qubit_offsets.shape[0] < n_instructions: + raise ValueError("two_qubit_offsets shorter than n_instructions") + + sv_ptr = &sv[0] + gp_ptr = &gate_params[0] + dp_ptr = &diag_phases[0] + tq_ptr = &two_qubit_gates[0]

Also applies to: 371-405

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/pyqasm/accelerate/sv_sim.pyx` around lines 351 - 369, In apply_circuit, avoid dereferencing &array[0] on possibly-empty Cython buffers; first check the lengths (e.g., ensure sv, gate_params, diag_phases, two_qubit_gates have >0 elements or that n_instructions>0 where appropriate) and only set sv_ptr, gp_ptr, dp_ptr, tq_ptr to &array[0] when the buffer is non-empty, otherwise set the pointer to NULL (or skip pointer usage) to prevent invalid memory access; update uses of sv_ptr/gp_ptr/dp_ptr/tq_ptr later in the function to handle NULL/empty cases safely.

coderabbitai · 2026-05-26T14:48:03Z

+        # Cache: skip if already unrolled (no kwargs means default unroll)
+        if not kwargs and len(self._unrolled_ast.statements) > 0:
+            return


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Invalidate cached unroll when source statements changed.

This early return can skip required re-unrolls and reuse stale _unrolled_ast after mutating operations that update _statements/original_program (for example, methods that append/remove statements without resetting _unrolled_ast). That can produce incorrect downstream behavior.

Suggested fix

- # Cache: skip if already unrolled (no kwargs means default unroll) - if not kwargs and len(self._unrolled_ast.statements) > 0: + # Cache: skip only if already unrolled and source statements did not change. + if ( + not kwargs + and len(self._unrolled_ast.statements) > 0 + and self._unrolled_ast.statements is self._statements + ): return

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# Cache: skip if already unrolled (no kwargs means default unroll)

if not kwargs and len(self._unrolled_ast.statements) > 0:

return

# Cache: skip only if already unrolled and source statements did not change.

if (

not kwargs

and len(self._unrolled_ast.statements) > 0

and self._unrolled_ast.statements is self._statements

):

return

coderabbitai · 2026-05-26T14:48:03Z

+    "qasm, description",
+    [


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Verify current usage and confirm PT006-relevant callsite rg -n -C2 '`@pytest`\.mark\.parametrize\(' tests/test_sv_sim.py

Repository: qBraid/pyqasm

Length of output: 131

🏁 Script executed:

#!/bin/bash set -euo pipefail # Find ruff/flake8 config enabling PT006 (or general pytest style rules) rg -n --hidden --no-ignore-vcs "PT006" -S . || true rg -n --hidden --no-ignore-vcs "(pytest.*PT|PT006|ruff.*pytest|flake8.*pytest)" -S pyproject.toml ruff.toml setup.cfg tox.ini .flake8 . || true # Confirm whether there are other parametrize callsites in this file rg -n "pytest\.mark\.parametrize" tests/test_sv_sim.py

Repository: qBraid/pyqasm

Length of output: 340

🏁 Script executed:

#!/bin/bash set -euo pipefail # Inspect ruff config around flake8-pytest-style and any PT006 selection/ignores sed -n '1,120p' ruff.toml echo "----" rg -n "PT006|flake8-pytest-style|pytest-style" ruff.toml echo "----" # Show any pytest-style lint tables that might define enable/disable behavior rg -n "flake8-pytest-style" -n ruff.toml && true

Repository: qBraid/pyqasm

Length of output: 2646

Switch pytest.mark.parametrize argnames to tuple form (PT006).

Ruff enables flake8-pytest-style via select = ["PT", ...] and does not ignore PT006, while tests/test_sv_sim.py currently uses a comma-separated string argnames ("qasm, description"), which trips the rule. Change to:

Suggested change

-@pytest.mark.parametrize( - "qasm, description", +@pytest.mark.parametrize( + ("qasm", "description"),

🧰 Tools

🪛 Ruff (0.15.14)

[warning] 44-44: Wrong type passed to first argument of pytest.mark.parametrize; expected tuple

Use a tuple for the first argument

(PT006)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_sv_sim.py` around lines 44 - 45, Change the pytest.mark.parametrize argnames from a comma-separated string to an explicit tuple so it satisfies PT006; locate the pytest.mark.parametrize call in tests/test_sv_sim.py that currently uses "qasm, description" and replace the argnames with a tuple form like ("qasm", "description") (or equivalent single-quoted tuple) while keeping the existing parameter values and test name intact.

coderabbitai · 2026-05-26T14:48:03Z

+    if "global phase" in description:
+        # Compare up to global phase: find first nonzero element and normalize
+        idx = np.argmax(np.abs(sv_expected) > 1e-10)
+        phase = sv_actual[idx] / sv_expected[idx]
+        sv_actual = sv_actual / phase
+


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Global-phase normalization should not depend on description text.

Conditioning on "global phase" in description is brittle and can miss valid equivalent states in other cases. Normalize phase unconditionally (or use a phase-invariant comparator helper).

Suggested change

- if "global phase" in description: - # Compare up to global phase: find first nonzero element and normalize - idx = np.argmax(np.abs(sv_expected) > 1e-10) - phase = sv_actual[idx] / sv_expected[idx] - sv_actual = sv_actual / phase + # Compare up to global phase for all cases + idx = np.argmax(np.abs(sv_expected) > 1e-10) + phase = sv_actual[idx] / sv_expected[idx] + sv_actual = sv_actual / phase

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_sv_sim.py` around lines 309 - 314, The current comparison in tests/test_sv_sim.py only normalizes global phase when the string "global phase" appears in description, which is brittle; change the logic in the block that references description, sv_expected, sv_actual, idx, and phase so that the global-phase normalization is applied unconditionally (or replace it with a phase-invariant comparator helper) by removing the conditional check on description and always computing idx = np.argmax(np.abs(sv_expected) > 1e-10), phase = sv_actual[idx] / sv_expected[idx], and sv_actual = sv_actual / phase before comparing vectors.

Two independent CI failures are addressed: 1. Source-distribution job (6 test failures): remove the unroll() caching shortcut in QasmModule that early-returned when the module looked already unrolled. It broke re-unrolling (e.g. unroll(external_gates=...) followed by unroll() to flush them), failing the depth/rebase/include tests. This also contradicted the PR's stated "preprocess on every run" goal. 2. Wheel-build jobs (qiskit unbuildable in manylinux containers): qiskit / qiskit-aer have no prebuilt wheels for the cibuildwheel containers and fall back to a from-source build needing Rust. Move those reference-oracle deps into a dedicated `test-sim` extra so the wheel test env (test,cli,pulse) no longer pulls them, and guard the qiskit imports in test_sv_sim.py with pytest.importorskip so the module skips cleanly where qiskit is absent. The sdist and coverage jobs install `test-sim` to keep running the comparison on the standard runners where qiskit installs fine. Also exclude the hardware-dependent benchmark perf-regression tests from the CI test runs (-m "not benchmark"), per tests/test_perf_regression.py's own docs.

The build appended -O3 -ffast-math -march=native unconditionally to every Extension, regardless of compiler or architecture. This is wrong for distributed wheels: * -march=native ties the wheel to the build machine's CPU and can SIGILL on older hardware; it is also not the right spelling on MSVC or arm64. * MSVC silently ignores the GCC/Clang flags, so Windows wheels got none of the intended optimization. * -ffast-math relaxes IEEE semantics, which the statevector simulator relies on. Replace the static flag list with a build_ext subclass that selects flags per compiler: -O3 for GCC/Clang, /O2 for MSVC, and OpenMP (-fopenmp / /openmp) applied only to the sv_sim kernel. No -march (baseline ISA, matching SciPy and scikit-learn) and no -ffast-math.

The macOS wheel jobs failed in delocate: the sv_sim OpenMP build links Homebrew's libomp, whose dylib has a minimum-macOS target newer than the wheel's macosx_11_0 tag, so delocate refuses to bundle it. (Pre-existing in this PR's OpenMP-on-macOS approach.) Disable OpenMP on macOS by default so no libomp is linked or bundled and the wheels stay portable (sv_sim builds single-threaded on macOS only; Linux and Windows keep OpenMP). A local source build can opt back in with PYQASM_MACOS_OPENMP=1. Drop the now-unnecessary `brew install libomp` step from the wheel workflows.

Disabling the -fopenmp flag on macOS was not enough: sv_sim.pyx unconditionally cimports prange/openmp, so the generated C still pulled in an unguarded <omp.h> and failed to compile on macOS (no system libomp). Gate the OpenMP imports, thread setup, and every prange branch behind a Cython compile-time IF USE_OPENMP. The serial `with nogil` paths (which already existed alongside each prange) become the sole code path when OpenMP is off, and no <omp.h> is emitted. setup.py decides USE_OPENMP once (Linux/Windows on, macOS off unless PYQASM_MACOS_OPENMP=1) and passes it to cythonize via compile_time_env, matching the compiler flags applied in build_ext. Verified: Linux/Windows OpenMP build and the serial macOS build are bit-identical, and the threaded prange path matches the serial path.

codecov-commenter · 2026-06-07T03:28:06Z

Codecov Report

❌ Patch coverage is 95.39474% with 14 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/pyqasm/simulator/statevector.py	95.37%	14 Missing ⚠️

📢 Thoughts on this report? Let us know!

The simulator's crz fast path used a controlled-diagonal instruction that can only phase the |control=1, target=1> amplitude, so it implemented a controlled-phase gate diag(1,1,1,e^{i*theta/2}) instead of a true controlled-Rz diag(1,1,e^{-i*theta/2},e^{i*theta/2}) -- it dropped the phase on |control=1, target=0>. Route crz through the general controlled-gate kernel with the full Rz matrix instead. Add tests/test_sv_sim_core.py: a qiskit-free test module (the existing simulator tests importorskip qiskit and are skipped in wheel-build containers, and never exercised crz). It regression-tests crz via a self-contained differential against pyqasm's own decomposition, plus known-action gate checks, fusion paths, error handling, and helper unit tests. Raises statevector.py coverage from 78% to 95%; the remainder is dead/defensive code (notably the never-called generic two-qubit path). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ryanhill1 · 2026-06-08T21:42:45Z

Follow-up: dead generic two-qubit-gate scaffolding

While adding coverage/tests for the statevector simulator I noticed the generic two-qubit path is fully built but never reached, so flagging it here so we don't forget to either wire it up or remove it:

src/pyqasm/simulator/statevector.py
- _write_two_qubit (defined at statevector.py:349) has no callers — _preprocess only emits _OP_SINGLE/_OP_CONTROLLED/_OP_DIAGONAL/_OP_CTRL_DIAGONAL (swap is decomposed into 3 CX, cz/crz/cx/cy go through controlled paths).
- Because of that, tq_offset is always 0, so the if tq_offset > 0: branch (statevector.py:500) and the _OP_TWO_QUBIT opcode (:224) are dead.
src/pyqasm/accelerate/sv_sim.pyx
- apply_two_qubit_gate / _apply_two_qubit_gate (sv_sim.pyx:340) and the OP_TWO_QUBIT dispatch arm are implemented but unused.

These lines are the bulk of the remaining uncovered code in statevector.py (now at 95%). Decision needed: either route a real generic 2-qubit gate through it (and add a test), or remove the Python+Cython scaffolding to keep the module lean.

Context: this is separate from the crz fast-path fix in 762dd67, which was implementing a controlled-phase gate (diag(1,1,1,e^{iθ/2})) instead of a true controlled-Rz (diag(1,1,e^{-iθ/2},e^{iθ/2})).

ryanhill1 and others added 12 commits January 16, 2025 22:21

dev

4f481f4

1 & 2 qubit gate handling

798d194

remove class attributes

2683290

Merge branch 'main' into sv

7bc59c2

add result class

ea3b1ff

check-in

13e76ae

check-in updates2

1135970

add test cases

a274785

get swap gate a little closer - still not right tho

5881c97

Merge branch 'main' into sv

064b922

Kenny-Heitritter requested a review from TheGupta2012 as a code owner May 26, 2026 14:36

Merge branch 'main' into kenny/evolved-sv-preprocess-no-cache

b84a23e

Kenny-Heitritter marked this pull request as draft May 26, 2026 14:44

Add evolved no-cache statevector benchmark artifacts

0cfd83a

coderabbitai Bot requested changes May 26, 2026

View reviewed changes

Kenny-Heitritter and others added 4 commits May 26, 2026 10:01

Regenerate statevector benchmark plots

5afa323

Rename bench_random_evolved_no_cache.png to bench_random_evolved.png

9af158f

Rename bench_qft_evolved_no_cache.png to bench_qft_evolved.png

e7879ce

Rename bench_evolved_no_cache.json to bench_evolved.json

f07ebcd

Kenny-Heitritter marked this pull request as ready for review May 26, 2026 15:05

Kenny-Heitritter and others added 5 commits May 26, 2026 10:22

Fix CI for statevector simulator PR

1d56076

		BASE_COMPILE_ARGS = ["-O3", "-ffast-math", "-march=native"]
		BASE_LINK_ARGS = []

Conversation

Kenny-Heitritter commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Included from sv

Additional evolved changes

Benchmark artifacts updated

Validation

Uh oh!

coderabbitai Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

codecov-commenter commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

ryanhill1 commented Jun 8, 2026

Follow-up: dead generic two-qubit-gate scaffolding

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Kenny-Heitritter commented May 26, 2026 •

edited

Loading

Included from `sv`

coderabbitai Bot commented May 26, 2026 •

edited

Loading

codecov-commenter commented Jun 7, 2026 •

edited

Loading