Optimize statevector simulator preprocessing#316
Conversation
…g, OpenMP support Key changes: - Remove unroll() from Simulator.run() for QasmModule input (caller responsibility) - Single-pass _preprocess() with inline gate fusion (replaces separate _fuse_gates pass) - Native gate handling in _preprocess() without requiring full AST unrolling - AST expression evaluator for raw QASM parameter expressions (pi, tau, arithmetic) - Cache unroll() results in QasmModule (skip if already unrolled) - New Cython apply_circuit() entry point with 5 specialized kernels - Optional OpenMP parallelism via PYQASM_NUM_THREADS env var (default=1/serial) - OpenMP build detection in setup.py (macOS Homebrew libomp + Linux) - Benchmark suite comparing PyQASM vs Qiskit Aer, Cirq, PennyLane Lightning - Performance regression tests with pytest benchmark marker - Profiling script for detailed per-phase timing breakdown Performance (serial, median of 5): - Random circuits: PyQASM fastest at all sizes (4-22q), beating Qiskit/Cirq/Lightning - QFT circuits: PyQASM fastest at ≤12q; kernel-bound at ≥16q (memory bandwidth limited) - With PYQASM_NUM_THREADS=8: 4.1x speedup at 22q, 2.7x at 20q Known limitation: macOS libomp conflicts with Qiskit Aer's OpenMP in same process, so parallelism is opt-in. Future work: SIMD/AVX kernels for single-threaded perf. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Port the non-cache Run 06 AlphaEvolve statevector improvements into the sv branch implementation while deliberately excluding the QASM-string instruction memoization that overfit the AE repeated-timing harness. The simulator still preprocesses every input on each run call. The speedup comes from reducing Python overhead in that preprocessing path: - add optional numba-jitted helpers for small rotation/diagonal utilities - hoist matrix constants and phase constants to module scope - keep pending diagonal one-qubit gates as phase pairs during fusion - flush diagonal gates directly to the diagonal Cython kernel - preallocate packed instruction arrays instead of growing Python lists - decompose SWAP through the optimized controlled-X kernel path - preserve diagonal pending gates across CZ/CRZ because they commute - keep the conservative capacity bound needed for swap expansion and pending-gate flushes This is the honest non-cache improvement: no _PROGRAM_INSTRUCTION_CACHE, no cross-call memoization, and no reliance on repeated identical QASM strings. Add a simulation extra for installing numba; the code falls back to pure Python helpers when numba is absent.
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThis PR introduces a complete statevector simulator for PyQASM with Cython/OpenMP-accelerated gate kernels, a high-level NumPy-based driver, correctness tests against Qiskit Aer, performance regression tests, and comprehensive benchmarking/profiling tools for cross-simulator comparison. ChangesStatevector Simulator
🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 7
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@benchmarks/bench_simulator.py`:
- Around line 344-348: The equality checks in the checks dict compare raw
statevectors (sv_pyqasm vs sv_qiskit, sv_cirq_le, sv_pl_le) and will fail when
candidates differ only by a global phase; normalize each candidate to
sv_pyqasm's global phase before calling np.allclose. For each candidate vector
(sv_qiskit, sv_cirq_le, sv_pl_le) compute a phase factor against sv_pyqasm
(e.g., using the normalized inner product np.vdot(sv_pyqasm,
sv_candidate)/abs(...), or by matching a nonzero component) and multiply/divide
the candidate by the conjugate of that phase so its global phase matches
sv_pyqasm, then use np.allclose on the aligned vectors when constructing the
checks dict.
In `@setup.py`:
- Around line 13-14: BASE_COMPILE_ARGS and BASE_LINK_ARGS are currently
hardcoded for GCC/Clang and include host-specific ISA flags; update setup.py to
make flags toolchain-aware by removing "-march=native" (and likely
"-ffast-math") from unconditional defaults and set compile/link flags based on
detected compiler type: inspect the build compiler via the distutils/setuptools
compiler instance or compiler.compiler_type and, for 'msvc', use
MSVC-appropriate flags (e.g. '/O2' and '/openmp' when OpenMP is available) and
for 'unix' use '-O3' and '-fopenmp' only when detected; change _check_openmp()
to probe OpenMP support using the project compiler (invoke
compiler.compile/compiler.link methods or use a temporary Extension building
attempt) rather than hardcoding cmd = ["cc", ...], return correct compile and
link flags to apply to extra_compile_args and extra_link_args for each Extension
(refer to BASE_COMPILE_ARGS, BASE_LINK_ARGS, extra_compile_args, and
_check_openmp in the diff).
In `@src/pyqasm/accelerate/sv_sim.pyx`:
- Around line 351-369: In apply_circuit, avoid dereferencing &array[0] on
possibly-empty Cython buffers; first check the lengths (e.g., ensure sv,
gate_params, diag_phases, two_qubit_gates have >0 elements or that
n_instructions>0 where appropriate) and only set sv_ptr, gp_ptr, dp_ptr, tq_ptr
to &array[0] when the buffer is non-empty, otherwise set the pointer to NULL (or
skip pointer usage) to prevent invalid memory access; update uses of
sv_ptr/gp_ptr/dp_ptr/tq_ptr later in the function to handle NULL/empty cases
safely.
- Around line 22-23: sv_sim.pyx currently cimports omp_get_max_threads and
omp_set_num_threads and calls omp_set_num_threads at module import time which
forces an OpenMP dependency even when setup.py disables it; modify the file to
guard all OpenMP cimports and calls behind a compile-time macro (e.g. `#IFDEF`
HAVE_OPENMP) driven by the same check used in setup.py so the Cython module
builds without OpenMP: wrap the cimports omp_get_max_threads/omp_set_num_threads
and any omp_* calls in conditional compilation blocks and provide safe no-op
fallbacks when the macro is not defined; additionally harden apply_circuit by
adding precondition checks (or early returns) for empty/mismatched memoryviews
(the function apply_circuit and its caller _preprocess/statevector code paths)
so it never takes &sv[0], &gate_params[0], &diag_phases[0], or
&two_qubit_gates[0] when the buffers are empty.
In `@tests/test_sv_sim.py`:
- Around line 309-314: The current comparison in tests/test_sv_sim.py only
normalizes global phase when the string "global phase" appears in description,
which is brittle; change the logic in the block that references description,
sv_expected, sv_actual, idx, and phase so that the global-phase normalization is
applied unconditionally (or replace it with a phase-invariant comparator helper)
by removing the conditional check on description and always computing idx =
np.argmax(np.abs(sv_expected) > 1e-10), phase = sv_actual[idx] /
sv_expected[idx], and sv_actual = sv_actual / phase before comparing vectors.
- Around line 44-45: Change the pytest.mark.parametrize argnames from a
comma-separated string to an explicit tuple so it satisfies PT006; locate the
pytest.mark.parametrize call in tests/test_sv_sim.py that currently uses "qasm,
description" and replace the argnames with a tuple form like ("qasm",
"description") (or equivalent single-quoted tuple) while keeping the existing
parameter values and test name intact.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: a06ded4b-859c-4557-9d06-5128c623731b
⛔ Files ignored due to path filters (2)
benchmarks/bench_qft.pngis excluded by!**/*.pngbenchmarks/bench_random.pngis excluded by!**/*.png
📒 Files selected for processing (12)
.gitignorebenchmarks/bench_simulator.pybenchmarks/plot_benchmarks.pybenchmarks/profile_simulator.pypyproject.tomlsetup.pysrc/pyqasm/accelerate/sv_sim.pyxsrc/pyqasm/modules/base.pysrc/pyqasm/simulator/__init__.pysrc/pyqasm/simulator/statevector.pytests/test_perf_regression.pytests/test_sv_sim.py
| checks = { | ||
| "qiskit": np.allclose(sv_pyqasm, sv_qiskit, atol=1e-10), | ||
| "cirq": np.allclose(sv_pyqasm, sv_cirq_le, atol=1e-10), | ||
| "lightning": np.allclose(sv_pyqasm, sv_pl_le, atol=1e-10), | ||
| } |
There was a problem hiding this comment.
Normalize global phase before statevector equality checks.
Line 345–Line 347 compare raw vectors directly; equivalent states with different global phase will be reported as FAIL. Align each candidate vector to the PyQASM reference phase before np.allclose.
🔧 Proposed fix
+def _align_global_phase(reference, candidate):
+ """Phase-align candidate statevector to reference."""
+ idx = int(np.argmax(np.abs(reference)))
+ if np.abs(reference[idx]) == 0 or np.abs(candidate[idx]) == 0:
+ return candidate
+ phase = np.angle(candidate[idx]) - np.angle(reference[idx])
+ return candidate * np.exp(-1j * phase)
+
@@
- checks = {
- "qiskit": np.allclose(sv_pyqasm, sv_qiskit, atol=1e-10),
- "cirq": np.allclose(sv_pyqasm, sv_cirq_le, atol=1e-10),
- "lightning": np.allclose(sv_pyqasm, sv_pl_le, atol=1e-10),
- }
+ sv_qiskit_aligned = _align_global_phase(sv_pyqasm, sv_qiskit)
+ sv_cirq_aligned = _align_global_phase(sv_pyqasm, sv_cirq_le)
+ sv_pl_aligned = _align_global_phase(sv_pyqasm, sv_pl_le)
+ checks = {
+ "qiskit": np.allclose(sv_pyqasm, sv_qiskit_aligned, atol=1e-10),
+ "cirq": np.allclose(sv_pyqasm, sv_cirq_aligned, atol=1e-10),
+ "lightning": np.allclose(sv_pyqasm, sv_pl_aligned, atol=1e-10),
+ }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@benchmarks/bench_simulator.py` around lines 344 - 348, The equality checks in
the checks dict compare raw statevectors (sv_pyqasm vs sv_qiskit, sv_cirq_le,
sv_pl_le) and will fail when candidates differ only by a global phase; normalize
each candidate to sv_pyqasm's global phase before calling np.allclose. For each
candidate vector (sv_qiskit, sv_cirq_le, sv_pl_le) compute a phase factor
against sv_pyqasm (e.g., using the normalized inner product np.vdot(sv_pyqasm,
sv_candidate)/abs(...), or by matching a nonzero component) and multiply/divide
the candidate by the conjugate of that phase so its global phase matches
sv_pyqasm, then use np.allclose on the aligned vectors when constructing the
checks dict.
| BASE_COMPILE_ARGS = ["-O3", "-ffast-math", "-march=native"] | ||
| BASE_LINK_ARGS = [] |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify declared OS support and compiler assumptions in build script.
rg -n 'Operating System :: Microsoft :: Windows|Operating System :: MacOS|Operating System :: POSIX' pyproject.toml
rg -n 'BASE_COMPILE_ARGS|cc"|fopenmp|march=native|ffast-math' setup.pyRepository: qBraid/pyqasm
Length of output: 732
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show relevant parts of setup.py around BASE_COMPILE_ARGS and the compile invocation
sed -n '1,140p' setup.py | nl -ba | sed -n '1,140p'
# Also inspect further if any later use of march/native or cc exists
rg -n 'march=native|ffast-math|\bcc\b|fopenmp|extra_compile_args|extra_link_args|setup\(' setup.pyRepository: qBraid/pyqasm
Length of output: 100
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect surrounding code in setup.py where BASE_COMPILE_ARGS and the compiler invocation are used
sed -n '1,160p' setup.py | nl -ba
# Locate all occurrences related to the flagged portability concerns
rg -n 'march=native|ffast-math|\bcc\b|fopenmp|extra_compile_args|extra_link_args|setup\(' setup.pyRepository: qBraid/pyqasm
Length of output: 100
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Show relevant parts with line numbers (avoid nl)
sed -n '1,200p' setup.py | awk '{printf "%4d:%s\n", NR, $0}'
echo "---- occurrences ----"
rg -n 'march=native|ffast-math|\bcc\b|fopenmp|extra_compile_args|extra_link_args|setup\(' setup.pyRepository: qBraid/pyqasm
Length of output: 3836
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Print relevant sections of pyproject.toml (classifiers + build-system)
rg -n '^\[project\]|\[build-system\]|Operating System ::|name\s*=|requires-python' pyproject.toml
echo "---- pyproject.toml excerpt ----"
sed -n '1,220p' pyproject.tomlRepository: qBraid/pyqasm
Length of output: 3420
Make extension compile flags toolchain-aware and remove host-specific ISA defaults.
setup.pyunconditionally uses GCC/Clang-specific flags:BASE_COMPILE_ARGS = ["-O3", "-ffast-math", "-march=native"], and applies them to both Cython extensions viaextra_compile_args(may fail on MSVC and makes wheels host-ISA dependent).setup.pyOpenMP detection hardcodescmd = ["cc"] ... -fopenmpin_check_openmp(); it’s not MSVC-aware, so Windows toolchains won’t reliably detect/link OpenMP.
Proposed direction
-BASE_COMPILE_ARGS = ["-O3", "-ffast-math", "-march=native"]
+BASE_COMPILE_ARGS = ["-O3"]
+# Consider opt-in env toggle for local native tuning:
+# if os.environ.get("PYQASM_NATIVE_OPT", "0") == "1": add "-march=native" (non-wheel/local builds only)
- cmd = ["cc"] + cflags + ldflags + [f.name, "-o", f.name + ".out"]
+ # Resolve compiler via sysconfig for active toolchain instead of hardcoded 'cc'
+ # and branch flags by compiler family (MSVC vs GCC/Clang).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@setup.py` around lines 13 - 14, BASE_COMPILE_ARGS and BASE_LINK_ARGS are
currently hardcoded for GCC/Clang and include host-specific ISA flags; update
setup.py to make flags toolchain-aware by removing "-march=native" (and likely
"-ffast-math") from unconditional defaults and set compile/link flags based on
detected compiler type: inspect the build compiler via the distutils/setuptools
compiler instance or compiler.compiler_type and, for 'msvc', use
MSVC-appropriate flags (e.g. '/O2' and '/openmp' when OpenMP is available) and
for 'unix' use '-O3' and '-fopenmp' only when detected; change _check_openmp()
to probe OpenMP support using the project compiler (invoke
compiler.compile/compiler.link methods or use a temporary Extension building
attempt) rather than hardcoding cmd = ["cc", ...], return correct compile and
link flags to apply to extra_compile_args and extra_link_args for each Extension
(refer to BASE_COMPILE_ARGS, BASE_LINK_ARGS, extra_compile_args, and
_check_openmp in the diff).
| from openmp cimport omp_get_max_threads | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify mismatch between setup fallback and unconditional OpenMP usage.
rg -n 'openmp cimport|omp_get_max_threads|omp_set_num_threads' src/pyqasm/accelerate/sv_sim.pyx
rg -n '_check_openmp|return \[\], \[\]|extra_compile_args=sv_sim_compile_args|extra_link_args=sv_sim_link_args' setup.pyRepository: qBraid/pyqasm
Length of output: 491
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== sv_sim.pyx: OpenMP-related code (top + functions around _check_openmp) ==="
python3 - <<'PY'
from pathlib import Path
p = Path("src/pyqasm/accelerate/sv_sim.pyx")
lines = p.read_text().splitlines()
for i in range(1, 130): # 1-based line numbers
if i <= len(lines):
print(f"{i:4d}:{lines[i-1]}")
PY
echo "=== sv_sim.pyx: locate apply_circuit and show around pointer usage ==="
rg -n "def apply_circuit|apply_circuit\(" -S src/pyqasm/accelerate/sv_sim.pyx
python3 - <<'PY'
from pathlib import Path
p = Path("src/pyqasm/accelerate/sv_sim.pyx")
text = p.read_text().splitlines()
# Find first occurrence near suspected lines 330-430
start, end = 330, 430
for i in range(start, end+1):
if 1 <= i <= len(text):
print(f"{i:4d}:{text[i-1]}")
PY
echo "=== sv_sim.pyx: search for boundscheck/wraparound directives ==="
rg -n "boundscheck|wraparound|initializedcheck|nonecheck|cdivision|language_level|cython:" src/pyqasm/accelerate/sv_sim.pyx
echo "=== setup.py: OpenMP probe and sv_sim compile/link args ==="
rg -n "_check_openmp|sv_sim_compile_args|sv_sim_link_args|extra_compile_args=sv_sim_compile_args|extra_link_args=sv_sim_link_args|omp_cflags|omp_ldflags" setup.py
python3 - <<'PY'
from pathlib import Path
p = Path("setup.py")
lines = p.read_text().splitlines()
# Print a focused region around the openmp probe and sv_sim Extension config if present
# Use a heuristic: print lines 1..220 and also 220..420 if file is longer.
for a,b in [(1,220),(221,420),(421,700)]:
if a-1 < len(lines):
b = min(b, len(lines))
print(f"\n--- setup.py lines {a}-{b} ---")
for i in range(a,b+1):
print(f"{i:4d}:{lines[i-1]}")
PYRepository: qBraid/pyqasm
Length of output: 12472
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Find call sites for sv_sim.apply_circuit ==="
rg -n "apply_circuit\s*\(" -S src | head -n 200
echo "=== Show likely wrapper/dispatcher that prepares arguments ==="
rg -n "n_instructions|opcodes|targets|controls|two_qubit_offsets|two_qubit_gates|gate_params|diag_phases" -S src/pyqasm | head -n 200Repository: qBraid/pyqasm
Length of output: 8543
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python3 - <<'PY'
from pathlib import Path
p = Path("src/pyqasm/simulator/statevector.py")
lines = p.read_text().splitlines()
# Print around the apply_circuit call (based on rg line 579 from earlier output)
for i in range(520, 640): # 1-based
if 1 <= i <= len(lines):
print(f"{i:4d}:{lines[i-1]}")
PY
echo "=== Also show where n_instructions is computed / passed (near tq_offsets variable) ==="
rg -n "n\s*=|n_instructions|tq_offsets|tq_gates|n, opcodes|opcodes," -n src/pyqasm/simulator/statevector.py | head -n 80Repository: qBraid/pyqasm
Length of output: 3453
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python3 - <<'PY'
from pathlib import Path
p = Path("src/pyqasm/simulator/statevector.py")
lines = p.read_text().splitlines()
# Show around preprocess return logic
for i in range(440, 540): # 1-based
if 1 <= i <= len(lines):
print(f"{i:4d}:{lines[i-1]}")
PY
echo "=== Also search for 'if n == 0' in _preprocess area ==="
rg -n "def _preprocess|if n == 0|instruction_idx\s*=" src/pyqasm/simulator/statevector.pyRepository: qBraid/pyqasm
Length of output: 4498
🏁 Script executed:
#!/bin/bash
set -euo pipefail
python3 - <<'PY'
from pathlib import Path
p = Path("src/pyqasm/simulator/statevector.py")
lines = p.read_text().splitlines()
# Print the preprocess allocation + write logic region
for i in range(250, 560): # 1-based
if 1 <= i <= len(lines):
print(f"{i:4d}:{lines[i-1]}")
PYRepository: qBraid/pyqasm
Length of output: 13725
Fix OpenMP build contract: sv_sim.pyx unconditionally depends on OpenMP even when setup.py disables it
from openmp cimport omp_get_max_threadsApplies to src/pyqasm/accelerate/sv_sim.pyx (also 32-38): omp_set_num_threads(...) is also imported and called at module import time. Meanwhile, setup.py conditionally sets sv_sim_compile_args / sv_sim_link_args from _check_openmp(), and falls back to ([], []) when OpenMP isn’t available—so the extension can fail to compile/link on toolchains lacking OpenMP headers/libs. Fix by guarding the openmp cimports/usage behind a compile-time macro driven by _check_openmp().
apply_circuitinitializes pointers via&sv[0],&gate_params[0],&diag_phases[0],&two_qubit_gates[0]with bounds checks disabled, but the current call path only invokesapply_circuit(...)whenn > 0(statevector.py), and_preprocessreturnsNonebuffers whenn == 0. It also guaranteestwo_qubit_gates_trimmedis length ≥ 1 when there are no two-qubit ops. So the specific empty-memoryview UB doesn’t occur through this dispatcher, butapply_circuitremains unsafe if called with empty/mismatched buffers.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/pyqasm/accelerate/sv_sim.pyx` around lines 22 - 23, sv_sim.pyx currently
cimports omp_get_max_threads and omp_set_num_threads and calls
omp_set_num_threads at module import time which forces an OpenMP dependency even
when setup.py disables it; modify the file to guard all OpenMP cimports and
calls behind a compile-time macro (e.g. `#IFDEF` HAVE_OPENMP) driven by the same
check used in setup.py so the Cython module builds without OpenMP: wrap the
cimports omp_get_max_threads/omp_set_num_threads and any omp_* calls in
conditional compilation blocks and provide safe no-op fallbacks when the macro
is not defined; additionally harden apply_circuit by adding precondition checks
(or early returns) for empty/mismatched memoryviews (the function apply_circuit
and its caller _preprocess/statevector code paths) so it never takes &sv[0],
&gate_params[0], &diag_phases[0], or &two_qubit_gates[0] when the buffers are
empty.
| cpdef void apply_circuit( | ||
| double complex[::1] sv, | ||
| Py_ssize_t num_qubits, | ||
| int[::1] opcodes, | ||
| int[::1] targets, | ||
| int[::1] controls, | ||
| double complex[::1] gate_params, | ||
| double complex[::1] diag_phases, | ||
| int[::1] two_qubit_offsets, | ||
| double complex[::1] two_qubit_gates, | ||
| Py_ssize_t n_instructions, | ||
| ) noexcept: | ||
| """Execute an entire circuit. Each gate releases the GIL internally.""" | ||
| cdef Py_ssize_t i, gp_offset, dp_offset, tq_offset | ||
| cdef int op, tgt, ctrl | ||
| cdef double complex* sv_ptr = &sv[0] | ||
| cdef double complex* gp_ptr = &gate_params[0] | ||
| cdef double complex* dp_ptr = &diag_phases[0] | ||
| cdef double complex* tq_ptr = &two_qubit_gates[0] |
There was a problem hiding this comment.
Guard array sizes before taking raw pointers in apply_circuit.
Lines 366–369 dereference &... [0] before validating lengths; with empty buffers (including n_instructions == 0) this is unsafe and can crash due to disabled bounds checks.
Suggested fix
cpdef void apply_circuit(
@@
) noexcept:
@@
- cdef Py_ssize_t i, gp_offset, dp_offset, tq_offset
+ cdef Py_ssize_t i, gp_offset, dp_offset, tq_offset
cdef int op, tgt, ctrl
- cdef double complex* sv_ptr = &sv[0]
- cdef double complex* gp_ptr = &gate_params[0]
- cdef double complex* dp_ptr = &diag_phases[0]
- cdef double complex* tq_ptr = &two_qubit_gates[0]
+ cdef double complex* sv_ptr
+ cdef double complex* gp_ptr
+ cdef double complex* dp_ptr
+ cdef double complex* tq_ptr
+
+ if n_instructions <= 0:
+ return
+ if opcodes.shape[0] < n_instructions or targets.shape[0] < n_instructions or controls.shape[0] < n_instructions:
+ raise ValueError("instruction arrays shorter than n_instructions")
+ if gate_params.shape[0] < n_instructions * 4 or diag_phases.shape[0] < n_instructions * 2:
+ raise ValueError("parameter arrays too short for n_instructions")
+ if two_qubit_offsets.shape[0] < n_instructions:
+ raise ValueError("two_qubit_offsets shorter than n_instructions")
+
+ sv_ptr = &sv[0]
+ gp_ptr = &gate_params[0]
+ dp_ptr = &diag_phases[0]
+ tq_ptr = &two_qubit_gates[0]Also applies to: 371-405
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/pyqasm/accelerate/sv_sim.pyx` around lines 351 - 369, In apply_circuit,
avoid dereferencing &array[0] on possibly-empty Cython buffers; first check the
lengths (e.g., ensure sv, gate_params, diag_phases, two_qubit_gates have >0
elements or that n_instructions>0 where appropriate) and only set sv_ptr,
gp_ptr, dp_ptr, tq_ptr to &array[0] when the buffer is non-empty, otherwise set
the pointer to NULL (or skip pointer usage) to prevent invalid memory access;
update uses of sv_ptr/gp_ptr/dp_ptr/tq_ptr later in the function to handle
NULL/empty cases safely.
| # Cache: skip if already unrolled (no kwargs means default unroll) | ||
| if not kwargs and len(self._unrolled_ast.statements) > 0: | ||
| return |
There was a problem hiding this comment.
Invalidate cached unroll when source statements changed.
This early return can skip required re-unrolls and reuse stale _unrolled_ast after mutating operations that update _statements/original_program (for example, methods that append/remove statements without resetting _unrolled_ast). That can produce incorrect downstream behavior.
Suggested fix
- # Cache: skip if already unrolled (no kwargs means default unroll)
- if not kwargs and len(self._unrolled_ast.statements) > 0:
+ # Cache: skip only if already unrolled and source statements did not change.
+ if (
+ not kwargs
+ and len(self._unrolled_ast.statements) > 0
+ and self._unrolled_ast.statements is self._statements
+ ):
return📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Cache: skip if already unrolled (no kwargs means default unroll) | |
| if not kwargs and len(self._unrolled_ast.statements) > 0: | |
| return | |
| # Cache: skip only if already unrolled and source statements did not change. | |
| if ( | |
| not kwargs | |
| and len(self._unrolled_ast.statements) > 0 | |
| and self._unrolled_ast.statements is self._statements | |
| ): | |
| return |
| "qasm, description", | ||
| [ |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify current usage and confirm PT006-relevant callsite
rg -n -C2 '`@pytest`\.mark\.parametrize\(' tests/test_sv_sim.pyRepository: qBraid/pyqasm
Length of output: 131
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find ruff/flake8 config enabling PT006 (or general pytest style rules)
rg -n --hidden --no-ignore-vcs "PT006" -S . || true
rg -n --hidden --no-ignore-vcs "(pytest.*PT|PT006|ruff.*pytest|flake8.*pytest)" -S pyproject.toml ruff.toml setup.cfg tox.ini .flake8 . || true
# Confirm whether there are other parametrize callsites in this file
rg -n "pytest\.mark\.parametrize" tests/test_sv_sim.pyRepository: qBraid/pyqasm
Length of output: 340
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect ruff config around flake8-pytest-style and any PT006 selection/ignores
sed -n '1,120p' ruff.toml
echo "----"
rg -n "PT006|flake8-pytest-style|pytest-style" ruff.toml
echo "----"
# Show any pytest-style lint tables that might define enable/disable behavior
rg -n "flake8-pytest-style" -n ruff.toml && trueRepository: qBraid/pyqasm
Length of output: 2646
Switch pytest.mark.parametrize argnames to tuple form (PT006).
Ruff enables flake8-pytest-style via select = ["PT", ...] and does not ignore PT006, while tests/test_sv_sim.py currently uses a comma-separated string argnames ("qasm, description"), which trips the rule. Change to:
Suggested change
-@pytest.mark.parametrize(
- "qasm, description",
+@pytest.mark.parametrize(
+ ("qasm", "description"),🧰 Tools
🪛 Ruff (0.15.14)
[warning] 44-44: Wrong type passed to first argument of pytest.mark.parametrize; expected tuple
Use a tuple for the first argument
(PT006)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/test_sv_sim.py` around lines 44 - 45, Change the
pytest.mark.parametrize argnames from a comma-separated string to an explicit
tuple so it satisfies PT006; locate the pytest.mark.parametrize call in
tests/test_sv_sim.py that currently uses "qasm, description" and replace the
argnames with a tuple form like ("qasm", "description") (or equivalent
single-quoted tuple) while keeping the existing parameter values and test name
intact.
| if "global phase" in description: | ||
| # Compare up to global phase: find first nonzero element and normalize | ||
| idx = np.argmax(np.abs(sv_expected) > 1e-10) | ||
| phase = sv_actual[idx] / sv_expected[idx] | ||
| sv_actual = sv_actual / phase | ||
|
|
There was a problem hiding this comment.
Global-phase normalization should not depend on description text.
Conditioning on "global phase" in description is brittle and can miss valid equivalent states in other cases. Normalize phase unconditionally (or use a phase-invariant comparator helper).
Suggested change
- if "global phase" in description:
- # Compare up to global phase: find first nonzero element and normalize
- idx = np.argmax(np.abs(sv_expected) > 1e-10)
- phase = sv_actual[idx] / sv_expected[idx]
- sv_actual = sv_actual / phase
+ # Compare up to global phase for all cases
+ idx = np.argmax(np.abs(sv_expected) > 1e-10)
+ phase = sv_actual[idx] / sv_expected[idx]
+ sv_actual = sv_actual / phase🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/test_sv_sim.py` around lines 309 - 314, The current comparison in
tests/test_sv_sim.py only normalizes global phase when the string "global phase"
appears in description, which is brittle; change the logic in the block that
references description, sv_expected, sv_actual, idx, and phase so that the
global-phase normalization is applied unconditionally (or replace it with a
phase-invariant comparator helper) by removing the conditional check on
description and always computing idx = np.argmax(np.abs(sv_expected) > 1e-10),
phase = sv_actual[idx] / sv_expected[idx], and sv_actual = sv_actual / phase
before comparing vectors.
Two independent CI failures are addressed: 1. Source-distribution job (6 test failures): remove the unroll() caching shortcut in QasmModule that early-returned when the module looked already unrolled. It broke re-unrolling (e.g. unroll(external_gates=...) followed by unroll() to flush them), failing the depth/rebase/include tests. This also contradicted the PR's stated "preprocess on every run" goal. 2. Wheel-build jobs (qiskit unbuildable in manylinux containers): qiskit / qiskit-aer have no prebuilt wheels for the cibuildwheel containers and fall back to a from-source build needing Rust. Move those reference-oracle deps into a dedicated `test-sim` extra so the wheel test env (test,cli,pulse) no longer pulls them, and guard the qiskit imports in test_sv_sim.py with pytest.importorskip so the module skips cleanly where qiskit is absent. The sdist and coverage jobs install `test-sim` to keep running the comparison on the standard runners where qiskit installs fine. Also exclude the hardware-dependent benchmark perf-regression tests from the CI test runs (-m "not benchmark"), per tests/test_perf_regression.py's own docs.
The build appended -O3 -ffast-math -march=native unconditionally to every
Extension, regardless of compiler or architecture. This is wrong for
distributed wheels:
* -march=native ties the wheel to the build machine's CPU and can SIGILL on
older hardware; it is also not the right spelling on MSVC or arm64.
* MSVC silently ignores the GCC/Clang flags, so Windows wheels got none of
the intended optimization.
* -ffast-math relaxes IEEE semantics, which the statevector simulator relies
on.
Replace the static flag list with a build_ext subclass that selects flags per
compiler: -O3 for GCC/Clang, /O2 for MSVC, and OpenMP (-fopenmp / /openmp)
applied only to the sv_sim kernel. No -march (baseline ISA, matching SciPy and
scikit-learn) and no -ffast-math.
The macOS wheel jobs failed in delocate: the sv_sim OpenMP build links Homebrew's libomp, whose dylib has a minimum-macOS target newer than the wheel's macosx_11_0 tag, so delocate refuses to bundle it. (Pre-existing in this PR's OpenMP-on-macOS approach.) Disable OpenMP on macOS by default so no libomp is linked or bundled and the wheels stay portable (sv_sim builds single-threaded on macOS only; Linux and Windows keep OpenMP). A local source build can opt back in with PYQASM_MACOS_OPENMP=1. Drop the now-unnecessary `brew install libomp` step from the wheel workflows.
Disabling the -fopenmp flag on macOS was not enough: sv_sim.pyx unconditionally cimports prange/openmp, so the generated C still pulled in an unguarded <omp.h> and failed to compile on macOS (no system libomp). Gate the OpenMP imports, thread setup, and every prange branch behind a Cython compile-time IF USE_OPENMP. The serial `with nogil` paths (which already existed alongside each prange) become the sole code path when OpenMP is off, and no <omp.h> is emitted. setup.py decides USE_OPENMP once (Linux/Windows on, macOS off unless PYQASM_MACOS_OPENMP=1) and passes it to cythonize via compile_time_env, matching the compiler flags applied in build_ext. Verified: Linux/Windows OpenMP build and the serial macOS build are bit-identical, and the threaded prange path matches the serial path.
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
The simulator's crz fast path used a controlled-diagonal instruction that
can only phase the |control=1, target=1> amplitude, so it implemented a
controlled-phase gate diag(1,1,1,e^{i*theta/2}) instead of a true
controlled-Rz diag(1,1,e^{-i*theta/2},e^{i*theta/2}) -- it dropped the
phase on |control=1, target=0>. Route crz through the general
controlled-gate kernel with the full Rz matrix instead.
Add tests/test_sv_sim_core.py: a qiskit-free test module (the existing
simulator tests importorskip qiskit and are skipped in wheel-build
containers, and never exercised crz). It regression-tests crz via a
self-contained differential against pyqasm's own decomposition, plus
known-action gate checks, fusion paths, error handling, and helper unit
tests. Raises statevector.py coverage from 78% to 95%; the remainder is
dead/defensive code (notably the never-called generic two-qubit path).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Follow-up: dead generic two-qubit-gate scaffoldingWhile adding coverage/tests for the statevector simulator I noticed the generic two-qubit path is fully built but never reached, so flagging it here so we don't forget to either wire it up or remove it:
These lines are the bulk of the remaining uncovered code in Context: this is separate from the |
Summary
This PR targets
mainfrom the currentsvbranch work plus one additional commit that ports the honest, non-cache AlphaEvolve Run 06 statevector improvements.It deliberately does not include the evolved
_PROGRAM_INSTRUCTION_CACHE/ repeated-QASM memoization path, since that overfit the AlphaEvolve repeated-timing harness. The simulator still preprocesses each input on everySimulator.run(...)call.Included from
svsv_simkernelsAdditional evolved changes
simulation = ["numba>=0.59"]optional extra; fallback shim keeps numba optionalBenchmark artifacts updated
The existing default benchmark plots were regenerated from this PR branch with the improved simulator:
benchmarks/bench_random.pngbenchmarks/bench_qft.pngThe benchmarks folder also includes explicit no-cache before/after comparison artifacts:
benchmarks/bench_evolved_no_cache.jsonbenchmarks/bench_random_evolved_no_cache.pngbenchmarks/bench_qft_evolved_no_cache.pngThe comparison artifacts use pre-unrolled
QasmModuleinputs and intentionally do not include the repeated-QASM cache path.Validation
Performed locally against the installed
svbranch kernels from the AlphaEvolve workspace:QasmModulepathshots > 0smoke testatol=1e-10Regenerated default benchmark plots with:
No-cache comparison artifacts (
PYQASM_NUM_THREADS=1, median of 5):atol=1e-10AlphaEvolve full overlay results from the experiment showed the graph-compatible non-cache path at ~1.23x random and ~1.36x QFT geomean; this PR's committed benchmark artifacts were regenerated from the PR branch implementation.