[Paper] QEMI: A Quantum Software Stacks Testing Framework via Equivalence Modulo Inputs
Source: arXiv - 2602.09942v1
Overview
The paper introduces QEMI, a novel testing framework for quantum software stacks (QSS) that adapts the classic Equivalence Modulo Inputs (EMI) technique to the quantum domain. By automatically generating quantum programs with “dead” code and then stripping that dead code to create equivalent variants, QEMI can spot crashes and subtle behavioral bugs in popular quantum SDKs such as Qiskit, Q#, and Cirq.
Key Contributions
- Random quantum program generator that inserts dead code using quantum control‑flow constructs (e.g., if‑else, while‑like loops).
- Quantum‑adapted EMI: a systematic way to produce semantics‑preserving program variants by removing dead code, enabling differential testing without a ground‑truth oracle.
- Cross‑platform evaluation on three major quantum SDKs, uncovering 11 previously unknown crash bugs and 1 previously unknown behavioral inconsistency.
- Demonstration that semantics‑preserving transformations (beyond structural rewrites) are viable for quantum software testing, expanding the limited toolbox for QSS validation.
Methodology
- Program Generation
  - A fuzzing engine creates random quantum circuits wrapped in classical‑style control flow (e.g., if (measure == 0) { … }).
  - Some branches are deliberately made dead: they are never reachable for a given input state because the control condition is always false.
- Equivalence Modulo Inputs (EMI) Adaptation
  - For each generated program, QEMI identifies dead code sections.
  - It then produces a variant by deleting those dead sections, so the variant’s observable behavior (measurement statistics) should be identical to the original’s on the inputs for which the removed code is dead.
- Differential Execution
  - Both the original and the stripped variant are compiled and run on the target QSS (e.g., Qiskit’s transpiler + simulator).
  - The framework compares outcomes: mismatched results or crashes indicate a bug in how the stack handles the program that contains dead code or its stripped variant.
- Bug Classification
  - Crashes (e.g., segmentation faults, unhandled exceptions) are logged as crash bugs.
  - Divergent measurement distributions are flagged as behavioral inconsistencies.
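The pipeline above can be sketched end to end in a few lines of plain Python. This is a toy illustration, not the authors' implementation: the tuple-based program format, the names `generate`, `strip_dead`, `run`, and `histogram`, and the small statevector simulator are all assumptions made for this example. Dead branches are guarded by a classical bit measured from an ancilla that stays in |0>, so the guard `bit == 1` is false by construction.

```python
import math
import random
from collections import Counter

ANC_BIT = 0  # classical bit that stores the ancilla measurement (always 0)
S = 1 / math.sqrt(2)
GATES = {"h": [[S, S], [S, -S]], "x": [[0, 1], [1, 0]]}

def random_gate(rng, n_data):
    """Pick a random live gate on the data qubits."""
    kind = rng.choice(["h", "x", "cx"])
    if kind == "cx":
        control, target = rng.sample(range(n_data), 2)
        return ("cx", control, target)
    return (kind, rng.randrange(n_data))

def generate(rng, n_data=2, n_ops=8):
    """Random program whose `if` bodies are dead: the guard bit is never 1."""
    prog = [("measure", n_data, ANC_BIT)]  # ancilla is still |0>, so the bit is 0
    for _ in range(n_ops):
        prog.append(random_gate(rng, n_data))
        if rng.random() < 0.5:
            body = [random_gate(rng, n_data) for _ in range(rng.randint(1, 3))]
            prog.append(("if", ANC_BIT, 1, body))
    return prog

def strip_dead(prog):
    """EMI variant: drop every block guarded by the always-false condition."""
    return [op for op in prog if not (op[0] == "if" and op[1:3] == (ANC_BIT, 1))]

def apply_1q(state, q, m):
    """Apply a 2x2 matrix to qubit q (qubit 0 is the least significant bit)."""
    out = state[:]
    for i in range(len(state)):
        if not (i >> q) & 1:
            j = i | (1 << q)
            out[i] = m[0][0] * state[i] + m[0][1] * state[j]
            out[j] = m[1][0] * state[i] + m[1][1] * state[j]
    return out

def apply_cx(state, c, t):
    """Swap the target's amplitudes wherever the control bit is 1."""
    out = state[:]
    for i in range(len(state)):
        if (i >> c) & 1 and not (i >> t) & 1:
            j = i | (1 << t)
            out[i], out[j] = state[j], state[i]
    return out

def measure(state, q, rng):
    """Sample qubit q, collapse the state, and renormalize."""
    p1 = sum(abs(a) ** 2 for i, a in enumerate(state) if (i >> q) & 1)
    bit = 1 if rng.random() < p1 else 0
    norm = math.sqrt(p1 if bit else 1 - p1)
    return bit, [a / norm if ((i >> q) & 1) == bit else 0j for i, a in enumerate(state)]

def run(prog, n_data, seed):
    """Execute one shot and return the data-qubit readout as a bit string."""
    rng = random.Random(seed)
    state = [0j] * (1 << (n_data + 1))  # data qubits plus one ancilla
    state[0] = 1 + 0j
    bits = {}

    def step(block, state):
        for op in block:
            if op[0] == "cx":
                state = apply_cx(state, op[1], op[2])
            elif op[0] == "measure":
                bits[op[2]], state = measure(state, op[1], rng)
            elif op[0] == "if":
                if bits[op[1]] == op[2]:  # never true for dead blocks
                    state = step(op[3], state)
            else:
                state = apply_1q(state, op[1], GATES[op[0]])
        return state

    state = step(prog, state)
    out = ""
    for q in range(n_data):
        bit, state = measure(state, q, rng)
        out += str(bit)
    return out

def histogram(prog, n_data, shots=200):
    """Run one shot per seed so both programs see the same random streams."""
    return Counter(run(prog, n_data, seed) for seed in range(shots))

rng = random.Random(7)
original = generate(rng)
variant = strip_dead(original)
same = histogram(original, 2) == histogram(variant, 2)
print("dead blocks removed:", len(original) - len(variant), "| histograms match:", same)
```

Because the toy interpreter is correct and both programs consume the same seeded random stream, the two histograms agree exactly. In a real stack the original and the variant would go through the SDK's compiler and simulator instead, and a crash or a mismatch on either side would be the signal of interest.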
Results & Findings
| SDK | Crash Bugs | Behavioral Inconsistencies |
|---|---|---|
| Qiskit | 5 | 0 |
| Q# (Microsoft Quantum Development Kit) | 3 | 1 |
| Cirq | 3 | 0 |
- Crash bugs often stem from mishandled control‑flow metadata or optimizer passes that assume dead code cannot exist.
- The single behavioral inconsistency was traced to an incorrect gate‑fusion optimization that altered the circuit’s probability amplitudes when dead branches were present.
- Across all platforms, the majority of bugs were discovered without any hand‑crafted test cases, which illustrates how effective automated, semantics‑preserving transformations can be for testing QSS.
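Deciding whether two measurement distributions "diverge" needs some tolerance, because shot-based results carry sampling noise even when the stack is correct. The summary does not say which criterion QEMI uses, so the sketch below assumes one common option, total variation distance with a fixed threshold. The function names, the threshold of 0.1, and the counts are illustrative values invented for this example, not data from the paper.

```python
def total_variation_distance(counts_a, counts_b):
    """Half the L1 distance between two empirical outcome distributions."""
    shots_a = sum(counts_a.values())
    shots_b = sum(counts_b.values())
    outcomes = set(counts_a) | set(counts_b)
    return 0.5 * sum(
        abs(counts_a.get(o, 0) / shots_a - counts_b.get(o, 0) / shots_b)
        for o in outcomes
    )

def classify(counts_a, counts_b, threshold=0.1):
    """Label a pair of runs; the 0.1 threshold is an arbitrary illustrative choice."""
    distance = total_variation_distance(counts_a, counts_b)
    return "behavioral inconsistency" if distance > threshold else "consistent"

# Made-up shot counts for an original program and two hypothetical variants.
original = {"00": 507, "11": 493}
variant_ok = {"00": 489, "11": 511}
variant_bad = {"00": 250, "01": 250, "10": 250, "11": 250}

print(classify(original, variant_ok))   # small gap, within sampling noise
print(classify(original, variant_bad))  # large gap, flagged
```

A fixed threshold is the simplest choice; a statistical test that accounts for the number of shots would be a reasonable alternative when shot counts vary between runs.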
Practical Implications
- For SDK developers: QEMI provides an automated regression‑testing pipeline that can be integrated into CI/CD workflows, catching crashes early before releases.
- For quantum algorithm engineers: The framework highlights hidden edge cases (e.g., dead‑code handling) that could affect performance or correctness when algorithms are compiled for different backends.
- For hardware vendors: Because QEMI exercises the full stack, from source code to low‑level gate schedules, it can surface bugs that only appear after hardware‑specific optimizations, aiding more robust compiler‑hardware co‑design.
- Open‑source potential: The authors released the generator and EMI engine, enabling the community to extend the approach to other SDKs (e.g., Amazon Braket, pyQuil) or to target noisy intermediate‑scale quantum (NISQ) devices directly.
Limitations & Future Work
- Dead‑code detection currently relies on static analysis of classical‑style control flow; more sophisticated quantum‑specific dead‑code patterns (e.g., entanglement‑based dead paths) are not yet covered.
- The approach assumes that the removed dead code does not affect the global quantum state. This property holds for the generated programs but may be violated in more complex, data‑dependent circuits.
- Scalability: Generating very large circuits can stress simulators and increase execution time; future work could incorporate sampling strategies or hybrid simulation‑hardware runs.
- The authors plan to explore semantic equivalence beyond dead‑code removal, such as applying known circuit identities (e.g., gate commutation) to create richer variant families for deeper testing.
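To make the last item concrete, a circuit identity can be checked numerically before it is used to build variants. The sketch below is an assumption-laden illustration rather than anything taken from the paper: it uses hand-written matrix helpers (`matmul`, `kron`, `commute`) to test whether a single-qubit gate commutes with a CNOT, taking the first tensor factor as the control qubit.

```python
import math

def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def kron(a, b):
    """Kronecker product; the first factor acts on the more significant qubit."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def commute(a, b, tol=1e-12):
    """True when a*b and b*a agree entrywise within a small tolerance."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return all(abs(ab[i][j] - ba[i][j]) <= tol for i in range(n) for j in range(n))

S = 1 / math.sqrt(2)
I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
H = [[S, S], [S, -S]]
# CNOT with the first (most significant) qubit as control.
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

print(commute(kron(Z, I2), CNOT))  # Z on the control: True
print(commute(kron(I2, X), CNOT))  # X on the target: True
print(commute(kron(I2, H), CNOT))  # H on the target: False
```

Swapping a commuting pair yields a program with the same unitary, so it would be a candidate variant in the same differential-testing loop; a non-commuting pair, like H on the target, must not be reordered.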
QEMI opens a practical path for systematic, oracle‑free testing of quantum software stacks, giving developers a new lever to improve the reliability of the tools that will underpin tomorrow’s quantum applications.
Authors
- Junjie Luo
- Shangzhou Xia
- Fuyuan Zhang
- Jianjun Zhao
Paper Information
- arXiv ID: 2602.09942v1
- Categories: cs.SE
- Published: February 10, 2026