[Paper] QEMI: A Quantum Software Stacks Testing Framework via Equivalence Modulo Inputs

Published: February 10, 2026 at 11:26 AM EST

Source: arXiv - 2602.09942v1

Overview

The paper introduces QEMI, a novel testing framework for quantum software stacks (QSS) that adapts the classic Equivalence Modulo Inputs (EMI) technique to the quantum domain. By automatically generating quantum programs with “dead” code and then stripping that dead code to create equivalent variants, QEMI can spot crashes and subtle behavioral bugs in popular quantum SDKs such as Qiskit, Q#, and Cirq.

Key Contributions

Random quantum program generator that inserts dead code using quantum control‑flow constructs (e.g., if‑else, while‑like loops).
Quantum‑adapted EMI: a systematic way to produce semantically‑preserving program variants by removing dead code, enabling differential testing without a ground‑truth oracle.
Cross‑platform evaluation on three major quantum SDKs, uncovering 11 previously unknown crash bugs and 1 previously unknown behavioral inconsistency.
Demonstration that semantic‑preserving transformations (beyond structural rewrites) are viable for quantum software testing, expanding the limited toolbox for QSS validation.

Methodology

Program Generation
- A fuzzing engine creates random quantum circuits wrapped in classical‑style control flow (e.g., if (measure == 0) { … }).
- Some branches are deliberately made dead: they are unreachable for a given input state because their control condition is always false for that input.
Equivalence Modulo Inputs (EMI) Adaptation
- For each generated program, QEMI identifies dead code sections.
- It then produces a variant by deleting those dead sections. Because the deleted code never executes for the chosen input, the variant’s observable behavior (measurement statistics) should match the original’s for that input.
Differential Execution
- Both the original and the stripped variant are compiled and run on the target QSS (e.g., Qiskit’s transpiler + simulator).
- The framework compares outcomes: mismatched results or crashes indicate a bug in the stack’s handling of the dead code or its removal.
Bug Classification
- Crashes (e.g., segmentation faults, unhandled exceptions) are logged as crash bugs.
- Divergent measurement distributions are flagged as behavioral inconsistencies.
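
The four steps above can be sketched end to end on a toy model. This is a minimal illustration, not QEMI's implementation: the tuple-based program format, the tiny state-vector simulator, and every function name (`generate`, `prune_dead`, `differential_test`, `buggy_run`) are hypothetical stand-ins for a real SDK's compiler and simulator. Here the "input" is a fixed classical register, and a branch is dead when its guard is false for that register.

```python
import math
import random

# Hypothetical toy IR (an assumption, not QEMI's format):
#   ("h", q) or ("x", q)        single-qubit gate on qubit q
#   ("if", bit, value, body)    run body only when inputs[bit] == value

def apply_gate(state, name, q):
    """Apply H or X to qubit q of a state vector (list of amplitudes)."""
    new = [0j] * len(state)
    s = 1 / math.sqrt(2)
    for i, amp in enumerate(state):
        if amp == 0:
            continue
        flipped = i ^ (1 << q)
        if name == "x":
            new[flipped] += amp
        elif name == "h":
            new[i] += amp * (s if (i >> q) & 1 == 0 else -s)
            new[flipped] += amp * s
        else:
            raise ValueError(f"unknown gate {name}")
    return new

def run(program, inputs, n_qubits):
    """Reference 'stack': exact outcome probabilities under the given inputs."""
    def exec_block(block, state):
        for stmt in block:
            if stmt[0] == "if":
                _, bit, value, body = stmt
                if inputs[bit] == value:
                    state = exec_block(body, state)
            else:
                state = apply_gate(state, stmt[0], stmt[1])
        return state
    start = [1 + 0j] + [0j] * (2 ** n_qubits - 1)
    return [round(abs(a) ** 2, 9) for a in exec_block(program, start)]

def random_block(rng, n_qubits, length):
    return [(rng.choice(["h", "x"]), rng.randrange(n_qubits)) for _ in range(length)]

def generate(rng, n_qubits, inputs):
    """Random live gates plus one branch whose guard is false for `inputs`."""
    program = random_block(rng, n_qubits, 4)
    bit = rng.randrange(len(inputs))
    dead = ("if", bit, 1 - inputs[bit], random_block(rng, n_qubits, 3))
    program.insert(rng.randrange(len(program) + 1), dead)
    return program

def prune_dead(program, inputs):
    """EMI variant: delete every branch that cannot execute for `inputs`."""
    out = []
    for stmt in program:
        if stmt[0] == "if":
            _, bit, value, body = stmt
            if inputs[bit] != value:
                continue  # dead for this input, so drop it
            stmt = ("if", bit, value, prune_dead(body, inputs))
        out.append(stmt)
    return out

def differential_test(run_fn, program, inputs, n_qubits):
    """Run original and variant on the same stack and classify the result."""
    variant = prune_dead(program, inputs)
    try:
        original_probs = run_fn(program, inputs, n_qubits)
        variant_probs = run_fn(variant, inputs, n_qubits)
    except Exception as exc:
        return f"crash: {type(exc).__name__}"
    return "ok" if original_probs == variant_probs else "inconsistency"

def buggy_run(program, inputs, n_qubits):
    """Deliberately broken stack for demonstration: it ignores branch guards."""
    def flatten(block):
        out = []
        for stmt in block:
            out.extend(flatten(stmt[3]) if stmt[0] == "if" else [stmt])
        return out
    return run(flatten(program), inputs, n_qubits)
```

Calling `differential_test(run, generate(random.Random(0), 2, [0, 1]), [0, 1], 2)` exercises the correct toy stack, while passing `buggy_run` with a program such as `[("if", 0, 1, [("x", 0)])]` and inputs `[0]` shows how a stack that mishandles a dead branch is exposed without any ground-truth oracle: the original and its pruned variant simply disagree.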

Results & Findings

SDK	Crash Bugs	Behavioral Inconsistencies
Qiskit	5	0
Q# (Microsoft Quantum Development Kit)	3	1
Cirq	3	0

Crash bugs often stem from mishandled control‑flow metadata or optimizer passes that assume dead code cannot exist.
The single behavioral inconsistency was traced to an incorrect gate‑fusion optimization that altered the circuit’s probability amplitudes when dead branches were present.
Across all platforms, the majority of bugs were discovered without any hand‑crafted test cases, which points to the value of automated, semantics‑preserving transformations.
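
Flagging a behavioral inconsistency on a sampling-based simulator or on hardware means deciding when two sets of measurement counts differ by more than shot noise. This summary does not say which comparison QEMI uses, so the following is only a sketch under assumptions: it uses total variation distance, and the function names and the 0.1 threshold are hypothetical choices for illustration.

```python
def total_variation(counts_a, counts_b):
    """Total variation distance between two outcome -> count dictionaries."""
    shots_a = sum(counts_a.values())
    shots_b = sum(counts_b.values())
    outcomes = set(counts_a) | set(counts_b)
    return 0.5 * sum(
        abs(counts_a.get(o, 0) / shots_a - counts_b.get(o, 0) / shots_b)
        for o in outcomes
    )

def is_inconsistent(counts_a, counts_b, threshold=0.1):
    """Flag a pair of runs when their distance exceeds an assumed threshold."""
    return total_variation(counts_a, counts_b) > threshold

# Illustrative counts (made up for this example, not taken from the paper).
original = {"00": 510, "11": 490}
variant_ok = {"00": 495, "11": 505}
variant_bad = {"00": 1000}
```

With these made-up counts, `original` and `variant_ok` are 0.015 apart and pass, while `original` and `variant_bad` are 0.49 apart and would be flagged. In practice the threshold has to be tuned to the shot count, since a small threshold with few shots produces false alarms.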

Practical Implications

For SDK developers: QEMI provides an automated regression‑testing pipeline that can be integrated into CI/CD workflows, catching crashes early before releases.
For quantum algorithm engineers: The framework highlights hidden edge cases (e.g., dead‑code handling) that could affect performance or correctness when algorithms are compiled for different backends.
For hardware vendors: Since QEMI works on the full stack—from source code to low‑level gate schedules—it can surface bugs that only appear after hardware‑specific optimizations, aiding in more robust compiler‑hardware co‑design.
Open‑source potential: The authors released the generator and EMI engine, enabling the community to extend the approach to other SDKs (e.g., Braket, PyQuil) or to target noisy intermediate‑scale quantum (NISQ) devices directly.

Limitations & Future Work

Dead‑code detection currently relies on static analysis of classical‑style control flow; more sophisticated quantum‑specific dead‑code patterns (e.g., entanglement‑based dead paths) are not yet covered.
The approach assumes that removed dead code does not affect global quantum state—a property that holds for the generated programs but may be violated in more complex, data‑dependent circuits.
Scalability: Generating very large circuits can stress simulators and increase execution time; future work could incorporate sampling strategies or hybrid simulation‑hardware runs.
The authors plan to explore semantic equivalence beyond dead‑code removal, such as applying known circuit identities (e.g., gate commutation) to create richer variant families for deeper testing.
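
To make the gate-commutation idea concrete, here is a small sketch of how commuting neighbors could yield additional equivalent variants. It is an illustration of the general idea only, restricted to single-qubit sequences over X, Z, and S, and the helper names (`commute`, `commutation_variants`, `unitary`) are hypothetical rather than anything described in the paper.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

GATES = {
    "x": [[0, 1], [1, 0]],
    "z": [[1, 0], [0, -1]],
    "s": [[1, 0], [0, 1j]],
}

def commute(g1, g2):
    """True when the two named gates commute as matrices."""
    return close(matmul(GATES[g1], GATES[g2]), matmul(GATES[g2], GATES[g1]))

def commutation_variants(gates):
    """Yield sequences obtained by swapping one adjacent commuting pair."""
    for i in range(len(gates) - 1):
        if gates[i] != gates[i + 1] and commute(gates[i], gates[i + 1]):
            yield gates[:i] + [gates[i + 1], gates[i]] + gates[i + 2:]

def unitary(gates):
    """Overall unitary of a sequence applied left to right in time."""
    u = [[1, 0], [0, 1]]
    for g in gates:
        u = matmul(GATES[g], u)  # later gates multiply on the left
    return u

sequence = ["x", "z", "s"]
variants = list(commutation_variants(sequence))
```

For `["x", "z", "s"]` the only swap allowed is the Z/S pair, because both are diagonal, while X and Z do not commute. Each variant has the same unitary as the original, so a stack under test should produce the same measurement statistics for both.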

QEMI opens a practical path for systematic, oracle‑free testing of quantum software stacks, giving developers a new lever to improve the reliability of the tools that will underpin tomorrow’s quantum applications.

Authors

Junjie Luo
Shangzhou Xia
Fuyuan Zhang
Jianjun Zhao

Paper Information

arXiv ID: 2602.09942v1
Categories: cs.SE
Published: February 10, 2026
