[Paper] QMutBench: A Dataset of Quantum Circuit Mutants
Source: arXiv - 2604.15870v1
Overview
The paper introduces QMutBench, a publicly available dataset of more than 700 k mutants (intentionally faulty versions) of quantum circuits. By supplying a searchable repository of realistic quantum bugs, the authors give researchers and engineers a concrete benchmark for measuring how well quantum testing tools catch errors.
Key Contributions
- Large‑scale mutant corpus: > 700 000 quantum circuit mutants covering a wide range of gate‑level faults.
- Online query interface: Users can filter mutants by original circuit, target survival rate, gate type, and other mutation attributes.
- Standardised fault taxonomy: The dataset classifies mutations (e.g., gate replacement, parameter perturbation, qubit‑swap) to enable reproducible experiments.
- Benchmarking baseline: Provides a ready‑to‑use ground truth for evaluating test‑case effectiveness and for comparing different quantum testing strategies.
- Enabler for mutation‑guided testing: The resource can be leveraged to design new testing heuristics that specifically target hard‑to‑detect faults.
Methodology
- Circuit selection – The authors gathered a diverse set of quantum programs from existing repositories (Qiskit tutorials, IBM Q Experience examples, etc.) to serve as “original” circuits.
- Mutation operators – They defined a suite of quantum‑specific mutation operators, such as:
  - Gate substitution (e.g., replace an X with a Y).
  - Parameter alteration (tweak rotation angles).
  - Qubit re‑mapping (swap control/target qubits).
  - Insertion/deletion of identity or measurement gates.
- Automated mutant generation – A custom script applied each operator to every eligible location in each original circuit, so the mutant count grows with the number of operators, locations, and circuits.
- Survival‑rate estimation – For each mutant, the authors simulated its execution on a noisy quantum backend to estimate the probability that the fault would go undetected (the “survival rate”).
- Dataset packaging – Mutants, metadata (original circuit ID, operator type, affected qubits, survival rate), and a lightweight web UI were bundled and released under an open‑source license.
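The generation step described above can be sketched in a few lines. This is a minimal illustration, not the authors' script: the circuit encoding (a tuple of `(name, qubits, params)` gates), the `Mutant` record, the `SUBSTITUTIONS` table, and the default `delta` are all assumptions made for the example. It covers three of the listed operators and leaves out identity/measurement insertion and deletion.

```python
from dataclasses import dataclass
from typing import Iterator, List, Sequence, Tuple

# Hypothetical circuit encoding: a gate is (name, qubits, params).
Gate = Tuple[str, Tuple[int, ...], Tuple[float, ...]]


@dataclass(frozen=True)
class Mutant:
    operator: str              # name of the mutation operator applied
    position: int              # index of the mutated gate in the circuit
    circuit: Tuple[Gate, ...]  # the full mutated circuit


# Assumed substitution table, for illustration only.
SUBSTITUTIONS = {"x": ("y", "z"), "y": ("x", "z"), "z": ("x", "y")}


def _replace(circuit: Sequence[Gate], index: int, gate: Gate) -> Tuple[Gate, ...]:
    """Return a copy of the circuit with the gate at `index` swapped out."""
    mutated = list(circuit)
    mutated[index] = gate
    return tuple(mutated)


def gate_substitution(circuit: Sequence[Gate]) -> Iterator[Mutant]:
    """Replace each eligible gate with every alternative in the table."""
    for i, (name, qubits, params) in enumerate(circuit):
        for new_name in SUBSTITUTIONS.get(name, ()):
            yield Mutant("gate_substitution", i,
                         _replace(circuit, i, (new_name, qubits, params)))


def parameter_alteration(circuit: Sequence[Gate], delta: float = 0.1) -> Iterator[Mutant]:
    """Shift the angles of every parameterised gate by `delta`."""
    for i, (name, qubits, params) in enumerate(circuit):
        if params:
            shifted = tuple(p + delta for p in params)
            yield Mutant("parameter_alteration", i,
                         _replace(circuit, i, (name, qubits, shifted)))


def qubit_remapping(circuit: Sequence[Gate]) -> Iterator[Mutant]:
    """Swap the two qubits (e.g. control and target) of each two-qubit gate."""
    for i, (name, qubits, params) in enumerate(circuit):
        if len(qubits) == 2:
            yield Mutant("qubit_remapping", i,
                         _replace(circuit, i, (name, qubits[::-1], params)))


def generate_mutants(circuit: Sequence[Gate]) -> List[Mutant]:
    """Apply every operator at every eligible location (first-order mutants)."""
    mutants: List[Mutant] = []
    for operator in (gate_substitution, parameter_alteration, qubit_remapping):
        mutants.extend(operator(circuit))
    return mutants


# Toy 2-qubit circuit: H, CX, RZ(0.5), X.
original = (("h", (0,), ()), ("cx", (0, 1), ()), ("rz", (1,), (0.5,)), ("x", (0,), ()))
mutants = generate_mutants(original)
for m in mutants:
    print(m.operator, m.position, m.circuit[m.position])
```

Each mutant differs from the original in exactly one gate. Swapping the qubits of a symmetric gate would produce a mutant that behaves like the original, so a fuller version would likely need to filter such cases.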
Results & Findings
- Coverage breadth: The final corpus spans circuits ranging from 2‑qubit toy examples to 20‑qubit algorithms, ensuring relevance for both near‑term NISQ devices and larger future hardware.
- Diverse fault profiles: Survival rates vary widely (from < 1 % to > 90 %): some mutations are trivially caught, while others are stealthy and supply the edge cases needed for robust testing.
- Baseline effectiveness: When applying a simple random test‑case generator, the authors observed average fault detection rates of ~45 %, confirming that many mutants remain undetected by naïve testing.
- Usability: The web interface allows users to retrieve a custom subset (e.g., “all mutants of circuit X with survival rate > 70 %”) in seconds, demonstrating the practicality of the dataset for rapid prototyping.
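A query such as “all mutants of circuit X with survival rate > 70 %” amounts to a filter over the mutant metadata. The sketch below shows that filter on local records. The `MutantRecord` field names are assumptions modelled on the metadata listed in the methodology (original circuit ID, operator type, affected qubits, survival rate), and the sample records are invented for illustration. It does not reproduce the web interface or its API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class MutantRecord:
    # Hypothetical schema; the real dataset's field names may differ.
    mutant_id: str
    circuit_id: str
    operator: str
    qubits: Tuple[int, ...]
    survival_rate: float  # fraction in [0, 1]


def select_mutants(
    records: Iterable[MutantRecord],
    circuit_id: Optional[str] = None,
    min_survival: Optional[float] = None,
    operator: Optional[str] = None,
) -> List[MutantRecord]:
    """Keep records matching every filter that is not None."""
    selected = []
    for r in records:
        if circuit_id is not None and r.circuit_id != circuit_id:
            continue
        if min_survival is not None and not r.survival_rate > min_survival:
            continue
        if operator is not None and r.operator != operator:
            continue
        selected.append(r)
    return selected


# Invented sample records, not taken from QMutBench.
records = [
    MutantRecord("m1", "X", "gate_substitution", (0,), 0.05),
    MutantRecord("m2", "X", "parameter_alteration", (1,), 0.92),
    MutantRecord("m3", "Y", "qubit_remapping", (0, 1), 0.81),
    MutantRecord("m4", "X", "qubit_remapping", (0, 1), 0.71),
]

stealthy = select_mutants(records, circuit_id="X", min_survival=0.70)
print([r.mutant_id for r in stealthy])
```

Leaving a filter as `None` disables it, so the same function serves queries by circuit, by operator type, or by survival rate alone.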
Practical Implications
- Test‑suite evaluation – Developers can now quantify how many of the 700 k+ realistic faults their quantum test generators actually expose, turning vague “coverage” claims into concrete numbers.
- Tool comparison – Researchers can benchmark competing testing frameworks (e.g., property‑based testing vs. fuzzing) on a shared fault set, fostering fairer competition and faster progress.
- Mutation‑guided test generation – By focusing on mutants with high survival rates, automated test generators can prioritize “hard” faults, leading to more efficient use of limited quantum hardware time.
- Education & onboarding – Instructors can use QMutBench to illustrate common quantum programming mistakes, giving students hands‑on experience with debugging quantum code.
- Hardware‑aware testing – Since survival rates are estimated on noisy simulators, the dataset can help developers understand how hardware noise masks certain bugs, informing error‑mitigation strategies.
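Test‑suite evaluation and mutation‑guided prioritisation both reduce to simple bookkeeping once a test suite has been run against the mutants. The sketch below assumes the caller already knows, per mutant, whether the suite detected it (`kill_results`) and has the dataset's survival rate for each mutant (`survival_rates`). Both mappings and their values are invented for the example. The score computed is the standard mutation score: detected mutants divided by total mutants.

```python
from typing import Dict, List


def mutation_score(kill_results: Dict[str, bool]) -> float:
    """Fraction of mutants detected (killed) by the test suite."""
    if not kill_results:
        raise ValueError("no mutants were evaluated")
    return sum(kill_results.values()) / len(kill_results)


def hardest_survivors(
    kill_results: Dict[str, bool],
    survival_rates: Dict[str, float],
    top_k: int,
) -> List[str]:
    """Undetected mutants, ordered from highest to lowest survival rate."""
    survivors = [m for m, killed in kill_results.items() if not killed]
    survivors.sort(key=lambda m: survival_rates[m], reverse=True)
    return survivors[:top_k]


# Invented outcomes: True means the test suite detected the mutant.
kill_results = {"m1": True, "m2": False, "m3": True, "m4": False}
survival_rates = {"m1": 0.05, "m2": 0.92, "m3": 0.30, "m4": 0.71}

score = mutation_score(kill_results)
targets = hardest_survivors(kill_results, survival_rates, top_k=2)
print(f"mutation score: {score:.2f}")
print("next mutants to target:", targets)
```

Ranking the survivors by survival rate is one possible prioritisation. A test generator with a limited hardware budget could work through `targets` first.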
Limitations & Future Work
- Noise model dependency – Survival rates are based on simulated noise; real devices may exhibit different detection characteristics.
- Operator scope – While the current mutation operators cover many gate‑level faults, higher‑level logical bugs (e.g., incorrect algorithmic flow) are not represented.
- Scalability to larger circuits – Generating mutants for circuits beyond ~30 qubits becomes computationally expensive; future work could explore sampling strategies.
- Dynamic updates – The dataset is static; incorporating new quantum languages or emerging gate sets will require periodic maintenance.
Overall, QMutBench fills a critical gap in quantum software engineering by giving the community a shared, extensible benchmark for testing and improving quantum code.
Authors
- Eñaut Mendiluze Usandizaga
- Thomas Laurent
- Paolo Arcaini
- Shaukat Ali
Paper Information
- arXiv ID: 2604.15870v1
- Categories: cs.SE, cs.DB
- Published: April 17, 2026