[Paper] Validation of an analyzability model for quantum software: a family of experiments

Published: February 24, 2026
5 min read
Source: arXiv - 2602.21074v1

Overview

The paper empirically validates the quantum component of a hybrid software analyzability model built on the ISO/IEC 25010 quality framework. Through four controlled experiments with students and professionals, the authors show that the model's numeric scores align with how people actually perceive the difficulty of understanding quantum algorithms—an essential step toward making quantum‑enhanced applications maintainable in industry.

Key Contributions

  • Validated analyzability metric for quantum code – first large‑scale empirical evidence that a standards‑based model can reliably rank quantum components by how easy they are to understand.
  • Four‑study experimental protocol – combines classroom, lab, and industry settings, covering 120+ participants with varied quantum expertise.
  • Correlation analysis – demonstrates a statistically significant alignment (Spearman ρ ≈ 0.68) between model scores and human‑perceived complexity.
  • Guidelines for metric integration – practical recommendations on embedding the analyzability calculation into CI pipelines and code‑review tools.
  • Open dataset & tooling – the authors release the raw experiment data and a lightweight Python library (quant-analyzability) that computes the metric from Qiskit, Cirq, or Q# source files.

Methodology

  1. Model foundation – The authors extend ISO/IEC 25010’s Analyzability sub‑characteristic to quantum software, defining measurable sub‑attributes (e.g., modularity, readability of quantum gates, documentation of quantum‑specific concepts).
  2. Metric computation – Each sub‑attribute is scored automatically (0–5) using static‑analysis heuristics (gate count, depth, naming conventions, comment density) and combined via a weighted sum to produce an overall analyzability index (AI).
  3. Experiment design
    • Study 1 (Academic): 30 undergrad CS students evaluate 5 small Qiskit notebooks; AI scores are computed automatically.
    • Study 2 (Graduate): 25 master’s students assess 7 medium‑size algorithms (e.g., Grover, QFT).
    • Study 3 (Industry Lab): 35 software engineers from a quantum‑hardware startup review 4 real‑world modules written in Q#.
    • Study 4 (Mixed): 30 participants (mix of students and professionals) perform a blind ranking of 6 algorithm variants.
  4. Data collection – Participants rate perceived difficulty on a 7‑point Likert scale. The authors then compute Pearson/Spearman correlations between AI and human ratings, and run ANOVA tests to check whether AI distinguishes known “easy” vs. “hard” algorithms.

The approach is deliberately lightweight: no deep cognitive testing, just straightforward perception surveys paired with automatically generated metric values.
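The weighted-sum combination described in step 2 can be sketched as follows. The sub-attribute names, scores, and weights here are illustrative assumptions—the paper derives its actual weights heuristically and does not publish them in this summary:

```python
# Sketch of the weighted-sum analyzability index (AI).
# Attribute names, scores (0-5), and weights are hypothetical.

def analyzability_index(scores: dict, weights: dict) -> float:
    """Combine per-attribute static-analysis scores (0-5) into one AI value."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same attributes")
    total_weight = sum(weights.values())
    return sum(scores[a] * weights[a] for a in scores) / total_weight

# Hypothetical static-analysis scores for one quantum module.
scores = {"gate_count": 4.0, "circuit_depth": 3.0,
          "naming": 5.0, "comment_density": 2.0}
weights = {"gate_count": 0.3, "circuit_depth": 0.3,
           "naming": 0.2, "comment_density": 0.2}

ai = analyzability_index(scores, weights)  # 3.5 for these toy numbers
```

Because the weights normalize to the total, the resulting index stays on the same 0–5 scale as the sub-attribute scores, which keeps thresholds easy to interpret.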

Results & Findings

Study          AI ↔ Human correlation    Significance   ANOVA p-value (easy vs. hard)
1 (Undergrad)  0.61 (Spearman)           p < 0.01       0.004
2 (Graduate)   0.72                      p < 0.001      0.001
3 (Industry)   0.66                      p < 0.01       0.003
4 (Mixed)      0.68                      p < 0.01       0.002
  • Consistent discrimination: The AI reliably gave lower scores to algorithms participants labeled “hard” (e.g., quantum phase estimation) and higher scores to “easy” ones (e.g., Bell‑state preparation).
  • Alignment with perception: Across all cohorts, the AI explained roughly 45 % of the variance in perceived difficulty, a strong signal given the subjective nature of the task.
  • Tool performance: The open‑source quant-analyzability library processed all test cases in under 200 ms, showing feasibility for integration into continuous‑integration pipelines.
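The rank-correlation step behind these results can be sketched with a toy re-implementation of Spearman's ρ. The data points below are made up for illustration—the paper runs standard statistics on its real experiment data, not this code:

```python
# Toy Spearman rank correlation between AI scores and mean Likert
# difficulty ratings. Data values are illustrative only.

def _ranks(values):
    """Average 1-based ranks, handling ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation of the rank vectors of x and y."""
    rx, ry = _ranks(x), _ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Perfectly inverted toy data: higher AI <-> lower perceived difficulty.
ai_scores  = [4.5, 3.8, 2.1, 1.0]
difficulty = [1.5, 2.0, 5.5, 6.8]
rho = spearman_rho(ai_scores, difficulty)  # -1.0 for this toy data
```

In practice a library routine such as `scipy.stats.spearmanr` would be used; the manual version just makes the ranking step explicit.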

Practical Implications

  • CI‑ready quality gate – Teams can automatically reject pull requests that degrade the analyzability index beyond a configurable threshold, preventing “hard‑to‑read” quantum code from entering the codebase.
  • Technical debt tracking – By logging AI over time, managers can quantify quantum‑specific technical debt and prioritize refactoring (e.g., splitting deep circuits, improving naming).
  • Onboarding acceleration – New hires can use AI scores as a guide to locate “low‑complexity” modules for learning, reducing the steep learning curve typical of quantum stacks.
  • Vendor‑agnostic assessment – Because the metric relies on language‑independent static properties (gate count, depth, comment density), it works across Qiskit, Cirq, Q#, and emerging SDKs, making it suitable for heterogeneous quantum‑software projects.
  • Standard‑compliant reporting – Aligning with ISO/IEC 25010 eases integration with existing software‑quality dashboards, allowing organizations to extend their current quality models to the quantum layer without reinventing the wheel.
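The CI quality-gate idea in the first bullet can be sketched as a simple threshold check. The function name, thresholds, and return shape are hypothetical—they are not the API of the released quant-analyzability library:

```python
# Hypothetical CI gate: fail a pull request when the analyzability
# index (AI) falls below a floor or drops too far relative to main.

def check_analyzability_gate(old_ai: float, new_ai: float,
                             min_ai: float = 3.0,
                             max_drop: float = 0.5):
    """Return (passed, message) for a proposed change."""
    if new_ai < min_ai:
        return False, f"AI {new_ai:.2f} below minimum {min_ai:.2f}"
    if old_ai - new_ai > max_drop:
        return False, f"AI dropped by {old_ai - new_ai:.2f} (limit {max_drop:.2f})"
    return True, "analyzability gate passed"

# A change that keeps AI above the floor but degrades it too sharply.
passed, msg = check_analyzability_gate(old_ai=4.1, new_ai=3.4)
```

A gate like this would run after the metric computation step in the pipeline, with the thresholds tuned per project.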

Limitations & Future Work

  • Scope limited to static analysis – Dynamic aspects (runtime noise, hardware‑specific optimizations) are not captured, which may affect real‑world maintainability.
  • Participant expertise skew – The majority of participants had academic backgrounds; broader industry sampling (e.g., finance, logistics) would strengthen external validity.
  • Metric weighting – Current weights were derived heuristically; future work could employ machine‑learning techniques to learn optimal weights from larger corpora.
  • Extension to other quality attributes – The authors plan to validate related ISO/IEC 25010 sub‑characteristics (e.g., Reliability and Portability) for quantum components, moving toward a full‑stack quality model.

Overall, the study provides a concrete, empirically backed tool that developers can start using today to keep quantum codebases readable and maintainable—an essential step as quantum computing moves from research labs into production environments.

Authors

  • Ana Díaz-Muñoz
  • José A. Cruz-Lemus
  • Moisés Rodríguez
  • Maria Teresa Baldassarre
  • Mario Piattini

Paper Information

  • arXiv ID: 2602.21074v1
  • Categories: cs.SE
  • Published: February 24, 2026