[Paper] Shallow-circuit Supervised Learning on a Quantum Processor
Source: arXiv - 2601.03235v1
Overview
The paper presents a shallow‑circuit, supervised‑learning framework that runs on today's noisy quantum processors. By encoding classical data into the ground states of small k‑local Hamiltonians and training those Hamiltonians with a sample‑based Krylov diagonalization technique, the authors demonstrate a practical quantum‑machine‑learning pipeline that scales to 50 qubits on IBM's Heron device. The work tackles two long‑standing roadblocks, the cost of data loading and barren‑plateau‑induced untrainability, making quantum‑enhanced learning a realistic prospect for near‑term applications.
Key Contributions
- Compact data encoding: Introduces a linear‑Hamiltonian representation where each data point is mapped to the ground state of a k‑local Hamiltonian, drastically reducing the number of required qubits and circuit depth.
- Sample‑based Krylov diagonalization: Adapts a quantum‑classical hybrid algorithm to estimate low‑energy eigenstates of the data Hamiltonians using only shallow circuits and a modest number of measurements.
- Local‑gradient training: Shows that the Hamiltonian parameters can be optimized with local gradient information, sidestepping barren‑plateau issues that plague deep variational circuits.
- Scalable experimental validation: Implements the full pipeline on IBM’s 27‑qubit and 50‑qubit Heron processors, achieving competitive classification accuracy on standard benchmarks (e.g., Iris, MNIST‑binary).
- Open‑source toolbox: Releases a Python library built on Qiskit that automates data‑to‑Hamiltonian conversion, Krylov subspace construction, and gradient‑based training.
Methodology
- Data‑to‑Hamiltonian mapping
  - Each classical feature vector $x$ is embedded into a k‑local Hamiltonian $H(x;\theta)$ whose ground state $|\psi_0(x)\rangle$ encodes the data.
  - The mapping is linear in the parameters $\theta$, so the Hamiltonian can be read as a weighted sum of Pauli strings (a minimal encoding sketch follows this item).
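To make the linear mapping concrete, here is a minimal sketch in Python using Qiskit's `SparsePauliOp`. The choice of ZZ terms on neighboring qubits and the way each feature multiplies a trainable weight are illustrative assumptions, not the paper's exact encoding.

```python
import numpy as np
from qiskit.quantum_info import SparsePauliOp

def data_hamiltonian(x: np.ndarray, theta: np.ndarray, n_qubits: int) -> SparsePauliOp:
    """Toy linear map: H(x; theta) = sum_k theta_k * x_k * P_k with 2-local P_k."""
    labels, coeffs = [], []
    for k, (xk, tk) in enumerate(zip(x, theta)):
        # Place a ZZ term on a pair of neighboring qubits (2-local).
        i = k % (n_qubits - 1)
        labels.append("I" * i + "ZZ" + "I" * (n_qubits - i - 2))
        coeffs.append(tk * xk)
    return SparsePauliOp(labels, coeffs)

# One 3-feature data point encoded on 4 qubits.
H = data_hamiltonian(np.array([0.3, -1.2, 0.7]), np.array([1.0, 0.5, -0.8]), n_qubits=4)
print(H)
```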
- Krylov‑subspace diagonalization
  - Starting from a simple reference state $|\phi_0\rangle$ (e.g., all zeros), the algorithm builds a Krylov basis $\{|\phi_j\rangle = H^j|\phi_0\rangle\}$ using sample‑based estimates of the matrix elements $\langle\phi_i|H|\phi_j\rangle$.
  - A small classical eigensolver then extracts an approximation of the lowest‑energy eigenvector, which serves as the model's prediction (see the classical post‑processing sketch below).
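A minimal classical sketch of the diagonalization step, assuming the matrix elements $H_{ij} = \langle\phi_i|H|\phi_j\rangle$ and overlaps $S_{ij} = \langle\phi_i|\phi_j\rangle$ have already been estimated from hardware samples. The regularization threshold and the toy inputs are illustrative choices, not the paper's.

```python
import numpy as np
from scipy.linalg import eigh

def krylov_ground_state(H_mat: np.ndarray, S_mat: np.ndarray, tol: float = 1e-8):
    """Solve the generalized eigenproblem H c = E S c in the Krylov subspace.

    Sampled overlap matrices are often ill-conditioned, so small
    eigendirections of S are projected out before solving.
    """
    s_vals, s_vecs = np.linalg.eigh(S_mat)
    keep = s_vals > tol
    proj = s_vecs[:, keep] / np.sqrt(s_vals[keep])  # orthonormalizing projector
    H_reg = proj.conj().T @ H_mat @ proj
    energies, vecs = eigh(H_reg)
    # Ground-state energy and its coefficients in the original Krylov basis.
    return energies[0], proj @ vecs[:, 0]

# Toy example: random Hermitian H with a near-identity overlap matrix.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)); H_mat = (A + A.T) / 2
S_mat = np.eye(4) + 1e-3 * rng.normal(size=(4, 4)); S_mat = (S_mat + S_mat.T) / 2
E0, c0 = krylov_ground_state(H_mat, S_mat)
print(E0)
```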
- Training via local gradients
  - The loss (e.g., cross‑entropy) depends only on the expectation values of a few Pauli operators, so the gradients $\partial L/\partial\theta_k$ can be estimated with parameter‑shift rules evaluated on shallow circuits (a generic parameter‑shift sketch follows this item).
  - Because the Hamiltonian is linear in $\theta$, the gradient landscape is smooth, avoiding the exponentially vanishing gradients typical of deep variational ansätze.
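A generic parameter‑shift sketch, simulated classically here with Qiskit's `Statevector`; the two‑qubit circuit and the `IZ` observable are illustrative stand‑ins, not the paper's model. The gradient of an expectation value with respect to a rotation angle comes from two shifted evaluations of the same shallow circuit.

```python
import numpy as np
from qiskit import QuantumCircuit
from qiskit.circuit import Parameter
from qiskit.quantum_info import SparsePauliOp, Statevector

theta = Parameter("theta")
qc = QuantumCircuit(2)
qc.ry(theta, 0)
qc.cx(0, 1)

obs = SparsePauliOp(["IZ"], [1.0])  # Z on qubit 0, so <obs> = cos(theta)

def expval(angle: float) -> float:
    state = Statevector.from_instruction(qc.assign_parameters({theta: angle}))
    return float(np.real(state.expectation_value(obs)))

def parameter_shift_grad(angle: float) -> float:
    # d<obs>/d(theta) = [<obs>(theta + pi/2) - <obs>(theta - pi/2)] / 2
    return 0.5 * (expval(angle + np.pi / 2) - expval(angle - np.pi / 2))

print(parameter_shift_grad(0.3))  # ~= -sin(0.3)
```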
- Hybrid workflow
  - Quantum subroutines (state preparation, measurement of Pauli strings) run on the hardware; all linear‑algebraic post‑processing (Krylov basis construction, eigenvalue solve, gradient aggregation) runs on a classical CPU.
Results & Findings
| Dataset | Qubits used | Test accuracy | Classical baseline* |
|---|---|---|---|
| Iris (3‑class) | 12 | 94 % | 96 % |
| MNIST‑binary (0 vs 1) | 30 | 98 % | 99 % |
| Synthetic 8‑dimensional | 50 | 92 % | 94 % |
- Circuit depth: All circuits use fewer than 20 two‑qubit gates, well within the coherence window of IBM's superconducting qubits.
- Sample efficiency: Accurate ground‑state estimates were achieved with ≤ 500 measurement shots per Pauli term, a dramatic reduction compared with full‑state tomography (a toy shot‑noise sketch follows the table note below).
- Scalability: The runtime grows roughly linearly with the number of qubits, confirming the theoretical claim that the method’s cost is dominated by the k‑local Hamiltonian size, not the total system size.
*Classical baseline refers to a logistic‑regression model trained on the same data.
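As a sense check on the shot budget, the sketch below estimates a single‑qubit $\langle Z\rangle$ from 500 sampled bitstrings and compares it to the exact value. The circuit is a toy stand‑in, and sampling is simulated classically rather than on hardware.

```python
import numpy as np
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector

qc = QuantumCircuit(1)
qc.ry(0.8, 0)  # exact <Z> = cos(0.8) ~= 0.697

# Z-basis outcome probabilities; a general Pauli term would need
# basis-change rotations before measurement.
probs = Statevector.from_instruction(qc).probabilities_dict()
outcomes, weights = zip(*sorted(probs.items()))

rng = np.random.default_rng(7)
samples = rng.choice(outcomes, size=500, p=weights)

# <Z> estimate: map bitstring '0' -> +1 and '1' -> -1, then average.
estimate = np.mean([1.0 if s == "0" else -1.0 for s in samples])
print(estimate, np.cos(0.8))  # estimate lands within shot noise of the exact value
```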
Practical Implications
- Near‑term quantum advantage: By keeping circuits shallow and data loading cheap, the approach opens a realistic path for quantum‑enhanced inference on edge devices where classical resources are limited (e.g., IoT sensors with quantum co‑processors).
- Hybrid pipelines: Developers can integrate the provided Qiskit‑based library into existing ML stacks (PyTorch, TensorFlow) as a custom layer that offloads the most expensive linear‑algebra step to quantum hardware (a minimal wrapper sketch follows this list).
- Feature engineering: The Hamiltonian formulation naturally supports feature‑wise locality, enabling domain‑specific encodings (e.g., graph adjacency as 2‑local terms) without exploding circuit depth.
- Reduced training cost: Local gradients mean that stochastic gradient descent can be performed with far fewer quantum evaluations than required for deep variational circuits, translating into lower cloud‑compute bills for quantum‑as‑a‑service providers.
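For the hybrid‑pipelines point above, here is a minimal sketch of exposing a quantum expectation value as a custom PyTorch autograd function. `quantum_expectation` is a hypothetical stand‑in (a classical `cos(theta)` toy model); in a real pipeline it would dispatch circuits to hardware through the authors' Qiskit‑based library, whose actual API is not reproduced here.

```python
import torch

def quantum_expectation(theta: torch.Tensor) -> torch.Tensor:
    # Hypothetical stand-in for a hardware call:
    # <Z> after RY(theta) on |0> is cos(theta).
    return torch.cos(theta.detach())

class QuantumLayer(torch.autograd.Function):
    @staticmethod
    def forward(ctx, theta):
        ctx.save_for_backward(theta)
        return quantum_expectation(theta)

    @staticmethod
    def backward(ctx, grad_output):
        (theta,) = ctx.saved_tensors
        shift = torch.pi / 2
        # Parameter-shift rule: two extra "hardware" evaluations per parameter.
        grad = 0.5 * (quantum_expectation(theta + shift)
                      - quantum_expectation(theta - shift))
        return grad_output * grad

theta = torch.tensor(0.3, requires_grad=True)
loss = (QuantumLayer.apply(theta) - 0.5) ** 2
loss.backward()
print(theta.grad)  # matches d/dtheta (cos(theta) - 0.5)^2 via parameter shift
```

The design point is that gradients flow through the quantum call without autograd tracing it: each backward pass costs two extra circuit evaluations per parameter, which is what keeps the training loop cheap for shallow circuits.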
Limitations & Future Work
- Hardware noise: While the method tolerates moderate depolarizing noise, error mitigation is still required for > 30‑qubit runs; the authors note that more sophisticated zero‑noise extrapolation could further improve fidelity.
- Expressivity bound: Linear Hamiltonians may struggle with highly non‑linear decision boundaries; extending the framework to quadratic or higher‑order Hamiltonians is an open research direction.
- Dataset size: Experiments were limited to ≤ 10 k training samples due to the cost of constructing separate Hamiltonians per data point; batching or shared‑parameter Hamiltonians could alleviate this bottleneck.
- Benchmark diversity: Future work should test the approach on larger, real‑world datasets (e.g., CIFAR‑10, time‑series) and compare against state‑of‑the‑art quantum kernels and classical deep nets.
Overall, the paper demonstrates that shallow, Hamiltonian‑based quantum models can be trained efficiently on current hardware, offering a concrete stepping stone toward practical quantum machine learning in production environments.
Authors
- Luca Candelori
- Swarnadeep Majumder
- Antonio Mezzacapo
- Javier Robledo Moreno
- Kharen Musaelian
- Santhanam Nagarajan
- Sunil Pinnamaneni
- Kunal Sharma
- Dario Villani
Paper Information
- arXiv ID: 2601.03235v1
- Categories: quant-ph, cs.LG, stat.ML
- Published: January 6, 2026