[Paper] Quadratic Unconstrained Binary Optimisation for Training and Regularisation of Binary Neural Networks

Published: January 1, 2026 at 02:21 PM EST
4 min read

Source: arXiv - 2601.00449v1

Overview

A new study shows how to cast the training of binary neural networks (BNNs) as a quadratic unconstrained binary optimisation (QUBO) problem, opening the door to leveraging fast Ising‑machine hardware for deep‑learning workloads. By extending QUBO formulations to any network topology and introducing two fresh regularisation tricks, the authors demonstrate measurable gains in generalisation on a tiny image‑classification task—suggesting a practical route to energy‑efficient AI on edge devices.
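
For readers less familiar with the format, a QUBO instance asks for the binary vector that minimises a quadratic objective,

$$\min_{x \in \{0,1\}^n} \; x^{\top} Q\, x .$$

Casting BNN training in this shape therefore means packing the loss and regularisation terms into a single matrix Q that Ising‑style hardware can minimise natively.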

Key Contributions

  • Generalised QUBO formulation for training BNNs that works with arbitrary layer structures (not just shallow or feed‑forward nets).
  • Margin‑maximising regulariser that pushes neuron pre‑activations away from zero, encouraging more decisive binary decisions.
  • Iterative dropout‑style regulariser that trains reduced subnetworks and dynamically adjusts linear penalties on weights.
  • GPU‑based Ising‑machine implementation that solves the resulting QUBO problems efficiently, demonstrating feasibility on commodity hardware.
  • Empirical validation on a binary image‑classification benchmark, showing improved test‑set accuracy when the new regularisers are applied.

Methodology

  1. Binary Network Encoding – Each weight and bias in a BNN is represented by a binary variable (±1). The loss (e.g., cross‑entropy) and any regularisation terms are expressed as a quadratic function of these binaries, yielding a QUBO matrix Q.
  2. Extending to Arbitrary Topologies – By systematically constructing Q‑blocks for each layer’s linear transform and activation, the authors assemble a global Q that captures the whole network, regardless of depth or skip connections.
  3. Regularisation Strategies
    • Margin regularisation adds a term that penalises small absolute pre‑activations, effectively widening the decision margin of each neuron.
    • Iterative dropout regularisation repeatedly solves smaller QUBOs (with a random subset of neurons dropped) and uses the resulting solutions to update linear penalty coefficients, mimicking the stochastic regularisation effect of dropout.
  4. Solving the QUBO – The Q matrix is fed to a GPU‑accelerated simulated‑annealing Ising solver, which searches for a low‑energy binary configuration (i.e., a set of network parameters). The process is repeated for multiple training epochs, updating the Q matrix with fresh gradient‑like information derived from the current solution; a minimal end‑to‑end sketch of this pipeline follows the list.
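
To make the recipe concrete, here is a minimal, self‑contained sketch (not the authors' code) of the same idea on a toy problem: ±1 weights are the binary variables, a squared‑error energy with a margin‑style target term plays the role of the quadratic objective, and a plain simulated‑annealing loop stands in for the paper's GPU Ising solver. The toy data, the specific energy, and the cooling schedule are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative only): one linear "layer" with n binary weights
# w_i in {-1, +1}. Targets are signs produced by a hidden +/-1 weight vector.
n_features, n_samples = 9, 40          # odd feature count keeps pre-activations non-zero
X = rng.choice([-1.0, 1.0], size=(n_samples, n_features))
w_true = rng.choice([-1.0, 1.0], size=n_features)
y = np.sign(X @ w_true)

# Quadratic (QUBO/Ising-style) energy: E(w) = ||X w - m * y||^2 expands to
# w^T A w - 2 b^T w + const, so only the matrix A and vector b are needed.
# The target m * y loosely mirrors a margin-style term that pushes each
# pre-activation away from zero in the direction of its label.
m = 3.0
A = X.T @ X                 # quadratic couplings (the "Q-block" for this layer)
b = m * (X.T @ y)           # linear field from the margin/label term

def energy(w: np.ndarray) -> float:
    return float(w @ A @ w - 2.0 * b @ w)

# Simulated annealing over the +/-1 weights: a stand-in for the GPU-based
# Ising solver used in the paper.
def anneal(steps: int = 5000, t_start: float = 10.0, t_end: float = 0.01):
    w = rng.choice([-1.0, 1.0], size=n_features)
    e = energy(w)
    best_w, best_e = w.copy(), e
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / steps)  # geometric cooling
        i = rng.integers(n_features)
        w[i] *= -1.0                                       # propose a single spin flip
        e_new = energy(w)
        if e_new <= e or rng.random() < np.exp((e - e_new) / t):
            e = e_new                                      # accept the move
            if e < best_e:
                best_w, best_e = w.copy(), e
        else:
            w[i] *= -1.0                                   # reject: undo the flip
    return best_w, best_e

w_hat, e_hat = anneal()
accuracy = float(np.mean(np.sign(X @ w_hat) == y))
print(f"energy {e_hat:.1f}, training accuracy {accuracy:.2f}")
```

A full implementation would instead assemble per‑layer Q‑blocks for the whole network and re‑solve the QUBO each epoch, as steps 2 and 4 above describe.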

Results & Findings

  • On an MNIST‑style classification task with binarised inputs (10 classes, 28×28 images), the baseline QUBO‑trained BNN achieved ≈84 % test accuracy.
  • Adding the margin regulariser lifted accuracy to ≈87 %, indicating better robustness to unseen inputs.
  • The iterative dropout regulariser produced a comparable boost (≈86 %) while also reducing over‑fitting on the tiny training set.
  • Combining both regularisers yielded the highest performance (≈88 %), confirming that they act synergistically.
  • The GPU‑based Ising solver converged within seconds per epoch, demonstrating that QUBO‑based training can be competitive with conventional gradient‑based methods for small‑scale problems.

Practical Implications

  • Edge AI deployment – By training BNNs directly in binary space, the resulting models are already quantised for ultra‑low‑power inference on microcontrollers, FPGAs, or emerging Ising‑chip accelerators.
  • Hardware‑aware optimisation – Developers can now offload the combinatorial optimisation step to specialised Ising machines (e.g., D‑Wave, Fujitsu’s Digital Annealer) or even high‑throughput GPUs, potentially cutting training energy by orders of magnitude compared to floating‑point back‑propagation.
  • Robustness through margins – The margin regulariser translates into networks that are less sensitive to noise in sensor data—a valuable property for robotics, IoT, and autonomous systems.
  • Dropout‑style regularisation without stochastic gradients – The iterative scheme offers a deterministic way to achieve the regularising effect of dropout, which can be easier to analyse and debug in safety‑critical pipelines.
  • Toolchain integration – The QUBO construction is algorithmic and can be wrapped into existing deep‑learning frameworks (PyTorch, TensorFlow) as a custom optimizer, enabling a hybrid workflow where developers switch between gradient descent and QUBO solving as needed. A skeleton of such a wrapper is sketched below.
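
As one possible shape for that integration (a sketch under stated assumptions, not an API from the paper), the QUBO construction and solve could sit behind a custom PyTorch optimiser. Here `build_qubo` and `solve_qubo` are hypothetical user‑supplied callables standing in for whatever QUBO constructor and Ising solver the workflow targets.

```python
import torch


class QUBOStep(torch.optim.Optimizer):
    """Hybrid-workflow skeleton: each step() rebuilds a QUBO over the binary
    parameters and writes the solver's low-energy configuration back into the
    model. `build_qubo` and `solve_qubo` are hypothetical callables; nothing
    here is specific to the paper's solver."""

    def __init__(self, params, build_qubo, solve_qubo):
        defaults = dict(build_qubo=build_qubo, solve_qubo=solve_qubo)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = closure() if closure is not None else None
        for group in self.param_groups:
            build, solve = group["build_qubo"], group["solve_qubo"]
            for p in group["params"]:
                q = build(p)                          # quadratic coefficients for this tensor
                x = solve(q)                          # bits in {0, 1}, one per parameter entry
                p.copy_((2.0 * x - 1.0).view_as(p))   # map {0,1} back to {-1,+1} weights
        return loss
```

In such a scheme, ordinary gradient‑based epochs and QUBO‑solving epochs could be interleaved simply by swapping the optimiser, which is the hybrid workflow described in the bullet above.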

Limitations & Future Work

  • Scalability – The experiments are limited to very small networks; QUBO size grows quadratically with the number of binary parameters (a dense Q for a network with a million binary weights already has roughly half a trillion entries), so naïve formulations quickly become intractable for modern deep nets.
  • Solver dependence – Performance hinges on the quality and speed of the underlying Ising solver; hardware constraints or solver heuristics may affect reproducibility.
  • Training dynamics – The current approach updates the Q matrix only once per epoch, lacking the fine‑grained feedback loops of standard back‑propagation, which could hinder convergence on more complex tasks.
  • Future directions suggested by the authors include: hierarchical QUBO decomposition to handle larger architectures, co‑design of custom ASIC Ising accelerators for BNN training, and extending the regularisation ideas to multi‑bit quantised networks.

Authors

  • Jonas Christoffer Villumsen
  • Yusuke Sugita

Paper Information

  • arXiv ID: 2601.00449v1
  • Categories: math.OC, cs.NE
  • Published: January 1, 2026
  • PDF: Download PDF