[Paper] Precision Autotuning for Linear Solvers via Contextual Bandit-Based RL
Source: arXiv - 2601.00728v1
Overview
The paper introduces a reinforcement‑learning (RL) framework that automatically picks the numeric precision (e.g., half, single, double) for each step of a linear‑solver pipeline. By casting the problem as a contextual bandit, the system learns from runtime features (like an estimated condition number) and selects the cheapest precision that still meets a target accuracy. The authors demonstrate the idea on iterative refinement, showing that mixed‑precision runs can match double‑precision quality while cutting compute time.
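To make the setting concrete, here is a minimal NumPy sketch of mixed‑precision iterative refinement in which the precision of each correction solve is a per‑iteration choice (exactly the knob the tuner controls). The precision schedule, test matrix, and tolerance are illustrative assumptions, not the paper's pipeline; note that NumPy's LAPACK wrappers accept only float32/float64, so half precision would need a different backend.

```python
# Minimal sketch of mixed-precision iterative refinement (illustrative only).
# The per-iteration `precision` choice is the quantity the RL tuner selects.
import numpy as np

def iterative_refinement(A, b, precisions, tol=1e-12, max_iter=20):
    """Solve Ax = b in double, computing each correction in a chosen lower precision."""
    A64, b64 = A.astype(np.float64), b.astype(np.float64)
    x = np.zeros_like(b64)
    for k in range(max_iter):
        r = b64 - A64 @ x                                 # residual in double precision
        if np.linalg.norm(r) <= tol * np.linalg.norm(b64):
            break
        p = precisions[min(k, len(precisions) - 1)]       # per-iteration precision choice
        d = np.linalg.solve(A64.astype(p), r.astype(p))   # correction in lower precision
        x = x + d.astype(np.float64)                      # accumulate update in double
    return x

# Example: early corrections in single precision, later ones in double.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 100)) + 100 * np.eye(100)  # well-conditioned test matrix
b = rng.standard_normal(100)
x = iterative_refinement(A, b, precisions=[np.float32, np.float32, np.float64])
print(np.linalg.norm(A @ x - b))
```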
Key Contributions
- First RL‑based autotuner for numeric precision in scientific kernels, bridging mixed‑precision computing and reinforcement learning.
- Contextual‑bandit formulation that treats solver state features as context and precision choices as actions, enabling fast, online learning with a simple Q‑table (a minimal sketch of this formulation follows the list).
- Epsilon‑greedy action selection that balances exploration (trying new precisions) and exploitation (using the best‑known configuration).
- Multi‑objective reward design that jointly optimizes solution accuracy and computational cost.
- Empirical validation on unseen linear systems, showing consistent cost reductions without sacrificing double‑precision accuracy.
- Generalizable pipeline that can be extended to other numerical algorithms beyond iterative refinement.
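As a concrete illustration of the contextual‑bandit formulation and epsilon‑greedy selection above, the sketch below maps a cheap condition‑number estimate to a discrete context, looks up a row of a small Q‑table, and picks a precision action. The bucket edges, action labels, and epsilon value are assumptions for illustration, not the authors' configuration.

```python
# Minimal contextual-bandit sketch: discretized contexts index rows of a Q-table,
# precision configurations are the actions, and selection is epsilon-greedy.
# Bucket edges, action labels, and epsilon are illustrative assumptions.
import numpy as np

ACTIONS = ["fp16_correction", "fp32_correction", "fp64_correction"]
COND_EDGES = [1e2, 1e5, 1e8]               # assumed condition-number buckets
N_CONTEXTS = len(COND_EDGES) + 1

Q = np.zeros((N_CONTEXTS, len(ACTIONS)))   # Q-value estimate per (context, action)

def context_of(cond_estimate: float) -> int:
    """Map a cheap condition-number estimate to a discrete context index."""
    return int(np.searchsorted(COND_EDGES, cond_estimate))

def select_action(context: int, epsilon: float = 0.1,
                  rng=np.random.default_rng()) -> int:
    """Epsilon-greedy: explore a random precision with prob. epsilon, else exploit."""
    if rng.random() < epsilon:
        return int(rng.integers(len(ACTIONS)))
    return int(np.argmax(Q[context]))

# Example lookup for a well-conditioned system:
ctx = context_of(50.0)
print(ACTIONS[select_action(ctx)])
```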
Methodology
- Feature Extraction – Before each refinement iteration, cheap-to‑compute statistics are gathered (e.g., an approximate condition number, matrix norm, residual size). These form a low‑dimensional state vector.
- Discretization – The continuous feature space is bucketed into a finite set of “contexts” so that a classic tabular Q‑learning approach can be used.
- Action Space – Each action corresponds to a specific precision configuration for the solver’s sub‑steps (e.g., compute residual in single precision, solve correction in half precision).
- Reward Function – A weighted sum of (i) negative error (to encourage accuracy) and (ii) negative runtime or FLOP count (to encourage speed). The weights let users prioritize one objective over the other.
- Learning Loop – Using an epsilon‑greedy policy, the tuner picks an action, runs the solver step, observes the reward, and updates the Q‑value for the (context, action) pair via incremental averaging; a sketch of this loop follows the list. Over many solves, the Q‑table converges to the best precision policy for each context.
- Deployment – At inference time, the tuner simply looks up the best action for the current context, incurring negligible overhead.
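Putting the pieces together, the sketch below fills in one plausible version of the learning loop: discretize the context, select a precision epsilon‑greedily, run a refinement step, score it with a weighted accuracy/cost reward, and update the Q‑table by incremental averaging. The reward weights, bucket edges, and the use of an exact condition number (the paper relies on cheap estimates) are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of the tuner's learning loop: observe context, pick a precision with
# epsilon-greedy, run one refinement step, score it with a weighted accuracy/cost
# reward, and update the Q-table by incremental averaging.
# Weights, bucket edges, and the solver hook are illustrative assumptions.
import time
import numpy as np

ACTIONS = [np.float32, np.float64]            # candidate correction precisions
COND_EDGES = [1e2, 1e5, 1e8]                  # assumed context buckets
Q = np.zeros((len(COND_EDGES) + 1, len(ACTIONS)))
N = np.zeros_like(Q, dtype=int)               # visit counts for incremental averaging
rng = np.random.default_rng(0)

def refinement_step(A, b, x, precision):
    """One iterative-refinement step with the correction solved in `precision`."""
    r = b - A @ x
    d = np.linalg.solve(A.astype(precision), r.astype(precision))
    x_new = x + d.astype(np.float64)
    return x_new, np.linalg.norm(b - A @ x_new)

def tune_and_step(A, b, x, cond_estimate, epsilon=0.1, w_err=1.0, w_time=10.0):
    ctx = int(np.searchsorted(COND_EDGES, cond_estimate))
    a = rng.integers(len(ACTIONS)) if rng.random() < epsilon else int(np.argmax(Q[ctx]))
    t0 = time.perf_counter()
    x_new, err = refinement_step(A, b, x, ACTIONS[a])
    elapsed = time.perf_counter() - t0
    reward = -w_err * err - w_time * elapsed          # multi-objective: accuracy + cost
    N[ctx, a] += 1
    Q[ctx, a] += (reward - Q[ctx, a]) / N[ctx, a]     # incremental-average Q update
    return x_new

# Example: learn over repeated solves of random well-conditioned systems.
# (np.linalg.cond is used here for simplicity; the paper assumes a cheap estimate.)
for _ in range(50):
    A = rng.standard_normal((80, 80)) + 80 * np.eye(80)
    b = rng.standard_normal(80)
    x = np.zeros(80)
    for _ in range(3):
        x = tune_and_step(A, b, x, cond_estimate=np.linalg.cond(A))
print(Q)
```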
Results & Findings
| Metric | Baseline (double‑precision) | RL‑tuned mixed‑precision |
|---|---|---|
| Average runtime reduction | – | ≈ 30 % |
| Solution error (relative) | 1e‑12 (double) | 1.2e‑12 (within 20 % of baseline) |
| Success on unseen matrices | N/A | > 95 % of runs meet target tolerance |
| Overhead of tuner | N/A | < 1 % of total solve time |
Key takeaways: the RL tuner learns to drop precision aggressively when the matrix is well‑conditioned, yet automatically restores higher precision for tougher cases. The Q‑table remains compact (a few kilobytes) and generalizes across problem sizes and distributions not seen during training.
Practical Implications
- Performance‑critical libraries (e.g., PETSc, Trilinos) can embed the bandit tuner to automatically exploit mixed‑precision hardware (Tensor Cores, bfloat16 units) without manual tuning.
- Edge and embedded devices with limited compute can achieve double‑precision‑level accuracy while staying within power budgets by dynamically lowering precision.
- Compiler/runtime frameworks (LLVM, TVM, oneAPI) could expose a “precision‑autotune” pass that generates the feature extraction and Q‑lookup code automatically.
- Cloud services offering linear‑solver APIs can reduce CPU/GPU usage per request, translating directly into cost savings.
- The approach is lightweight enough to learn online: a production system can keep refining its Q‑table as new problem instances arrive, adapting to hardware upgrades or workload shifts.
Limitations & Future Work
- State discretization may become coarse for very high‑dimensional feature sets, potentially missing subtle precision trade‑offs.
- The current reward balances only accuracy vs. runtime; extending to memory bandwidth, energy, or numerical stability would broaden applicability.
- Experiments focus on iterative refinement; applying the same bandit framework to other kernels (e.g., eigenvalue solvers, nonlinear optimization) remains to be demonstrated.
- The method assumes that cheap feature estimates are available; for some problems (e.g., highly sparse or distributed matrices) extracting these features could be non‑trivial.
- Future work could explore function‑approximation RL (e.g., deep Q‑networks) to handle continuous state spaces and larger action sets, as well as meta‑learning to transfer policies across hardware generations.
Authors
- Erin Carson
- Xinye Chen
Paper Information
- arXiv ID: 2601.00728v1
- Categories: cs.LG, math.NA
- Published: January 2, 2026