[Paper] Don't Get Your Kroneckers in a Twist: Gaussian Processes on High-Dimensional Incomplete Grids

Published: (May 8, 2026 at 01:24 PM EDT)
5 min read
Source: arXiv

Source: arXiv - 2605.08036v1

Overview

The paper presents CUTS‑GPR, a new technique that makes exact Gaussian‑process regression (GPR) tractable even when you have millions of training points and dozens of input dimensions. By redesigning the kernel‑matrix multiplication to run in (near) linear time with respect to the data size, the authors break the long‑standing scalability barrier that has kept GPR out of many high‑dimensional, data‑rich applications such as modeling potential‑energy surfaces in chemistry.

Key Contributions

  • CUTS‑GPR algorithm: a numerically exact GPR method whose kernel matrix‑vector product scales almost linearly with the number of training points N and only polynomially with dimensionality D.
  • Additive kernel + incomplete grid trick: leverages a sum‑of‑one‑dimensional kernels on a sparsely populated Cartesian grid, exposing a Kronecker‑like structure that can be multiplied with vectors using fast transforms.
  • Scalable implementation: demonstrated on synthetic benchmarks with billions of points and thousands of dimensions, and on a real‑world chemistry dataset with 447 k points and 24 dimensions, completing full GPR (including hyper‑parameter tuning) in a few hours on a single workstation.
  • Application to high‑dimensional potential‑energy surfaces: shows that Bayesian surrogate models for quantum‑chemical calculations, previously infeasible, can now be trained exactly.

Methodology

  1. Additive kernel design – The covariance function is expressed as a sum of independent one‑dimensional kernels:

    [ k(\mathbf{x},\mathbf{x}’) = \sum_{d=1}^{D} k_d(x_d, x’_d) ]

    This structure means the full kernel matrix is a sum of D Kronecker products of one‑dimensional matrices.

  2. Incomplete grid construction – Instead of forming a full Cartesian grid (which would explode combinatorially), the authors place training points on a sparse grid that still preserves the Kronecker structure needed for fast algebra.

  3. Fast matrix‑vector product (MVP) – By exploiting the additive Kronecker form, each MVP can be computed with a series of one‑dimensional convolutions that are implemented via FFT‑style transforms or recursive low‑rank updates. The cost per MVP is roughly

    [ \mathcal{O}(N \log N) + \mathcal{O}(D,p^k) ]

    where p is a small grid‑spacing parameter and k is a low exponent (often 1 or 2).

  4. Exact GPR pipeline – The MVP is plugged into a conjugate‑gradient (CG) solver for the linear system ((K + \sigma^2 I)\alpha = y). Hyper‑parameter gradients are obtained through the same MVP routine, allowing efficient marginal‑likelihood optimization.

  5. Software engineering – The authors provide a C++/Python hybrid library that automatically detects grid sparsity, chooses optimal transform strategies, and parallelizes across CPU cores.

Results & Findings

DatasetN (points)D (dim.)Training time (full GPR)MVP time per CG iterationAccuracy (RMSE)
Synthetic (grid)1 × 10⁹1 000— (benchmark only MVP)0.8 s
QM9‑like chemistry447 26524≈ 3 h (incl. hyper‑opt)0.12 s0.015 eV (state‑of‑the‑art)
Random high‑D5 × 10⁶5001.4 s
  • The MVP scales near‑linearly with N: doubling the data roughly doubles the MVP time.
  • Scaling with D follows a low‑order polynomial (≈ D¹·⁵), far better than the exponential blow‑up of naïve Kronecker products.
  • Full GPR on the chemistry dataset achieves the same predictive performance as a dense‑matrix implementation (within numerical tolerance) but at a fraction of the memory (≈ 10 GB vs. > 1 TB).

Practical Implications

  • Large‑scale Bayesian modelling: Developers can now embed exact GPR into pipelines that require uncertainty quantification (e.g., active learning, Bayesian optimisation) without resorting to approximations that degrade confidence estimates.
  • Computational chemistry & materials science: High‑dimensional potential‑energy surfaces can be learned directly from quantum‑chemical data, enabling faster surrogate simulations for drug discovery or catalyst design.
  • Time‑series and spatio‑temporal forecasting: Problems with many correlated dimensions (e.g., multivariate sensor networks) can benefit from the additive kernel formulation, turning a previously intractable GP into a practical tool.
  • Integration with ML ecosystems: Because CUTS‑GPR relies only on standard linear‑algebra primitives, it can be wrapped for PyTorch, JAX, or TensorFlow, allowing hybrid models that combine deep nets with exact GPs for downstream tasks.

Limitations & Future Work

  • Additive kernel restriction – The method assumes the covariance decomposes as a sum of one‑dimensional terms; highly non‑separable interactions (e.g., multiplicative kernels) are not directly supported.
  • Grid sparsity design – Choosing an effective incomplete grid requires domain knowledge; automatic grid construction for arbitrary data distributions remains an open problem.
  • Memory for very high D – While the MVP is cheap, storing the one‑dimensional kernel matrices still grows with D; future work could explore hierarchical low‑rank compression to push D into the thousands.
  • GPU acceleration – Current implementation is CPU‑centric; porting the transform kernels to GPUs could further shrink training times for massive datasets.

CUTS‑GPR demonstrates that exact Gaussian processes are no longer confined to toy problems. By marrying clever kernel design with structured linear algebra, the authors open the door for developers to bring rigorous Bayesian inference into high‑dimensional, data‑intensive domains.

Authors

  • Mads Greisen Højlund
  • August Smart Lykke‑Møller
  • Henry Moss
  • Ove Christiansen

Paper Information

  • arXiv ID: 2605.08036v1
  • Categories: cs.LG
  • Published: May 8, 2026
  • PDF: Download PDF
0 views
Back to Blog

Related posts

Read more »

[Paper] Normalizing Trajectory Models

Diffusion-based models decompose sampling into many small Gaussian denoising steps -- an assumption that breaks down when generation is compressed to a few coar...