[Paper] ADELIA: Automatic Differentiation for Efficient Laplace Inference Approximations

Published: May 7, 2026 at 11:07 AM EDT

Source: arXiv - 2605.06392v1

Overview

The paper introduces ADELIA, the first Automatic Differentiation (AD)‑enabled implementation of Integrated Nested Laplace Approximations (INLA). It replaces finite‑difference (FD) gradient estimates, whose cost grows with the number of hyperparameters, with reverse‑mode AD. For the large‑scale Bayesian latent Gaussian models used in environmental and health analytics, the authors report up to 7.9× faster gradient computation and up to 8× lower energy consumption.

Key Contributions

AD integration for INLA: Shows how reverse‑mode AD can be applied to INLA’s structured‑sparse linear algebra kernels, delivering exact gradients at a cost that is independent of the hyperparameter count.
Multi‑GPU backward pass: Designs a sparsity‑aware, multi‑GPU algorithm that exploits the block‑tridiagonal structure of INLA’s precision matrices, achieving high parallel efficiency.
Performance gains: Demonstrates 4.2–7.9× per‑gradient speedups on ten benchmark models (including a 1.9 M‑variable air‑pollution case) and up to 8× lower energy usage compared with state‑of‑the‑art FD‑based INLA.
Robust convergence: Shows that AD‑based gradients lead to more stable Newton‑type optimization, enabling convergence on models where FD fails.
Open‑source prototype: Provides a reference implementation that can be plugged into existing INLA workflows, paving the way for broader adoption in high‑performance Bayesian inference.
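
To make "structured‑sparse linear algebra kernels" concrete, here is a minimal sketch in plain Python, not ADELIA's code. It assumes a scalar tridiagonal precision matrix as a toy stand‑in for the block structure INLA works with, and computes the log‑determinant through a Cholesky factorization in O(n) time without forming a dense matrix. The function name `tridiag_cholesky_logdet` is hypothetical.

```python
import math

def tridiag_cholesky_logdet(diag, off):
    """Log-determinant of a symmetric positive-definite tridiagonal matrix.

    diag: main diagonal entries a_0 .. a_{n-1}
    off:  sub/super-diagonal entries b_0 .. b_{n-2}
    Only the two diagonals are stored, so the sparsity pattern is never densified.
    """
    if len(off) != len(diag) - 1:
        raise ValueError("off-diagonal must have len(diag) - 1 entries")
    logdet = 0.0
    prev_sub = 0.0  # Cholesky factor entry L[i, i-1] from the previous row
    for i, a in enumerate(diag):
        pivot = a - prev_sub * prev_sub
        if pivot <= 0.0:
            raise ValueError("matrix is not positive definite")
        l_ii = math.sqrt(pivot)          # diagonal entry L[i, i]
        logdet += 2.0 * math.log(l_ii)   # det(Q) = prod(L[i, i]) ** 2
        if i < len(off):
            prev_sub = off[i] / l_ii     # next sub-diagonal entry L[i+1, i]
    return logdet

# Toy precision matrix: 2 on the diagonal, -1 on the off-diagonals.
logdet = tridiag_cholesky_logdet([2.0, 2.0, 2.0], [-1.0, -1.0])
print(logdet)
```

Log‑determinants of precision matrices appear in the Laplace‑approximated objective, which is why a factorization like this sits on the critical path of both the forward evaluation and its derivative.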

Methodology

Problem setting – INLA fits hierarchical Bayesian models by maximizing a marginal likelihood over d hyperparameters θ. The optimizer needs the gradient ∂ℓ/∂θ, traditionally approximated with central finite differences, which require 2d + 1 full model evaluations per gradient: two perturbed evaluations per hyperparameter plus one at the current point.
Reverse‑mode AD – The authors rewrite the forward INLA pipeline (building sparse precision matrices, performing Cholesky factorizations, solving linear systems) in a differentiable programming style. Reverse‑mode AD automatically propagates sensitivities from the scalar log‑likelihood back to each hyperparameter, yielding exact gradients in a single backward pass.
Sparse‑aware GPU kernels – Because the forward pass already uses a custom multi‑GPU sparse Cholesky factorization, the backward pass re‑uses the same sparsity pattern. The authors implement custom CUDA kernels that traverse the elimination tree in reverse, accumulating gradient contributions without materializing dense intermediates.
Parallel orchestration – The forward and backward phases are overlapped across GPUs using MPI + NCCL communication, ensuring that the additional AD work does not become a bottleneck.
Benchmark suite – Ten models ranging from synthetic 2‑D GRFs to a real‑world air‑pollution monitoring network (≈ 1.9 M latent variables, 12 hyperparameters) are used to compare ADELIA against the FD‑based reference INLA implementation.
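
The cost difference between the two gradient strategies can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' implementation: `objective` is a toy scalar function standing in for the marginal log‑likelihood, and `Var`, `var_log`, `backward`, `fd_gradient`, and `ad_gradient` are hypothetical names for a tiny tape‑based reverse‑mode AD and a central‑difference baseline.

```python
import math

class Var:
    """Scalar node in a tiny reverse-mode AD graph (illustrative only)."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent_node, local_derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))
    __rmul__ = __mul__

def var_log(x):
    return Var(math.log(x.value), ((x, 1.0 / x.value),))

def backward(out):
    """One backward sweep: accumulates d(out)/d(node) into every node.grad."""
    order, seen = [], set()
    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)  # parents are appended before their children
    visit(out)
    out.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += node.grad * local

def objective(theta, log_fn):
    """Toy stand-in for the log-likelihood: log(1 + sum(theta_i ** 2))."""
    total = 1.0
    for t in theta:
        total = total + t * t
    return log_fn(total)

def fd_gradient(f, theta, h=1e-5):
    """Central differences: two objective evaluations per hyperparameter."""
    grad, evals = [], 0
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
        evals += 2
    return grad, evals

def ad_gradient(theta):
    """Reverse mode: one forward evaluation and one backward sweep."""
    nodes = [Var(t) for t in theta]
    backward(objective(nodes, var_log))
    return [n.grad for n in nodes]

theta = [0.5, -1.0, 2.0]
fd_grad, n_evals = fd_gradient(lambda th: objective(th, math.log), theta)
ad_grad = ad_gradient(theta)
print(n_evals, fd_grad, ad_grad)
```

The FD loop touches the objective 2d times for the gradient alone (one more evaluation gives the objective value itself), while the AD path evaluates it once and recovers all d partial derivatives from a single backward sweep.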

Results & Findings

Model	#Latent vars	#Hyperparams (d)	FD gradient time (s)	ADELIA gradient time (s)	Speedup	Energy ratio (FD/ADELIA)
Synthetic 2‑D GRF	250 k	8	12.4	2.1	5.9×	6.2×
Air‑pollution (real)	1.9 M	12	84.7	11.3	7.5×	7.8×
… (8 more)	…	…	…	…	4.2–7.9×	5–8×
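
As a quick consistency check on the two fully reported rows, the speedup column is simply the FD gradient time divided by the ADELIA gradient time. The short snippet below recomputes it from the table values; the dictionary layout and the `speedup` helper are illustrative choices, not part of the paper.

```python
# Gradient times in seconds, copied from the two fully reported table rows.
timings = {
    "Synthetic 2-D GRF": (12.4, 2.1),
    "Air-pollution (real)": (84.7, 11.3),
}

def speedup(fd_seconds, adelia_seconds):
    """Per-gradient speedup: FD time divided by ADELIA time."""
    return fd_seconds / adelia_seconds

speedups = {name: round(speedup(fd, ad), 1) for name, (fd, ad) in timings.items()}
for name, value in speedups.items():
    print(f"{name}: {value}x")
```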

Gradient accuracy: AD gradients are exact up to floating‑point round‑off and agree with the finite‑difference gradients to within the truncation error of the FD scheme, an error source that AD eliminates.
Optimization convergence: Newton‑Raphson steps with AD gradients converge in fewer iterations (average 6 vs. 9 for FD) and avoid the occasional divergence observed with noisy FD estimates.
Scalability: Even when scaling FD to 16–32 GPUs to match ADELIA’s wall‑clock time, the FD approach still consumes 5–8× more energy due to the repeated forward evaluations.
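
Why FD estimates can be noisy is easy to reproduce on a toy function. The sketch below is not taken from the paper: it uses `math.sin` as a stand‑in objective with a known derivative and measures the central‑difference error for three step sizes. A large step is dominated by truncation error, a tiny step by floating‑point cancellation, and only an intermediate step is accurate.

```python
import math

def central_difference(f, x, h):
    """Central finite-difference estimate of f'(x) with step h."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def fd_errors(f, df, x, steps):
    """Absolute error of the FD estimate against the known derivative df."""
    return {h: abs(central_difference(f, x, h) - df(x)) for h in steps}

# Toy objective sin(x); its exact derivative cos(x) plays the role of the AD gradient.
errors = fd_errors(math.sin, math.cos, 1.0, [1e-1, 1e-5, 1e-13])
for h, err in errors.items():
    print(f"h={h:g}  abs error={err:.3e}")
```

An exact gradient has no step size to tune, which is the practical point behind the accuracy and convergence findings above.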

Practical Implications

Faster model development: Data scientists can iterate on complex spatio‑temporal Bayesian models in hours rather than days, making INLA viable for interactive analytics pipelines.
Cost‑effective HPC usage: Reduced energy and compute time translate directly into lower cloud‑GPU bills, especially for large government or industry projects (e.g., nationwide air‑quality monitoring).
More reliable inference: Exact gradients remove the need to tune finite‑difference step sizes, simplifying deployment in automated MLOps workflows.
Extensibility: The sparsity‑aware AD pattern can be transplanted to other structured‑sparse probabilistic inference frameworks (e.g., Gaussian Markov Random Fields, variational inference libraries).
Edge‑to‑cloud scenarios: Because the backward pass re‑uses the same sparse factorization as the forward pass, ADELIA could in principle be integrated into hybrid CPU‑GPU systems where only a subset of nodes have accelerators, although the current prototype targets multi‑GPU nodes.

Limitations & Future Work

GPU‑centric implementation: The current prototype assumes access to multi‑GPU nodes; performance on CPU‑only clusters is not addressed.
Memory overhead: Storing the elimination tree and intermediate factors for the backward pass adds ~30 % memory compared with the FD baseline, which may limit ultra‑large models on memory‑constrained GPUs.
Hyperparameter dimensionality: While AD eliminates the d‑dependence of gradient cost, the overall optimization still scales with the number of hyperparameters in terms of iteration count; smarter hyperparameter priors or second‑order methods could further reduce wall‑time.
Generalization to non‑Gaussian latent structures: ADELIA focuses on latent Gaussian models; extending the approach to models whose latent field is itself non‑Gaussian remains an open research direction.

Overall, ADELIA demonstrates that marrying automatic differentiation with sparsity‑aware GPU kernels can unlock a new level of performance for large‑scale Bayesian inference, turning a traditionally academic tool into a production‑ready workhorse for developers and data engineers.

Authors

Afif Boudaoud
Lisa Gaedke-Merzhäuser
Alexandros Nikolaos Ziogas
Vincent Maillou
Alexandru Calotoiu
Marcin Copik
Håvard Rue
Mathieu Luisier
Torsten Hoefler

Paper Information

arXiv ID: 2605.06392v1
Categories: cs.DC, cs.PF
Published: May 7, 2026