[Paper] Direct Learning of Calibration-Aware Uncertainty for Neural PDE Surrogates
Source: arXiv - 2602.11090v1
Overview
This paper tackles a practical problem that many engineers face when using neural networks as surrogates for partial‑differential‑equation (PDE) solvers: how to obtain reliable, calibrated uncertainty estimates when data are scarce or only partially observed. The authors propose a training‑time technique—cross‑regularized uncertainty—that learns uncertainty parameters directly, without relying on ensembles, dropout, or costly post‑hoc calibration, and demonstrate its benefits on Fourier Neural Operators (FNOs) across a range of data regimes.
Key Contributions
- Cross‑regularized uncertainty framework: Introduces a simple yet effective way to learn uncertainty parameters during training by routing gradients through a held‑out regularization split of the data.
- Regime‑adaptive noise learning: Enables the model to automatically adjust noise levels (at the output head, within hidden features, or in operator‑specific components such as spectral modes) based on the amount of observed data.
- Integration with Fourier Neural Operators: Provides a concrete implementation for state‑of‑the‑art neural PDE surrogates, showing that the method works with modern operator‑learning architectures.
- Comprehensive empirical evaluation: Uses the APEBench benchmark to sweep over observation fractions and training‑set sizes, demonstrating superior calibration and error‑focused uncertainty fields compared with baselines (ensembles, dropout, post‑hoc temperature scaling).
- Interpretability of uncertainty maps: Shows that learned uncertainty concentrates in spatial regions where one‑step prediction errors are highest, offering actionable diagnostics for downstream decision‑making.
Methodology
1. Data split – The training dataset is divided into two disjoint subsets:
   - Training split: Used to optimize the predictor’s parameters for accurate PDE solutions.
   - Regularization split: Held out during predictor updates but used to train low‑dimensional uncertainty parameters (e.g., variance scalars).
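A minimal sketch of such a split (the 20 % regularization fraction and the function name are illustrative choices, not values from the paper):

```python
import numpy as np

def make_splits(n_samples, reg_fraction=0.2, seed=0):
    """Partition sample indices into a training split (fits predictor
    weights) and a disjoint regularization split (fits the uncertainty
    parameters). reg_fraction is a hypothetical default."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_reg = int(round(reg_fraction * n_samples))
    return idx[n_reg:], idx[:n_reg]

train_idx, reg_idx = make_splits(1000)
```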
2. Cross‑regularization loss – The total loss combines:
   - A standard prediction loss (e.g., MSE) on the training split.
   - A calibration loss (e.g., negative log‑likelihood or proper scoring rule) evaluated on the regularization split, where the learned uncertainty parameters appear.
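As a sketch, taking the calibration term to be a Gaussian negative log‑likelihood with a single learned log‑variance scalar (`log_var`, an illustrative parameterization):

```python
import torch

def cross_regularized_loss(pred_tr, y_tr, pred_rg, y_rg, log_var):
    # Standard MSE prediction loss on the training split.
    mse = torch.mean((pred_tr - y_tr) ** 2)
    # Gaussian NLL on the regularization split; only here does the
    # learned uncertainty parameter log_var enter the objective.
    nll = 0.5 * torch.mean(
        (pred_rg - y_rg) ** 2 / torch.exp(log_var) + log_var
    )
    return mse + nll
```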
3. Gradient routing – Because the uncertainty parameters appear only in the regularization loss, they receive gradients solely from the regularization split, while the predictor’s weights receive gradients from the training split. This decouples fitting from uncertainty learning, preventing the model from “cheating” by inflating variance to reduce the training loss.
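One common way to implement this routing (a sketch, not necessarily the authors' exact mechanism) is to detach the predictor's output on the regularization split, so the calibration loss updates only the uncertainty parameter:

```python
import torch

w = torch.tensor(2.0, requires_grad=True)        # predictor weight
log_var = torch.tensor(0.0, requires_grad=True)  # uncertainty parameter

# Toy 1-D "surrogate": prediction = w * x.
x_tr, y_tr = torch.tensor(1.0), torch.tensor(3.0)  # training split
x_rg, y_rg = torch.tensor(2.0), torch.tensor(6.0)  # regularization split

mse = (w * x_tr - y_tr) ** 2
pred_rg = (w * x_rg).detach()  # block predictor gradients from this split
nll = 0.5 * ((pred_rg - y_rg) ** 2 / torch.exp(log_var) + log_var)
(mse + nll).backward()

# w is trained only by the MSE term; log_var only by the NLL term.
```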
4. Where to inject uncertainty – The authors experiment with three locations:
   - Output head: A scalar or diagonal covariance added to the final prediction.
   - Hidden features: Learned noise injected into intermediate representations.
   - Spectral modes (specific to FNOs): Mode‑wise variance that respects the Fourier structure of the operator.
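The FNO‑specific variant can be sketched as a per‑mode log‑variance that generates noise in Fourier space before transforming back to the grid (`SpectralModeUncertainty` and its shapes are illustrative, not the paper's implementation):

```python
import torch

class SpectralModeUncertainty(torch.nn.Module):
    """Mode-wise variance for a 1-D spectral layer: one learned
    log-variance per retained Fourier mode."""
    def __init__(self, n_modes):
        super().__init__()
        self.log_var = torch.nn.Parameter(torch.zeros(n_modes))

    def forward(self, grid_size):
        # Sample complex noise per retained mode, scale by the learned
        # per-mode standard deviation, and map back to physical space.
        std = torch.exp(0.5 * self.log_var).to(torch.cfloat)
        spec = torch.zeros(grid_size, dtype=torch.cfloat)
        spec[: std.numel()] = std * torch.randn(
            std.numel(), dtype=torch.cfloat)
        return torch.fft.ifft(spec).real

layer = SpectralModeUncertainty(n_modes=8)
noise = layer(grid_size=64)
```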
5. Training loop – Standard stochastic gradient descent (or Adam) is used; the only extra step is computing the regularization loss on the held‑out split each iteration.
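Putting the pieces together, one plausible training step looks like the following (with a toy linear model standing in for the FNO surrogate; all names are illustrative):

```python
import torch

def train_step(model, log_var, opt, batch_tr, batch_rg):
    """One optimizer step: the predictor fits the training split, the
    uncertainty parameter fits the held-out regularization split."""
    x_tr, y_tr = batch_tr
    x_rg, y_rg = batch_rg
    mse = torch.mean((model(x_tr) - y_tr) ** 2)
    pred_rg = model(x_rg).detach()  # gradients routed away from predictor
    nll = 0.5 * torch.mean(
        (pred_rg - y_rg) ** 2 / torch.exp(log_var) + log_var)
    opt.zero_grad()
    (mse + nll).backward()
    opt.step()
    return mse.item(), nll.item()

torch.manual_seed(0)
model = torch.nn.Linear(4, 4)           # stand-in for an FNO surrogate
log_var = torch.zeros((), requires_grad=True)
opt = torch.optim.Adam(list(model.parameters()) + [log_var], lr=1e-2)
x = torch.randn(32, 4)
for _ in range(10):
    mse, nll = train_step(model, log_var, opt,
                          (x[:24], x[:24]), (x[24:], x[24:]))
```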
Results & Findings
| Setting | Baseline (Ensemble / Dropout / Post‑hoc) | Cross‑regularized (this work) |
|---|---|---|
| Low observation fraction (10 %) | Poor calibration (ECE > 0.2) | Calibrated (ECE ≈ 0.07) |
| Medium observation (50 %) | Moderate calibration | Consistently lower NLL and sharper predictive intervals |
| Large training set | Marginal gains over plain predictor | Slightly better calibration, but gains shrink (as expected) |
| Uncertainty localization | Diffuse, often unrelated to error hotspots | High uncertainty aligns with regions of large one‑step spatial error |
Key take‑aways
- The learned predictive distributions are significantly better calibrated across all data regimes, as measured by Expected Calibration Error (ECE) and Negative Log‑Likelihood (NLL).
- Uncertainty fields are spatially informative, highlighting where the surrogate is likely to be inaccurate—a valuable diagnostic for downstream control or optimization loops.
- The method achieves these improvements without the computational overhead of ensembles or the hyper‑parameter tuning required for dropout rates.
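For reference, one common regression variant of ECE (the paper's exact definition may differ) averages the gap between nominal and empirical coverage of central Gaussian predictive intervals:

```python
import numpy as np
from math import erf, sqrt

def regression_ece(y, mu, sigma, n_bins=10):
    """Average |empirical - nominal| coverage over central Gaussian
    predictive intervals at n_bins nominal levels."""
    z = (np.asarray(y) - np.asarray(mu)) / np.asarray(sigma)
    # Probability integral transform: predictive CDF at each observation.
    pit = 0.5 * (1.0 + np.vectorize(erf)(z / sqrt(2.0)))
    gaps = []
    for p in np.linspace(0.05, 0.95, n_bins):
        lo, hi = (1 - p) / 2, (1 + p) / 2
        empirical = np.mean((pit >= lo) & (pit <= hi))
        gaps.append(abs(empirical - p))
    return float(np.mean(gaps))
```

A perfectly calibrated model drives this toward 0; a degenerate one where every observation sits at its predictive median (empirical coverage 1 at every level) scores 0.5.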
Practical Implications
- Reduced compute budget – Developers can replace costly ensembles (often 5‑10 models) with a single model that learns its own uncertainty, cutting inference cost by up to 90 %.
- Plug‑and‑play with existing operator learners – The approach works with any neural PDE surrogate that has a differentiable loss; integrating it into an existing FNO or DeepONet pipeline requires only a small code change (data split and extra loss term).
- Risk‑aware decision pipelines – In fields like climate modeling, fluid dynamics, or materials design, downstream optimization (e.g., design of experiments, active learning, safety‑critical control) can now consume calibrated variance estimates directly, improving robustness.
- Automated data‑regime adaptation – As the amount of observed data changes (e.g., during online learning or when sensors fail), the model automatically scales its uncertainty, removing the need for manual noise‑level tuning.
- Diagnostic visualization – Engineers can overlay the learned uncertainty maps on simulation domains to quickly spot “blind spots” and prioritize data collection or mesh refinement.
Limitations & Future Work
- Dependence on a held‑out regularization split – The method requires reserving a portion of the data solely for uncertainty learning, which may be non‑trivial in extremely data‑starved scenarios.
- Scalability to very high‑dimensional outputs – While the paper demonstrates mode‑wise variance for FNOs, extending to full covariance structures for large 3‑D fields could be memory‑intensive.
- Assumption of Gaussian predictive noise – The current formulation uses simple (often diagonal) Gaussian uncertainties; richer, multimodal uncertainty representations remain unexplored.
- Limited to supervised PDE surrogates – Applying the same cross‑regularization idea to unsupervised or physics‑informed neural operators is an open question.
Future research directions suggested by the authors include exploring adaptive split strategies (e.g., curriculum learning), integrating non‑Gaussian likelihoods, and testing the framework on real‑world engineering pipelines such as aerodynamic shape optimization or reservoir simulation.
Authors
- Carlos Stein Brito
Paper Information
- arXiv ID: 2602.11090v1
- Categories: cs.LG, cs.AI, cs.CE, stat.CO
- Published: February 11, 2026