[Paper] Data-driven stochastic reduced-order modeling of parametrized dynamical systems
Source: arXiv - 2601.10690v1
Overview
The paper presents a data‑driven framework for building stochastic reduced‑order models (ROMs) that predict the behavior of complex dynamical systems across a wide range of parameters and external forcings. By combining amortized stochastic variational inference with a re‑parameterization of Markov Gaussian processes, the authors obtain fast, uncertainty‑aware predictions without repeatedly running expensive high‑fidelity simulations.
Key Contributions
- Amortized stochastic variational inference for ROMs – learns a probabilistic encoder/decoder and the latent stochastic differential equations (SDEs) in a single, end‑to‑end training pass.
- Re‑parameterization trick for Markov Gaussian processes – removes the need for costly forward solvers during training, making the computational cost independent of dataset size and system stiffness.
- Parameter‑space generalization – the learned model can extrapolate to unseen combinations of system parameters and forcing functions.
- Built‑in uncertainty quantification – the stochastic latent dynamics naturally provide predictive variance, useful for risk‑aware decision making.
- Optional physics‑informed priors – the framework can incorporate known physical constraints when they are available, improving data efficiency.
- Empirical validation on three challenging benchmarks – demonstrates superior accuracy and orders‑of‑magnitude speed‑ups over existing ROM techniques.
Methodology
- Data collection – high‑fidelity simulations are run for a limited set of parameter values and forcing histories, producing state trajectories.
- Probabilistic autoencoder – a neural encoder compresses each high‑dimensional state snapshot into a low‑dimensional latent vector; a decoder reconstructs the full state from the latent code.
- Latent stochastic dynamics – the latent vectors are assumed to evolve according to a continuous‑time SDE whose drift and diffusion functions are parameterized by neural networks.
- Amortized inference – instead of solving an SDE for every training sample, the authors apply a re‑parameterization of the Markov Gaussian process, turning the stochastic dynamics into a differentiable “sample‑once” operation (illustrated in the first sketch after this list).
- Joint training – the encoder, decoder, and SDE networks are optimized together by maximizing a variational lower bound, the evidence lower bound (ELBO); a simplified training step is sketched second below. This yields both a compact ROM and calibrated uncertainty estimates.
- Physics‑informed priors (optional) – known conservation laws or symmetries can be encoded as priors on the drift/diffusion networks, guiding learning when data are scarce.
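The bullets above outline the moving parts; below is a minimal sketch of how they might fit together, assuming PyTorch. The dimensions, network widths, and class names are illustrative, not taken from the paper. The key point is the sample‑once draw: because the variational posterior is Gaussian, drawing a latent sample costs one network pass and one `randn`, with no SDE solver in the loop.

```python
import torch
import torch.nn as nn

# Illustrative sizes; the paper does not prescribe these.
STATE_DIM, LATENT_DIM, HIDDEN = 256, 8, 64

class Encoder(nn.Module):
    """Amortized Gaussian posterior over the latent state: one
    forward pass per snapshot yields a mean and log-variance."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, HIDDEN), nn.Tanh(),
            nn.Linear(HIDDEN, 2 * LATENT_DIM),
        )

    def forward(self, x):
        mean, log_var = self.net(x).chunk(2, dim=-1)
        return mean, log_var

def sample_once(mean, log_var):
    """Reparameterized 'sample-once' draw: z = m + sqrt(S) * eps."""
    return mean + torch.exp(0.5 * log_var) * torch.randn_like(mean)

class Decoder(nn.Module):
    """Reconstructs the full state from a latent code."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, HIDDEN), nn.Tanh(),
            nn.Linear(HIDDEN, STATE_DIM),
        )

    def forward(self, z):
        return self.net(z)

class LatentSDE(nn.Module):
    """Neural drift f and diffusion g of dz = f(z) dt + g(z) dW."""
    def __init__(self):
        super().__init__()
        self.drift = nn.Sequential(
            nn.Linear(LATENT_DIM, HIDDEN), nn.Tanh(),
            nn.Linear(HIDDEN, LATENT_DIM),
        )
        self.diffusion = nn.Sequential(  # Softplus keeps g(z) positive
            nn.Linear(LATENT_DIM, HIDDEN), nn.Tanh(),
            nn.Linear(HIDDEN, LATENT_DIM), nn.Softplus(),
        )
```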
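A correspondingly simplified training step follows. The reconstruction term assumes Gaussian observation noise, and the dynamics term uses a finite‑difference residual between the posterior mean path and the learned drift as a stand‑in for the paper's exact ELBO, which is derived from the Markov Gaussian process posterior and is more involved than this.

```python
def elbo_step(x_traj, dt, encoder, decoder, sde):
    """Simplified ELBO estimate for one trajectory of snapshots.
    x_traj: (T, STATE_DIM) tensor observed at uniform spacing dt."""
    mean, log_var = encoder(x_traj)
    z = sample_once(mean, log_var)        # no forward SDE solve
    x_hat = decoder(z)

    # Reconstruction: Gaussian log-likelihood up to a constant
    recon = -((x_traj - x_hat) ** 2).sum(dim=-1).mean()

    # Dynamics: the posterior mean path should follow the learned drift;
    # a finite difference stands in for the exact expected residual.
    dm_dt = (mean[1:] - mean[:-1]) / dt
    resid = ((dm_dt - sde.drift(mean[:-1])) ** 2).sum(dim=-1).mean()

    return recon - resid  # maximize (minimize the negative with an optimizer)
```

Because no numerical integrator runs inside this loop, the per‑step cost does not depend on the stiffness of the underlying system, which is the source of the speed‑ups reported below.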
Results & Findings
| Benchmark | Traditional ROM (deterministic) | Proposed Stochastic ROM | Speed‑up |
|---|---|---|---|
| Nonlinear oscillator with varying stiffness | High error on unseen parameters | < 5 % relative error, reliable variance | ≈ 30× |
| Fluid flow past a cylinder (varying Reynolds number) | Diverges for out‑of‑sample Reynolds numbers | Accurate lift/drag predictions, calibrated confidence intervals | ≈ 25× |
| Heat diffusion with time‑varying source | Over‑smoothed predictions | Captures transient spikes, uncertainty grows with source variability | ≈ 40× |
- Generalization: The learned SDEs correctly interpolate and even extrapolate to parameter regimes not seen during training.
- Uncertainty calibration: Predictive variances increase in regions where the training data are sparse, matching empirical errors.
- Computational efficiency: Training time scales linearly with the latent dimension, not with the number of high‑fidelity snapshots; inference is real‑time for all three test cases (a forecasting sketch follows below).
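To illustrate the real‑time inference claim: once the networks are trained, forecasting reduces to integrating the cheap low‑dimensional latent SDE and decoding. The hypothetical Euler–Maruyama rollout below, built on the sketch from the Methodology section, returns a predictive mean and a standard deviation of the kind the calibration finding refers to; all names are illustrative.

```python
@torch.no_grad()
def rollout(z0, sde, decoder, n_steps, dt, n_samples=64):
    """Monte Carlo forecast: integrate the latent SDE with
    Euler-Maruyama from an initial latent z0 of shape (LATENT_DIM,)."""
    z = z0.expand(n_samples, -1)
    frames = []
    for _ in range(n_steps):
        dw = torch.randn_like(z) * dt ** 0.5          # Brownian increment
        z = z + sde.drift(z) * dt + sde.diffusion(z) * dw
        frames.append(decoder(z))
    traj = torch.stack(frames)        # (n_steps, n_samples, STATE_DIM)
    return traj.mean(dim=1), traj.std(dim=1)  # wide std flags low confidence
```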
Practical Implications
- Rapid prototyping of simulation‑based products – engineers can replace costly CFD or structural solvers with a lightweight stochastic ROM that still provides confidence bounds.
- Robust control and optimization – controllers can incorporate predictive uncertainty directly, leading to safer, more reliable decisions under varying operating conditions.
- Digital twins for smart manufacturing – a stochastic ROM can continuously update predictions as new sensor data arrive, flagging anomalies when uncertainty spikes.
- Resource‑constrained environments – edge devices (e.g., autonomous drones) can run the latent SDE in milliseconds, enabling on‑board forecasting without cloud dependence.
- Reusable inference pipeline – the same amortized training procedure can be applied across different physical domains, reducing the time needed to build bespoke ROMs and facilitating data‑driven discovery.
Limitations & Future Work
- Training data quality: The approach still relies on a representative set of high‑fidelity simulations; severe gaps in the parameter space can degrade performance.
- Latent dimensionality selection: Choosing the right latent size remains heuristic; a latent space that is too small misses important dynamics, while one that is too large hampers interpretability.
- Scalability to extremely high‑dimensional fields: While the method decouples cost from dataset size, the encoder/decoder networks may become a bottleneck for 3‑D turbulent flows with millions of degrees of freedom.
- Future directions suggested by the authors include:
  - Adaptive sampling strategies to automatically enrich training data where uncertainty is high.
  - Extensions to non‑Gaussian latent processes (e.g., Lévy flights) for heavy‑tailed dynamics.
  - Tighter integration with physics‑informed neural networks to enforce conservation laws more rigorously.
Authors
- Andrew F. Ilersich
- Kevin Course
- Prasanth B. Nair
Paper Information
- arXiv ID: 2601.10690v1
- Categories: cs.LG
- Published: January 15, 2026
- PDF: https://arxiv.org/pdf/2601.10690v1