[Paper] Data-driven stochastic reduced-order modeling of parametrized dynamical systems
Source: arXiv - 2601.10690v1
Overview
The paper presents a data‑driven framework for building stochastic reduced‑order models (ROMs) that predict the behavior of complex dynamical systems across a wide range of parameters and external forcings. By combining amortized stochastic variational inference with a re‑parameterization of Markov Gaussian processes, the authors obtain fast, uncertainty‑aware predictions without repeatedly running expensive high‑fidelity simulations.
Key Contributions
- Amortized stochastic variational inference for ROMs – learns a probabilistic encoder/decoder and the latent stochastic differential equations (SDEs) in a single, end‑to‑end training pass.
- Re‑parameterization trick for Markov Gaussian processes – removes the need for costly forward solvers during training, making the computational cost independent of dataset size and system stiffness.
- Parameter‑space generalization – the learned model can extrapolate to unseen combinations of system parameters and forcing functions.
- Built‑in uncertainty quantification – the stochastic latent dynamics naturally provide predictive variance, useful for risk‑aware decision making.
- Optional physics‑informed priors – the framework can incorporate known physical constraints when they are available, improving data efficiency.
- Empirical validation on three challenging benchmarks – demonstrates superior accuracy and orders‑of‑magnitude speed‑ups over existing ROM techniques.
Methodology
- Data collection – high‑fidelity simulations are run for a limited set of parameter values and forcing histories, producing state trajectories.
- Probabilistic autoencoder – a neural encoder compresses each high‑dimensional state snapshot into a low‑dimensional latent vector; a decoder reconstructs the full state from the latent code.
- Latent stochastic dynamics – the latent vectors are assumed to evolve according to a continuous‑time SDE whose drift and diffusion functions are parameterized by neural networks.
- Amortized inference – instead of solving an SDE for every training sample, the authors apply a re‑parameterization of the Markov Gaussian process, turning the stochastic dynamics into a differentiable “sample‑once” operation (illustrated in the first sketch after this list).
- Joint training – the encoder, decoder, and SDE networks are optimized together by maximizing a variational lower bound, the evidence lower bound (ELBO); a simplified training step is sketched second below. This yields both a compact ROM and calibrated uncertainty estimates.
- Physics‑informed priors (optional) – known conservation laws or symmetries can be encoded as priors on the drift/diffusion networks, guiding learning when data are scarce.
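The bullets above outline the moving parts; below is a minimal sketch of how they might fit together, assuming PyTorch. The dimensions, network widths, and class names are illustrative, not taken from the paper. The key point is the sample‑once draw: because the variational posterior is Gaussian, drawing a latent sample costs one network pass and one `randn`, with no SDE solver in the loop.

```python
import torch
import torch.nn as nn

# Illustrative sizes; the paper does not prescribe these.
STATE_DIM, LATENT_DIM, HIDDEN = 256, 8, 64

class Encoder(nn.Module):
    """Amortized Gaussian posterior over the latent state: one
    forward pass per snapshot yields a mean and log-variance."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, HIDDEN), nn.Tanh(),
            nn.Linear(HIDDEN, 2 * LATENT_DIM),
        )

    def forward(self, x):
        mean, log_var = self.net(x).chunk(2, dim=-1)
        return mean, log_var

def sample_once(mean, log_var):
    """Reparameterized 'sample-once' draw: z = m + sqrt(S) * eps."""
    return mean + torch.exp(0.5 * log_var) * torch.randn_like(mean)

class Decoder(nn.Module):
    """Reconstructs the full state from a latent code."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, HIDDEN), nn.Tanh(),
            nn.Linear(HIDDEN, STATE_DIM),
        )

    def forward(self, z):
        return self.net(z)

class LatentSDE(nn.Module):
    """Neural drift f and diffusion g of dz = f(z) dt + g(z) dW."""
    def __init__(self):
        super().__init__()
        self.drift = nn.Sequential(
            nn.Linear(LATENT_DIM, HIDDEN), nn.Tanh(),
            nn.Linear(HIDDEN, LATENT_DIM),
        )
        self.diffusion = nn.Sequential(  # Softplus keeps g(z) positive
            nn.Linear(LATENT_DIM, HIDDEN), nn.Tanh(),
            nn.Linear(HIDDEN, LATENT_DIM), nn.Softplus(),
        )
```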
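A correspondingly simplified training step follows. The reconstruction term assumes Gaussian observation noise, and the dynamics term uses a finite‑difference residual between the posterior mean path and the learned drift as a stand‑in for the paper's exact ELBO, which is derived from the Markov Gaussian process posterior and is more involved than this.

```python
def elbo_step(x_traj, dt, encoder, decoder, sde):
    """Simplified ELBO estimate for one trajectory of snapshots.
    x_traj: (T, STATE_DIM) tensor observed at uniform spacing dt."""
    mean, log_var = encoder(x_traj)
    z = sample_once(mean, log_var)        # no forward SDE solve
    x_hat = decoder(z)

    # Reconstruction: Gaussian log-likelihood up to a constant
    recon = -((x_traj - x_hat) ** 2).sum(dim=-1).mean()

    # Dynamics: the posterior mean path should follow the learned drift;
    # a finite difference stands in for the exact expected residual.
    dm_dt = (mean[1:] - mean[:-1]) / dt
    resid = ((dm_dt - sde.drift(mean[:-1])) ** 2).sum(dim=-1).mean()

    return recon - resid  # maximize (minimize the negative with an optimizer)
```

Because no numerical integrator runs inside this loop, the per‑step cost does not depend on the stiffness of the underlying system, which is the source of the speed‑ups reported below.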
Results & Findings
| Benchmark | Traditional ROM (deterministic) | Proposed Stochastic ROM | Speed‑up |
|---|---|---|---|
| Nonlinear oscillator with varying stiffness | High error on unseen parameters | < 5 % relative error, reliable variance | ≈ 30× |
| Fluid flow past a cylinder (varying Reynolds number) | Diverges for out‑of‑sample Reynolds numbers | Accurate lift/drag predictions, calibrated confidence intervals | ≈ 25× |
| Heat diffusion with time‑varying source | Over‑smoothed predictions | Captures transient spikes, uncertainty grows with source variability | ≈ 40× |
- Generalization: The learned SDEs correctly interpolate and even extrapolate to parameter regimes not seen during training.
- Uncertainty calibration: Predictive variances increase in regions where the training data are sparse, matching empirical errors.
- Computational efficiency: Training time scales linearly with the latent dimension, not with the number of high‑fidelity snapshots; inference is real‑time for all three test cases (a forecasting sketch follows below).
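To illustrate the real‑time inference claim: once the networks are trained, forecasting reduces to integrating the cheap low‑dimensional latent SDE and decoding. The hypothetical Euler–Maruyama rollout below, built on the sketch from the Methodology section, returns a predictive mean and a standard deviation of the kind the calibration finding refers to; all names are illustrative.

```python
@torch.no_grad()
def rollout(z0, sde, decoder, n_steps, dt, n_samples=64):
    """Monte Carlo forecast: integrate the latent SDE with
    Euler-Maruyama from an initial latent z0 of shape (LATENT_DIM,)."""
    z = z0.expand(n_samples, -1)
    frames = []
    for _ in range(n_steps):
        dw = torch.randn_like(z) * dt ** 0.5          # Brownian increment
        z = z + sde.drift(z) * dt + sde.diffusion(z) * dw
        frames.append(decoder(z))
    traj = torch.stack(frames)        # (n_steps, n_samples, STATE_DIM)
    return traj.mean(dim=1), traj.std(dim=1)  # wide std flags low confidence
```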
Practical Implications
- Rapid prototyping of simulation‑based products – engineers can replace costly CFD or structural solvers with a lightweight stochastic ROM that still provides confidence bounds.
- Robust control and optimization – controllers can incorporate predictive uncertainty directly, leading to safer, more reliable decisions under varying operating conditions.
- Digital twins for smart manufacturing – a stochastic ROM can continuously update predictions as new sensor data arrive, flagging anomalies when uncertainty spikes.
- Resource‑constrained environments – edge devices (e.g., autonomous drones) can run the latent SDE in milliseconds, enabling on‑board forecasting without cloud dependence.
- Reusable inference pipeline – the same amortized training procedure can be applied across different physical domains, reducing the time needed to build bespoke ROMs and facilitating data‑driven discovery.
Limitations & Future Work
- Training data quality: The approach still relies on a representative set of high‑fidelity simulations; severe gaps in the parameter space can degrade performance.
- Latent dimensionality selection: Choosing the right latent size remains heuristic; a latent space that is too small misses important dynamics, while one that is too large hampers interpretability.
- Scalability to extremely high‑dimensional fields: While the method decouples cost from dataset size, the encoder/decoder networks may become a bottleneck for 3‑D turbulent flows with millions of degrees of freedom.
- Future directions suggested by the authors include:
  - Adaptive sampling strategies to automatically enrich training data where uncertainty is high.
  - Extensions to non‑Gaussian latent processes (e.g., Lévy flights) for heavy‑tailed dynamics.
  - Tighter integration with physics‑informed neural networks to enforce conservation laws more rigorously.
Authors
- Andrew F. Ilersich
- Kevin Course
- Prasanth B. Nair
Paper Information
- arXiv ID: 2601.10690v1
- Categories: cs.LG
- Published: January 15, 2026
- PDF: https://arxiv.org/pdf/2601.10690v1