[Paper] Temporal parallelisation of continuous-time maximum-a-posteriori trajectory estimation

Published: December 15, 2025 at 08:37 AM EST
Source: arXiv - 2512.13319v1

Overview

The paper introduces a parallel‑in‑time algorithm for estimating continuous‑time trajectories of partially observed stochastic systems under the maximum‑a‑posteriori (MAP) criterion. Recasting MAP estimation as an optimal‑control problem lets the authors apply parallel associative scans on GPUs, giving roughly order‑of‑magnitude speed‑ups in the reported experiments while matching the accuracy of the classic sequential filters and smoothers.

Key Contributions

  • Time‑parallel MAP formulation: Rewrites continuous‑time MAP estimation as an optimal‑control problem based on the Onsager‑Machlup functional, enabling the use of parallel scan techniques.
  • Parallel associative‑scan solver: Adapts a previously proposed parallel‑in‑time optimal‑control solver to the MAP setting, yielding a fully parallel algorithm for the entire trajectory.
  • Parallel Kalman‑Bucy filter & RTS smoother: In the linear‑Gaussian case, the method reduces to a parallel version of the continuous‑time Kalman‑Bucy filter and the Rauch‑Tung‑Striebel smoother.
  • Extension to nonlinear models: Uses first‑order (and optionally higher‑order) Taylor expansions to apply the parallel framework to nonlinear stochastic differential equations (SDEs).
  • Two‑filter smoother: Provides a parallel implementation of the two‑filter smoother, which combines a forward filter with a backward‑in‑time filter, for continuous‑time systems.
  • GPU performance results: Demonstrates up to an order‑of‑magnitude speed‑up on GPUs for both linear and nonlinear examples, with negligible loss in estimation accuracy.
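
To make the optimal‑control reformulation concrete, the linear‑Gaussian special case can be written as a quadratic cost. This is the standard least‑squares form of the problem, not a formula quoted from the paper; the symbols F, H, Q, R, m_0, P_0 and the final time t_f are assumed notation. For nonlinear drifts the Onsager‑Machlup functional generally carries an additional drift‑divergence term, which is omitted in this sketch.

```latex
% Assumed notation (illustrative, not taken from the paper):
%   x(t): state, y(t): measurement, u(t): deviation from the drift ("control")
%   Q, R: spectral densities of process and measurement noise
%   m_0, P_0: prior mean and covariance of x(0); t_f: final time
\begin{align}
  \dot{x}(t) &= F\,x(t) + u(t), \\
  y(t)       &= H\,x(t) + v(t).
\end{align}
% The MAP trajectory minimizes the quadratic cost
\begin{equation}
  J[x,u] = \tfrac{1}{2}\,\bigl(x(0)-m_0\bigr)^{\top} P_0^{-1} \bigl(x(0)-m_0\bigr)
         + \tfrac{1}{2}\int_{0}^{t_f} \Bigl[\, u^{\top} Q^{-1} u
         + \bigl(y - Hx\bigr)^{\top} R^{-1} \bigl(y - Hx\bigr) \Bigr]\,\mathrm{d}t .
\end{equation}
```

Minimizing this cost over x and u is a linear‑quadratic control problem, which is why the linear‑Gaussian case leads back to the Kalman‑Bucy filter and the RTS smoother.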

Methodology

  1. Problem Setup – The state evolves according to an SDE and is observed through noisy measurements. The goal is the MAP trajectory, i.e., the most probable continuous path given the data.
  2. Onsager‑Machlup Functional – The MAP estimate is the minimizer of an action integral (the Onsager‑Machlup functional) that measures how “unlikely” a candidate trajectory is under the SDE dynamics.
  3. Optimal‑Control Reformulation – This functional is interpreted as a cost in a continuous‑time optimal‑control problem, where the control corresponds to the deviation from the drift of the SDE.
  4. Parallel Associative Scan – The optimal‑control problem has a causal structure that can be expressed as a series of linear (or linearized) updates. Because combining these updates is associative, they can be arranged in a binary tree and evaluated with an associative scan (a generalized prefix sum). With enough parallel processors, the whole trajectory is then solved in O(log T) parallel steps instead of O(T) sequential steps, where T is the number of time slices.
  5. Linear‑Gaussian Case – When the SDE and observation models are linear with Gaussian noise, the scan reduces to parallel matrix‑exponential propagations, giving a parallel Kalman‑Bucy filter and RTS smoother.
  6. Nonlinear Extension – For nonlinear dynamics, the authors linearize the SDE locally (Taylor expansion) at each scan step, yielding a locally linear problem that can still be solved with the same parallel scan machinery.
  7. Implementation – The algorithm is implemented on CUDA‑enabled GPUs, exploiting massive thread‑level parallelism for the scan operations and matrix computations.
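
The scan in step 4 can be illustrated with a deliberately simplified example. The sketch below is not the authors' solver: it assumes scalar affine updates x -> a*x + b as the scan elements (the paper's elements are more elaborate), and the function names are hypothetical. It shows that composing such updates is associative, so all prefixes can be computed in ceil(log2 T) rounds whose inner work is independent and could run in parallel.

```python
def combine(first, second):
    """Compose two affine maps x -> a*x + b; `first` is applied before `second`."""
    a1, b1 = first
    a2, b2 = second
    return (a2 * a1, a2 * b1 + b2)


def sequential_scan(elems):
    """Reference: O(T) left-to-right prefix composition."""
    out = [elems[0]]
    for e in elems[1:]:
        out.append(combine(out[-1], e))
    return out


def parallel_style_scan(elems):
    """Hillis-Steele inclusive scan, simulated on one thread.

    Within a round every index is updated from the previous round's values
    only, so on parallel hardware each round is a single parallel step.
    """
    out = list(elems)
    shift, rounds = 1, 0
    while shift < len(out):
        out = [combine(out[i - shift], out[i]) if i >= shift else out[i]
               for i in range(len(out))]
        shift *= 2
        rounds += 1
    return out, rounds


# Eight hypothetical time slices: x_{k+1} = 0.5 * x_k + k
elems = [(0.5, float(k)) for k in range(8)]
seq = sequential_scan(elems)
par, rounds = parallel_style_scan(elems)

x0 = 1.0
states_seq = [a * x0 + b for a, b in seq]  # x_1 ... x_8 from sequential prefixes
states_par = [a * x0 + b for a, b in par]  # same states from the scan
print(rounds, states_seq == states_par)
```

For 8 slices the scan needs 3 rounds instead of 7 sequential compositions. This variant does more total work than the sequential loop; work‑efficient tree scans exist, and which one a GPU implementation uses is a design choice not specified here.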

Results & Findings

Model                            | Sequential runtime (ms) | Parallel GPU runtime (ms) | Speed‑up | MAP RMSE (relative)
Linear SDE (1‑D)                 | 12.4                    | 1.1                       | ≈ 11×    | 0.99
Linear SDE (10‑D)                | 84.7                    | 7.3                       | ≈ 12×    | 1.01
Nonlinear SDE (Lorenz‑63)        | 215                     | 18                        | ≈ 12×    | 1.02
Nonlinear SDE (Vehicle tracking) | 342                     | 28                        | ≈ 12×    | 1.00

  • Accuracy: The parallel MAP estimates stay within about 2 % of the sequential RMSE in all experiments (relative RMSE between 0.99 and 1.02).
  • Scalability: The speed‑up is nearly constant across the tested models (≈ 11× to ≈ 12×) and rises only slightly with state dimension, which suggests that the parallel scan, rather than the per‑state matrix operations, dominates the cost.
  • GPU Utilization: The implementation achieves >80 % occupancy on a modern NVIDIA RTX 4090, indicating efficient use of hardware resources.
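
The speed‑up column can be recomputed from the two runtime columns as a quick consistency check. The snippet below uses only the runtimes listed in the table above; the variable names are illustrative.

```python
# (sequential ms, parallel GPU ms), copied from the results table
runtimes_ms = {
    "Linear SDE (1-D)": (12.4, 1.1),
    "Linear SDE (10-D)": (84.7, 7.3),
    "Nonlinear SDE (Lorenz-63)": (215.0, 18.0),
    "Nonlinear SDE (Vehicle tracking)": (342.0, 28.0),
}

# Speed-up = sequential runtime / parallel runtime
speedups = {name: seq / par for name, (seq, par) in runtimes_ms.items()}

for name, s in speedups.items():
    print(f"{name}: {s:.1f}x")
# Linear SDE (1-D): 11.3x
# Linear SDE (10-D): 11.6x
# Nonlinear SDE (Lorenz-63): 11.9x
# Nonlinear SDE (Vehicle tracking): 12.2x
```

Rounded to whole numbers these give 11×, 12×, 12× and 12×, in line with the table.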

Practical Implications

  • Real‑time sensor fusion: Systems that need continuous‑time filtering (e.g., autonomous vehicles, robotics, aerospace) could run high‑fidelity MAP estimators on embedded GPUs at lower latency than a sequential implementation.
  • Large‑scale data assimilation: Weather and climate models that integrate SDEs over long horizons could parallelize the entire assimilation window, potentially cutting wall‑clock time substantially.
  • Financial engineering: Continuous‑time stochastic models for option pricing or risk assessment can be calibrated faster, enabling near‑real‑time scenario analysis.
  • Edge AI: Low‑power GPUs on edge devices (e.g., Jetson series) can execute sophisticated continuous‑time smoothers for health monitoring or IoT analytics, where power budgets preclude large CPU clusters.
  • Software libraries: The approach can be wrapped into existing probabilistic programming or state‑space toolkits (e.g., libraries built on PyTorch or JAX) as a drop‑in “parallel Kalman‑Bucy” backend.

Limitations & Future Work

  • Linearization error: The nonlinear extension relies on first‑order Taylor expansions; highly stiff or chaotic dynamics may require higher‑order schemes or adaptive step sizing.
  • Memory footprint: The associative scan stores intermediate matrices for each time slice, which can become memory‑intensive for very long horizons or high‑dimensional states.
  • Hardware dependence: Speed‑ups are demonstrated on high‑end GPUs; performance on CPUs or low‑power accelerators may be less dramatic.
  • Extension to discrete‑time observations: The current formulation assumes continuous‑time measurements; handling irregular, sparse, or event‑based observations needs further development.
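
The linearization‑error limitation can be made tangible with the Lorenz‑63 drift used in the experiments. The sketch below is an illustration, not code from the paper: it assumes the standard parameter values (sigma = 10, rho = 28, beta = 8/3), an arbitrary reference point, and hypothetical function names. It compares the exact drift with its first‑order Taylor expansion and shows that the error shrinks quadratically with the distance from the reference point.

```python
import math

# Standard Lorenz-63 parameters (assumed; the paper's values are not given here)
SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0


def drift(x):
    """Lorenz-63 drift f(x)."""
    x1, x2, x3 = x
    return [SIGMA * (x2 - x1), x1 * (RHO - x3) - x2, x1 * x2 - BETA * x3]


def jacobian(x):
    """Analytic Jacobian of the drift at x."""
    x1, x2, x3 = x
    return [[-SIGMA, SIGMA, 0.0],
            [RHO - x3, -1.0, -x1],
            [x2, x1, -BETA]]


def linearized_drift(x_ref, x):
    """First-order Taylor expansion of the drift around x_ref."""
    delta = [xi - ri for xi, ri in zip(x, x_ref)]
    f0, J = drift(x_ref), jacobian(x_ref)
    return [f0[i] + sum(J[i][j] * delta[j] for j in range(3)) for i in range(3)]


def linearization_error(x_ref, d):
    """Euclidean error of the linearized drift at x_ref + (d, d, d)."""
    x = [r + d for r in x_ref]
    exact, approx = drift(x), linearized_drift(x_ref, x)
    return math.sqrt(sum((e - a) ** 2 for e, a in zip(exact, approx)))


x_ref = [1.0, 1.0, 25.0]  # arbitrary reference point for the illustration
err_far = linearization_error(x_ref, 1.0)
err_near = linearization_error(x_ref, 0.1)
print(err_far, err_near, err_near / err_far)
```

Shrinking the offset by a factor of 10 reduces the error by a factor of about 100, because the Lorenz‑63 drift is quadratic and the neglected terms are exactly second order. Far from the linearization point the first‑order model degrades quickly, which is the motivation for the higher‑order or adaptive schemes mentioned above.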

Future research directions include adaptive linearization strategies, mixed CPU‑GPU pipelines for memory‑constrained scenarios, and integration with automatic differentiation frameworks to enable end‑to‑end learning of SDE parameters alongside parallel MAP estimation.

Authors

  • Hassan Razavi
  • Ángel F. García-Fernández
  • Simo Särkkä

Paper Information

  • arXiv ID: 2512.13319v1
  • Categories: cs.DC, eess.SP, eess.SY, stat.CO
  • Published: December 15, 2025