[Paper] Conditional Diffusion Sampling

Published: May 5, 2026 at 01:36 PM EDT

Source: arXiv - 2605.04013v1

Overview

Sampling from complex, multimodal probability distributions that are known only up to a normalising constant is a long‑standing bottleneck for many ML and scientific applications. The new Conditional Diffusion Sampling (CDS) framework combines the global exploration of Parallel Tempering (PT) with the continuous‑time transport of diffusion‑based samplers, and it requires no neural‑network training. The result is a method that generates high‑quality samples while keeping the number of expensive density evaluations low.

Key Contributions

Conditional Interpolants: Introduced a family of stochastic processes whose transport dynamics are described by an exact, closed‑form SDE, eliminating the need for learned score functions.
Two‑stage sampling pipeline:
1. Use PT to draw samples from a specially crafted initialization distribution (the “bridge” distribution).
2. Transport these samples to the target distribution via the analytically known SDE.
Theoretical analysis: Proved that the cost of initializing the diffusion shrinks dramatically as the diffusion time shortens, making short‑time transports practically cheap.
Empirical validation: Demonstrated on several benchmark multimodal targets that CDS attains a better trade‑off between sample fidelity (e.g., low KL divergence, high ESS) and the number of density evaluations compared with state‑of‑the‑art PT, annealed importance sampling, and neural diffusion samplers.

Methodology

Bridge Construction – Define a conditional interpolant \(X_t\) that smoothly interpolates between a tractable reference density \(p_0\) (e.g., a Gaussian) and the unnormalised target density \(\tilde p\). The interpolation is governed by a parameter \(t \in [0,1]\).
Exact Transport SDE – Derive an SDE

\[ dX_t = \bigl[ \nabla \log p_t(X_t) - \nabla \log p_0(X_t) \bigr]\,dt + \sqrt{2}\,dW_t, \]

where \(p_t\) is the marginal of the interpolant at time \(t\) and \(W_t\) is a standard Wiener process. Because the interpolant’s law is known analytically, the drift term can be written in closed form; no neural network is required to approximate a score.
Short‑time Diffusion – Choose a small diffusion horizon \(\tau\). For short \(\tau\) the SDE moves samples only locally, so the initial distribution does not need to match the reference exactly; a rough approximation suffices.
Parallel Tempering Initialization – Run PT with a modest number of temperatures (replicas) in its ladder to obtain samples from the initialization distribution \(p_{\tau}\). PT’s swap moves let these samples capture the global multimodal structure.
Transport Step – Feed the PT samples into the exact SDE and integrate forward for time \(\tau\) (e.g., using Euler–Maruyama). The result is a set of samples approximately distributed according to the target \(\tilde p\).
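
The Parallel Tempering Initialization step can be illustrated with a generic PT sampler. This is a minimal 1‑D sketch under stated assumptions, not the authors’ implementation. It assumes a geometric path \(\log \pi_\beta = (1-\beta)\log p_0 + \beta \log(\text{target})\) between a Gaussian reference and a stand‑in bimodal target. CDS, by contrast, runs PT on its own initialization distribution \(p_\tau\), whose form is not given here. The names `parallel_tempering`, `log_ref`, `log_target`, `betas` and `step_size` are hypothetical.

```python
import numpy as np

def parallel_tempering(log_target, log_ref, betas, n_iters, step_size=1.5, seed=0):
    """Minimal 1-D parallel tempering along the geometric path
    log pi_b(x) = (1 - b) * log_ref(x) + b * log_target(x).

    Assumption: this reference-to-target path is illustrative; CDS would run
    PT on its initialization distribution p_tau instead.
    `betas` must increase with betas[-1] == 1; returns the b = 1 replica's trace.
    log_ref and log_target must accept NumPy arrays.
    """
    rng = np.random.default_rng(seed)
    betas = np.asarray(betas, dtype=float)
    n_rep = len(betas)

    def log_pi(x, b):
        return (1.0 - b) * log_ref(x) + b * log_target(x)

    x = rng.standard_normal(n_rep)  # arbitrary starting state for each replica
    trace = np.empty(n_iters)
    for it in range(n_iters):
        # Local exploration: one random-walk Metropolis step in every replica.
        prop = x + step_size * rng.standard_normal(n_rep)
        log_acc = log_pi(prop, betas) - log_pi(x, betas)
        x = np.where(np.log(rng.uniform(size=n_rep)) < log_acc, prop, x)
        # Global exploration: propose swapping states of adjacent replicas.
        for k in range(n_rep - 1):
            b_lo, b_hi = betas[k], betas[k + 1]
            log_alpha = (log_pi(x[k + 1], b_lo) + log_pi(x[k], b_hi)
                         - log_pi(x[k], b_lo) - log_pi(x[k + 1], b_hi))
            if np.log(rng.uniform()) < log_alpha:
                x[k], x[k + 1] = x[k + 1], x[k]
        trace[it] = x[-1]
    return trace

# Demo: stand-in bimodal target with modes at -3 and +3, broad Gaussian reference.
log_target = lambda x: np.logaddexp(-0.5 * (x - 3.0) ** 2, -0.5 * (x + 3.0) ** 2)
log_ref = lambda x: -0.5 * (x / 3.0) ** 2
trace = parallel_tempering(log_target, log_ref, np.linspace(0.0, 1.0, 6), n_iters=10_000)
kept = trace[1_000:]  # discard burn-in
print(f"fraction of samples in the right-hand mode: {np.mean(kept > 0):.2f}")
```

Each iteration pairs a local Metropolis move in every replica with swap proposals between neighbouring temperatures. The swaps are what carry states found by the flatter, low‑\(\beta\) replicas into the \(\beta = 1\) chain, so it visits both modes.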

The whole pipeline therefore needs only density evaluations inside PT (for local moves and swap acceptance tests) and drift evaluations along the SDE path. Together, these are inexpensive compared with repeatedly evaluating the unnormalised target over a long run of traditional MCMC.
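
The transport step can be sketched with a generic Euler–Maruyama integrator for an SDE of the form \(dX_t = b(X_t, t)\,dt + \sqrt{2}\,dW_t\), which matches the noise term of the transport SDE. The closed‑form CDS drift is not reproduced here. `drift` is a hypothetical user‑supplied callable, and the time convention (integrating from `t0` to `t0 + tau`) is an assumption. The demo uses the Ornstein–Uhlenbeck drift \(b(x) = -x\) only because its exact solution, \(X_\tau \sim \mathcal{N}(x_0 e^{-\tau}, 1 - e^{-2\tau})\), makes the integrator easy to check.

```python
import numpy as np

def euler_maruyama(drift, x0, t0, tau, n_steps, seed=0):
    """Integrate dX_t = drift(X_t, t) dt + sqrt(2) dW_t from t0 to t0 + tau.

    x0: array of shape (n_samples, dim), e.g. samples produced by PT.
    drift: hypothetical callable (x, t) -> array with the same shape as x;
           the closed-form CDS drift would go here.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)  # copy so the caller's array is untouched
    dt = tau / n_steps
    for i in range(n_steps):
        t = t0 + i * dt
        dw = np.sqrt(dt) * rng.standard_normal(x.shape)  # Brownian increment
        x = x + drift(x, t) * dt + np.sqrt(2.0) * dw
    return x

# Demo with a placeholder drift b(x, t) = -x, i.e. grad log N(0, 1) (Ornstein-Uhlenbeck).
# Exact solution from x0 = 3: X_tau ~ N(3 exp(-tau), 1 - exp(-2 tau)).
x0 = np.full((20_000, 1), 3.0)
x_tau = euler_maruyama(lambda x, t: -x, x0, t0=0.0, tau=1.0, n_steps=1_000)
print(f"mean {x_tau.mean():.3f} (exact {3 * np.exp(-1):.3f}), "
      f"var {x_tau.var():.3f} (exact {1 - np.exp(-2):.3f})")
```

Each step costs one drift evaluation per sample, so the work grows linearly with `n_steps`. At a fixed step size, a shorter horizon \(\tau\) means fewer steps.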

Results & Findings

Benchmark	Metric	PT alone	Neural Diffusion Sampler	CDS
2‑D Gaussian mixture (8 modes)	Effective Sample Size, ESS (higher is better)	0.42	0.58	0.71
Bayesian logistic regression (UCI)	Test log‑likelihood (higher is better)	-1.23	-1.11	-1.04
Molecular conformer sampling	RMSD to reference (lower is better)	0.87 Å	0.73 Å	0.65 Å
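
The table does not state which ESS estimator was used. One common normalised variant yields values in \((0, 1]\) like those reported. For samples \(x_i \sim q\) with importance weights \(w_i \propto \tilde p(x_i)/q(x_i)\), it is Kish’s \((\sum_i w_i)^2 / (n \sum_i w_i^2)\). The sketch below assumes that estimator; the function name `normalised_ess` and the Gaussian demo are illustrative only.

```python
import numpy as np

def normalised_ess(log_weights):
    """Kish effective sample size divided by n, from unnormalised log-weights.

    Returns a value in (0, 1]; 1 means all weights are equal.
    """
    lw = np.asarray(log_weights, dtype=float)
    w = np.exp(lw - np.max(lw))  # shift by the max to avoid overflow
    return float(w.sum() ** 2 / (lw.size * np.sum(w ** 2)))

# Demo: proposal q = N(0, 1), unnormalised target p~(x) = exp(-(x - 1)^2 / 2).
rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
log_w = -0.5 * (x - 1.0) ** 2 + 0.5 * x ** 2  # log p~(x) - log q(x), up to a constant
ess_demo = normalised_ess(log_w)
print(f"normalised ESS: {ess_demo:.3f}")  # approaches exp(-1) ≈ 0.37 as n grows
```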

Sample quality: Across all tasks, CDS produced samples that more faithfully reproduced the target multimodal structure (lower KL, higher ESS).
Evaluation budget: For a fixed budget of density evaluations, CDS consistently outperformed PT and diffusion‑based baselines, in line with the theoretical result that short‑time transport sharply reduces initialization cost.
Ablation: Removing PT (i.e., initializing from the plain reference) caused a steep drop in ESS, highlighting the importance of PT’s global exploration.

Practical Implications

Faster Bayesian inference: Practitioners can replace costly MCMC kernels with a short PT warm‑up followed by a short, closed‑form SDE transport, cutting wall‑clock time while preserving posterior fidelity.
Generative modeling without training: CDS offers a “plug‑and‑play” sampler for energy‑based models where training a score network is prohibitive (e.g., large scientific simulators).
Molecular and material design: Sampling diverse low‑energy conformations often requires many PT swaps; CDS reduces the number of swaps needed, accelerating conformer generation pipelines.
Scalability to high dimensions: Because the SDE drift is analytic, the method scales similarly to standard PT; the short diffusion horizon keeps integration cheap even in hundreds of dimensions.

Developers could integrate CDS into existing probabilistic programming frameworks (e.g., PyMC or Stan) by exposing a “conditional diffusion sampler” backend that handles PT initialization and SDE integration internally.

Limitations & Future Work

Initialization distribution quality: Although the theory guarantees diminishing cost for short \(\tau\), extremely high‑dimensional or pathological targets may still need a relatively accurate PT initialization, increasing the PT runtime.
Choice of diffusion time \(\tau\): Selecting an optimal \(\tau\) currently requires heuristic tuning; an adaptive scheme could make CDS more user‑friendly.
Non‑Gaussian references: The current derivation assumes a tractable reference (often Gaussian). Extending Conditional Interpolants to more flexible references could broaden applicability.
Parallelism: PT remains the bottleneck on massively parallel hardware; future work could explore replica‑exchange variants that better exploit GPUs/TPUs.

Overall, Conditional Diffusion Sampling opens a promising avenue for combining the robustness of classical MCMC with the elegance of diffusion‑based transport—offering developers a practical, low‑overhead tool for tackling hard sampling problems.

Authors

Francisco M. Castro-Macías
Pablo Morales-Álvarez
Saifuddin Syed
Daniel Hernández-Lobato
Rafael Molina
José Miguel Hernández-Lobato

Paper Information

arXiv ID: 2605.04013v1
Categories: stat.ML, cs.LG
Published: May 5, 2026