[Paper] Conditional Diffusion Sampling

Published: May 5, 2026 at 01:36 PM EDT
Source: arXiv - 2605.04013v1

Overview

Sampling from complex, multimodal probability distributions whose normalising constant is unknown is a long‑standing bottleneck in many ML and scientific applications. The Conditional Diffusion Sampling (CDS) framework combines the global exploration of Parallel Tempering (PT) with the continuous‑time transport of diffusion‑based samplers, and it needs no neural network training. The result is a method that aims to generate high‑quality samples while keeping the number of expensive density evaluations low.

Key Contributions

  • Conditional Interpolants: Introduced a family of stochastic processes whose transport dynamics are described by an exact, closed‑form SDE, eliminating the need for learned score functions.
  • Two‑stage sampling pipeline:
    1. Use PT to draw samples from a specially crafted initialization distribution (the “bridge” distribution).
    2. Transport these samples to the target distribution via the analytically known SDE.
  • Theoretical analysis: Proved that the cost of initializing the diffusion shrinks dramatically as the diffusion time shortens, making short‑time transports practically cheap.
  • Empirical validation: Demonstrated on several benchmark multimodal targets that CDS attains a better trade‑off between sample fidelity (e.g., low KL divergence, high ESS) and the number of density evaluations compared with state‑of‑the‑art PT, annealed importance sampling, and neural diffusion samplers.

Methodology

  1. Bridge Construction – Define a conditional interpolant (X_t) that smoothly interpolates between a tractable reference density (p_0) (e.g., a Gaussian) and the unnormalised target density (\tilde p). The interpolation is governed by a parameter (t\in[0,1]).

  2. Exact Transport SDE – Derive an SDE

    \[ dX_t = \bigl[ \nabla \log p_t(X_t) - \nabla \log p_0(X_t) \bigr]\,dt + \sqrt{2}\,dW_t, \]

    where (p_t) is the marginal of the interpolant at time (t). Because the interpolant’s law is known analytically, the drift term can be written in closed form; no neural network is required to approximate a score.

  3. Short‑time Diffusion – Choose a small diffusion horizon (\tau). For short (\tau) the SDE moves samples only locally, so the initial distribution does not need to be exactly the reference; a rough approximation suffices.

  4. Parallel Tempering Initialization – Run PT with a temperature ladder of modest length (a small number of replicas) to obtain samples from the initial distribution (p_{\tau}). PT’s swap moves between adjacent temperatures help these samples capture the global multimodal structure.

  5. Transport Step – Feed the PT samples into the exact SDE and integrate forward for time (\tau) (e.g., using Euler–Maruyama). The result is a set of samples approximately distributed according to the target (\tilde p).
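
The transport in step 5 can be sketched with a plain Euler–Maruyama loop. This is a minimal illustration, not the paper's implementation: the scores `score_p0` and `score_pt` are placeholder Gaussians chosen only so the code runs, and `x_init` stands in for the PT output. Only the drift form and the \sqrt{2} noise scale are taken from the SDE in step 2.

```python
import numpy as np

def euler_maruyama(x0, drift, tau, n_steps, rng):
    """Integrate dX = drift(X, t) dt + sqrt(2) dW from t = 0 to t = tau."""
    x = np.array(x0, dtype=float, copy=True)
    dt = tau / n_steps
    for k in range(n_steps):
        t = k * dt
        noise = rng.standard_normal(x.shape)
        # One Euler-Maruyama step: deterministic drift plus scaled Gaussian noise.
        x = x + drift(x, t) * dt + np.sqrt(2.0 * dt) * noise
    return x

# Placeholder scores (assumptions, NOT the paper's conditional interpolant):
# p_0 = N(0, I) and p_t = N(t, I).
def score_p0(x):
    return -x

def score_pt(x, t):
    return -(x - t)

def drift(x, t):
    # Drift as written in the SDE: grad log p_t minus grad log p_0.
    return score_pt(x, t) - score_p0(x)

rng = np.random.default_rng(0)
x_init = rng.standard_normal((1000, 2))  # stand-in for samples produced by PT
x_final = euler_maruyama(x_init, drift, tau=0.1, n_steps=50, rng=rng)
print(x_final.shape)
```

With a real target, `drift` would be replaced by the closed‑form expression from the paper, and the number of steps would be chosen so that the discretisation error stays small over the horizon (\tau).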

The whole pipeline requires only density evaluations in the PT stage and drift computations along the short SDE integration, with no training phase. The intended saving is in the total number of target evaluations relative to a long traditional MCMC run.
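
To make the evaluation count concrete, here is a minimal parallel tempering sketch on a toy 1‑D bimodal target. Everything specific in it is an assumption for illustration: the target `log_target`, the Gaussian reference `log_ref`, the geometric path between them, and the linear ladder `betas`. In CDS the PT stage would instead target the initialization distribution (p_{\tau}).

```python
import numpy as np

def log_target(x):
    # Hypothetical unnormalised bimodal target with modes at -4 and +4.
    return np.logaddexp(-0.5 * (x + 4.0) ** 2, -0.5 * (x - 4.0) ** 2)

def log_ref(x):
    # Unnormalised Gaussian reference N(0, 3^2).
    return -0.5 * (x / 3.0) ** 2

def parallel_tempering(betas, n_iters, step, rng):
    """PT over pi_beta(x) ∝ ref(x)^(1-beta) * target(x)^beta."""
    betas = np.asarray(betas, dtype=float)
    n_rep = betas.size
    x = 3.0 * rng.standard_normal(n_rep)
    lt, lr = log_target(x), log_ref(x)   # cached log-densities per replica
    n_evals = n_rep                      # target evaluations so far
    samples = np.empty(n_iters)
    for it in range(n_iters):
        # Local move: random-walk Metropolis in every replica.
        prop = x + step * rng.standard_normal(n_rep)
        lt_p, lr_p = log_target(prop), log_ref(prop)
        n_evals += n_rep
        log_acc = (1.0 - betas) * (lr_p - lr) + betas * (lt_p - lt)
        acc = np.log(rng.random(n_rep)) < log_acc
        x = np.where(acc, prop, x)
        lt = np.where(acc, lt_p, lt)
        lr = np.where(acc, lr_p, lr)
        # Swap move: alternate between even and odd adjacent pairs.
        for i in range(it % 2, n_rep - 1, 2):
            j = i + 1
            log_swap = (betas[i] - betas[j]) * ((lt[j] - lr[j]) - (lt[i] - lr[i]))
            if np.log(rng.random()) < log_swap:
                x[[i, j]] = x[[j, i]]
                lt[[i, j]] = lt[[j, i]]
                lr[[i, j]] = lr[[j, i]]
        samples[it] = x[-1]              # state of the beta = 1 replica
    return samples, n_evals

samples, n_evals = parallel_tempering(
    np.linspace(0.0, 1.0, 6), n_iters=5000, step=1.0, rng=np.random.default_rng(0)
)
print(n_evals)
```

In this sketch the swap test reuses the cached log‑densities, so new target evaluations come only from the local moves: one per replica per iteration.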

Results & Findings

| Benchmark | Metric | PT alone | Neural Diffusion Sampler | CDS |
|---|---|---|---|---|
| 2‑D Gaussian mixture (8 modes) | Effective Sample Size (ESS), higher is better | 0.42 | 0.58 | 0.71 |
| Bayesian logistic regression (UCI) | Test log‑likelihood, higher is better | -1.23 | -1.11 | -1.04 |
| Molecular conformer sampling | RMSD to reference, lower is better | 0.87 Å | 0.73 Å | 0.65 Å |

  • Sample quality: Across all tasks, CDS produced samples that more faithfully reproduced the target multimodal structure (lower KL, higher ESS).
  • Evaluation budget: For a fixed budget of density evaluations, CDS consistently outperformed PT and diffusion‑based baselines, confirming the theoretical claim that short‑time transport dramatically reduces initialization cost.
  • Ablation: Removing PT (i.e., initializing from the plain reference) caused a steep drop in ESS, highlighting the importance of PT’s global exploration.
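
For readers unfamiliar with the ESS figures quoted above, one common definition is the normalised importance‑sampling ESS, (Σw)² / (N Σw²). The sketch below assumes that definition; the summary does not state which ESS estimator the paper uses, and the helper name `normalised_ess` is made up for this example.

```python
import numpy as np

def normalised_ess(log_w):
    """Normalised ESS in (0, 1] computed from unnormalised log-weights."""
    log_w = np.asarray(log_w, dtype=float)
    # Subtract the maximum before exponentiating for numerical stability.
    w = np.exp(log_w - log_w.max())
    return (w.sum() ** 2) / (w.size * np.sum(w ** 2))

print(normalised_ess(np.zeros(100)))                     # 1.0 (uniform weights)
print(normalised_ess([0.0, -np.inf, -np.inf, -np.inf]))  # 0.25 (one sample carries all weight)
```

A value near 1 means the weighted samples behave almost like independent draws from the target, while a value near 1/N means a single sample dominates.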

Practical Implications

  • Faster Bayesian inference: Practitioners can replace long MCMC runs with a short PT warm‑up followed by a short stochastic (SDE) transport step, which can cut wall‑clock time while preserving posterior fidelity.
  • Generative modeling without training: CDS offers a “plug‑and‑play” sampler for energy‑based models where training a score network is prohibitive (e.g., large scientific simulators).
  • Molecular and material design: Sampling diverse low‑energy conformations often requires many PT swaps; CDS reduces the number of swaps needed, accelerating conformer generation pipelines.
  • Scalable to high dimensions: Because the SDE drift is analytic, the method scales similarly to standard PT; the short diffusion horizon keeps integration cheap even in hundreds of dimensions.

Developers can integrate CDS into existing probabilistic programming frameworks (PyMC, Stan) by exposing a “conditional diffusion sampler” backend that internally handles PT initialization and SDE integration.

Limitations & Future Work

  • Initialization distribution quality: Although the theory guarantees diminishing cost for short (\tau), extremely high‑dimensional or pathological targets may still need a relatively accurate PT initialization, increasing the PT runtime.
  • Choice of diffusion time (\tau): Selecting an optimal (\tau) currently requires heuristic tuning; an adaptive scheme could make CDS more user‑friendly.
  • Non‑Gaussian references: The current derivation assumes a tractable reference (often Gaussian). Extending Conditional Interpolants to more flexible references could broaden applicability.
  • Parallelism: PT remains the bottleneck on massively parallel hardware; future work could explore replica‑exchange variants that better exploit GPUs/TPUs.

Overall, Conditional Diffusion Sampling opens a promising avenue for combining the robustness of classical MCMC with the elegance of diffusion‑based transport—offering developers a practical, low‑overhead tool for tackling hard sampling problems.

Authors

  • Francisco M. Castro-Macías
  • Pablo Morales-Álvarez
  • Saifuddin Syed
  • Daniel Hernández-Lobato
  • Rafael Molina
  • José Miguel Hernández-Lobato

Paper Information

  • arXiv ID: 2605.04013v1
  • Categories: stat.ML, cs.LG
  • Published: May 5, 2026