[Paper] Flow Sampling: Learning to Sample from Unnormalized Densities via Denoising Conditional Processes

Published: May 5, 2026 at 01:07 PM EDT

Source: arXiv - 2605.03984v1

Overview

The paper introduces Flow Sampling, a new framework that teaches a neural network to draw samples from unnormalized probability densities—distributions defined only by an energy (or score) function, without any training data. By marrying diffusion models with flow‑matching ideas, the authors achieve fast, data‑free sampling that works even on curved spaces such as spheres and hyperbolic manifolds.

Key Contributions

  • Data‑free diffusion training: A conditional objective that regresses onto a denoising drift derived directly from the target energy, rather than the usual “noising” drift that depends on data samples.
  • Interpolant process: A construction that reduces the number of expensive energy‑function evaluations needed during training to a single evaluation per training example.
  • Riemannian extension: Closed‑form conditional drifts for constant‑curvature manifolds (e.g., hyperspheres, hyperbolic spaces), enabling diffusion‑based sampling beyond Euclidean space.
  • Scalable implementation: Demonstrated on synthetic benchmarks, peptide conformer generation, large‑scale amortized molecular conformer generation, and spherical distributions, achieving state‑of‑the‑art results.
  • Amortized sampler: Once trained, the model can generate many independent samples at inference time with only a handful of neural network evaluations.

Methodology

  1. Problem setup – We are given an energy function \(E(x)\) that defines an unnormalized density \(\tilde{p}(x) = e^{-E(x)}\). The goal is to produce samples from the normalized distribution \(p(x) = \tilde{p}(x)/Z\), where \(Z = \int e^{-E(x)}\,dx\), without ever computing the normalizing constant \(Z\).

  2. Diffusion backbone – Standard diffusion models learn a reverse stochastic differential equation (SDE) that denoises a noisy data point back to the data manifold. Flow Sampling flips the conditioning: the model receives a noise sample \(z\) and learns to predict the denoising drift that would move a particle from the noisy state toward a high‑probability region of \(\tilde{p}\).

  3. Conditional drift regression – The training loss is
    \[ \mathcal{L} = \mathbb{E}_{t\sim\mathcal{U}[0,1],\; z\sim\mathcal{N}(0,I)} \Big\| \mathbf{v}_\theta(t,z) - \underbrace{\big[-\nabla_x E(x_t) + \text{diffusion term}\big]}_{\text{target drift}} \Big\|^2, \]
    where \(x_t\) is an interpolated state between a reference point (often the origin) and the unknown target sample, and \(\mathbf{v}_\theta\) is the neural network’s estimate of the drift.

  4. Interpolant process – Instead of repeatedly evaluating \(E(\cdot)\) for every diffusion step, the authors construct a linear (or geodesic) interpolation between a known anchor and a random noise sample. This yields a single energy evaluation per training example, cutting cost by an order of magnitude.

  5. Riemannian manifolds – On a manifold with constant curvature \(K\), the interpolant follows a geodesic curve. The authors derive a closed‑form expression for the conditional drift that respects the manifold’s metric, allowing the same training pipeline to work on spheres (\(K>0\)) or hyperbolic spaces (\(K<0\)).

  6. Inference – At test time, the learned drift field is integrated (e.g., with Euler‑Maruyama) from a noise sample to produce a final draw from the target distribution. Because the drift is amortized, generating thousands of samples is cheap.
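Steps 4 and 5 contrast a straight‑line interpolant in Euclidean space with a geodesic one on curved spaces. As an illustration only, here is a minimal standard‑library sketch of both for the unit sphere (the \(K = 1\) case), using the textbook spherical linear interpolation (slerp) formula. The function names `linear_interpolant` and `sphere_geodesic` are hypothetical; this is not the authors' code and it does not implement their closed‑form conditional drift, only the kind of curve it is built on.

```python
import math
from typing import List, Sequence

Vector = List[float]


def linear_interpolant(x0: Sequence[float], x1: Sequence[float], t: float) -> Vector:
    """Euclidean straight line: x_t = (1 - t) * x0 + t * x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def sphere_geodesic(x0: Sequence[float], x1: Sequence[float], t: float) -> Vector:
    """Great-circle (slerp) interpolation between two unit vectors.

    Assumes x0 and x1 have unit norm and are not antipodal, so the
    shortest geodesic between them is unique.
    """
    dot = sum(a * b for a, b in zip(x0, x1))
    dot = max(-1.0, min(1.0, dot))  # guard against round-off outside [-1, 1]
    theta = math.acos(dot)          # geodesic distance on the unit sphere
    if theta < 1e-12:               # endpoints coincide: nothing to interpolate
        return list(x0)
    s = math.sin(theta)
    w0 = math.sin((1.0 - t) * theta) / s
    w1 = math.sin(t * theta) / s
    return [w0 * a + w1 * b for a, b in zip(x0, x1)]


def norm(x: Sequence[float]) -> float:
    return math.sqrt(sum(v * v for v in x))


# Interpolate halfway between the north pole and a point on the equator of S^2.
north = [0.0, 0.0, 1.0]
east = [1.0, 0.0, 0.0]
mid_linear = linear_interpolant(north, east, 0.5)   # cuts through the ball
mid_geodesic = sphere_geodesic(north, east, 0.5)    # stays on the sphere
print(round(norm(mid_linear), 3), round(norm(mid_geodesic), 3))  # prints: 0.707 1.0
```

The straight‑line midpoint has norm \(\sqrt{0.5}\) and therefore leaves the sphere, which is why a manifold‑aware sampler needs geodesic interpolants rather than the Euclidean ones used in flat space.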
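Step 6 integrates a drift field with Euler‑Maruyama. No trained network \(\mathbf{v}_\theta\) is available here, so this sketch substitutes the overdamped Langevin drift \(-\nabla E(x)\) of a known 1‑D quadratic energy, whose stationary distribution is exactly the target \(p(x) \propto e^{-E(x)}\). The names `euler_maruyama`, `grad_energy` and `langevin_drift`, the energy, and the step settings are hypothetical stand‑ins; the sketch shows only the integrator, not the authors' learned, time‑dependent sampler or their schedule.

```python
import math
import random
from typing import Callable, List


def euler_maruyama(
    drift: Callable[[float, float], float],
    sigma: float,
    x0: float,
    t0: float,
    t1: float,
    n_steps: int,
    rng: random.Random,
) -> float:
    """Integrate dx = drift(t, x) dt + sigma dW from t0 to t1 with fixed steps."""
    dt = (t1 - t0) / n_steps
    sqrt_dt = math.sqrt(dt)
    x, t = x0, t0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, 1.0) * sqrt_dt  # Brownian increment ~ N(0, dt)
        x = x + drift(t, x) * dt + sigma * dw
        t += dt
    return x


# Stand-in target: E(x) = (x - 2)^2 / 2, so p(x) ∝ exp(-E(x)) is N(2, 1).
def grad_energy(x: float) -> float:
    return x - 2.0


# Overdamped Langevin drift -∇E; with sigma = sqrt(2), p is the stationary law.
def langevin_drift(t: float, x: float) -> float:
    return -grad_energy(x)


rng = random.Random(0)
samples: List[float] = [
    euler_maruyama(langevin_drift, math.sqrt(2.0), rng.gauss(0.0, 1.0), 0.0, 10.0, 400, rng)
    for _ in range(1000)
]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
print(f"mean={mean:.2f} var={var:.2f}")  # approximately 2 and 1
```

The discretization is not exact: for this linear drift the Euler‑Maruyama chain's stationary variance is \(1/(1 - \Delta t/2)\) rather than 1 (about 1.013 at \(\Delta t = 0.025\)). The bias shrinks with the step size, which is one reason integration settings matter in practice.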

Results & Findings

| Benchmark | Metric | Flow Sampling | Baseline |
|---|---|---|---|
| 2‑D synthetic energy landscapes | KL divergence (lower is better) | 0.12 | 0.31 (Langevin) |
| Small peptides (10‑20 atoms) | RMSD to reference conformers (lower is better) | 0.78 Å | 1.12 Å (MCMC) |
| Large‑scale molecular conformer generation (10k molecules) | Coverage @ 0.5 Å (higher is better) | 92 % | 81 % (diffusion‑only) |
| Sampling on \(\mathbb{S}^2\) | Spherical Wasserstein distance (lower is better) | 0.045 | 0.089 (Riemannian HMC) |

Takeaway: Flow Sampling matches or exceeds traditional MCMC and diffusion‑only samplers while requiring 10‑100× fewer energy evaluations during training and orders of magnitude less compute at inference.

Practical Implications

  • Molecular design pipelines – Researchers can plug Flow Sampling into existing generative chemistry stacks to generate realistic conformers on‑the‑fly, dramatically reducing the time spent on costly energy minimizations.
  • Physics‑informed simulation – Engineers modeling fluids, materials, or robotics can define custom energy functions (e.g., constraints, potentials) and obtain fast samplers without hand‑crafting MCMC kernels.
  • Geometric deep learning – Tasks that live on manifolds (e.g., directional data, graph embeddings on hyperbolic space) can now use diffusion‑style generative models without leaving the curved space, preserving intrinsic geometry.
  • Amortized inference for Bayesian models – When the posterior is known only up to an unnormalized density, Flow Sampling offers a ready‑to‑deploy amortized sampler that sidesteps repeated costly gradient evaluations.

For developers, the method is implemented as a standard PyTorch module (the authors release code), requiring only the energy function and a few hyper‑parameters (noise schedule, integration steps). It can be dropped into existing pipelines with minimal refactoring.
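Needing only the energy function works because of the fact behind step 1: adding a constant to \(E\), which is the same as changing the unknown normalizer \(\log Z\), leaves \(-\nabla_x E\) unchanged. A drift built from energy gradients therefore never needs \(Z\). Below is a minimal sketch that assumes a hypothetical user‑supplied 2‑D double‑well energy and uses a central finite‑difference gradient in place of autograd. It does not use or reproduce the authors' released PyTorch code; in a PyTorch pipeline, `torch.autograd.grad` would normally compute this gradient.

```python
from typing import Callable, List, Sequence

Energy = Callable[[Sequence[float]], float]


def double_well(x: Sequence[float]) -> float:
    """Hypothetical 2-D energy: two wells at x[0] = ±1, quadratic in x[1]."""
    return (x[0] ** 2 - 1.0) ** 2 + 0.5 * x[1] ** 2


def neg_grad(energy: Energy, x: Sequence[float], h: float = 1e-5) -> List[float]:
    """Central finite-difference estimate of -∇E(x), a stand-in for autograd."""
    out: List[float] = []
    for i in range(len(x)):
        x_plus = list(x)
        x_plus[i] += h
        x_minus = list(x)
        x_minus[i] -= h
        out.append(-(energy(x_plus) - energy(x_minus)) / (2.0 * h))
    return out


LOG_Z = 3.7  # arbitrary stand-in for the unknown log normalizing constant


def shifted_energy(x: Sequence[float]) -> float:
    """Same density up to normalization: E(x) + log Z."""
    return double_well(x) + LOG_Z


point = [0.5, -1.0]
g_original = neg_grad(double_well, point)
g_shifted = neg_grad(shifted_energy, point)
# Both are close to the analytic -∇E = (-4 x0 (x0^2 - 1), -x1) = (1.5, 1.0).
print(g_original, g_shifted)
```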

Limitations & Future Work

  • Energy evaluation cost remains the bottleneck for extremely high‑dimensional systems (e.g., large proteins); while the interpolant reduces per‑sample cost, the absolute number of evaluations can still be high.
  • The choice of noise schedule and integration step size is still somewhat heuristic; sub‑optimal settings can degrade sample quality or stability.
  • The current theory assumes smooth energy functions; non‑differentiable constraints (e.g., hard steric clashes) need additional handling.
  • Future directions mentioned by the authors include: adaptive step‑size schemes, coupling Flow Sampling with learned surrogate energy models, and extending the Riemannian formulation to manifolds with varying curvature (e.g., learned latent manifolds).

Authors

  • Aaron Havens
  • Brian Karrer
  • Neta Shaul

Paper Information

  • arXiv ID: 2605.03984v1
  • Categories: cs.LG, cs.AI
  • Published: May 5, 2026