[Paper] Colored Noise Diffusion Sampling

Published: May 28, 2026 at 01:58 PM EDT
4 min read
Source: arXiv - 2605.30332v1

Overview

Diffusion models have become the go‑to technique for high‑quality image synthesis, but the standard stochastic solvers they rely on treat every timestep the same: they inject uniform white noise regardless of what the model has already generated. The paper Colored Noise Diffusion Sampling proposes an inference‑time sampler, Colored Noise Sampling (CNS), that dynamically shapes the noise spectrum to match the model’s “spectral bias” (low‑frequency structures appear early, fine details later). By reallocating the limited noise energy to the frequencies that still need work, CNS lowers FID on ImageNet‑256 without any retraining.

Key Contributions

  • Frequency‑aware sampling framework: Formalizes diffusion inference as a frequency‑decoupled energy transfer problem, showing why uniform white noise is sub‑optimal.
  • Colored Noise Sampling (CNS): A training‑free stochastic solver that injects timestep‑ and frequency‑dependent noise, automatically focusing energy on unresolved spectral bands.
  • Plug‑and‑play compatibility: CNS works as a drop‑in replacement for existing ODE/SDE samplers across multiple diffusion architectures (SiT, JiT, FLUX) with no changes to the underlying model or training pipeline.
  • Empirical gains: On ImageNet‑256, CNS reduces unguided FID by 18‑30 % across the tested models (e.g., SiT‑XL/2: 8.26 → 6.27, about 24 %) and consistently improves results under classifier‑free guidance.
  • Open‑source release: Code, pretrained checkpoints, and visual demos are provided on a project page, encouraging rapid adoption.

Methodology

  1. Spectral analysis of diffusion trajectories – The authors first demonstrate that diffusion models resolve image frequencies in a predictable order: coarse, low‑frequency components appear early, while high‑frequency details emerge only near the end of the reverse diffusion.
  2. Energy budget reinterpretation – The total stochastic energy injected over the whole trajectory is finite. Instead of spreading it uniformly (white noise), CNS treats each frequency band as a separate “bucket” that receives just enough energy to close the gap between the current estimate and the target distribution.
  3. Design of the colored‑noise schedule
    • At each timestep t, CNS computes a spectral mask that estimates which frequencies are still under‑resolved.
    • It then draws noise from a Gaussian with a covariance that is larger for those frequencies and smaller for already‑resolved ones.
    • The schedule is derived analytically from the diffusion SDE, so no extra hyper‑parameter tuning is required.
  4. Implementation as a sampler – CNS replaces the standard sigma * torch.randn_like(x) call in existing solvers with a frequency‑filtered version. Because the operation is linear and fully differentiable, it can be inserted into any PyTorch‑based diffusion pipeline with a single line of code.
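
To make steps 2 to 4 concrete, here is a minimal NumPy sketch of frequency‑shaped noise. It is an illustration under stated assumptions, not the authors' implementation: the hard‑threshold mask, the `floor` and `max_cutoff` parameters, and all function names are hypothetical stand‑ins for the analytically derived schedule described in the paper. The convention assumed here is t = 1 at the start of sampling and t = 0 at the end.

```python
import numpy as np


def radial_frequency(height, width):
    """Radial frequency of each 2-D FFT coefficient, scaled to [0, 1]."""
    fy = np.fft.fftfreq(height)[:, None]  # cycles per pixel along rows
    fx = np.fft.fftfreq(width)[None, :]   # cycles per pixel along columns
    radius = np.sqrt(fx ** 2 + fy ** 2)
    return radius / radius.max()


def spectral_mask(height, width, t, floor=0.1, max_cutoff=0.5):
    """Hypothetical mask (NOT the paper's schedule).

    Frequencies below a cutoff are treated as already resolved and get the
    small weight `floor`. The cutoff grows as t goes from 1 to 0.
    """
    radius = radial_frequency(height, width)
    cutoff = max_cutoff * (1.0 - t)
    weights = np.where(radius >= cutoff, 1.0, floor)
    # Rescale so the mean squared weight is 1: the expected total noise
    # energy then matches white noise and is only redistributed.
    return weights / np.sqrt(np.mean(weights ** 2))


def colored_noise(shape, t, rng, floor=0.1, max_cutoff=0.5):
    """White Gaussian noise filtered in the Fourier domain by the mask."""
    height, width = shape[-2:]
    mask = spectral_mask(height, width, t, floor, max_cutoff)
    white = rng.standard_normal(shape)
    spectrum = np.fft.fft2(white, axes=(-2, -1))
    # The mask is real and symmetric in frequency, so the result is real
    # up to floating-point error; .real drops the residual imaginary part.
    return np.fft.ifft2(spectrum * mask, axes=(-2, -1)).real


def band_energy(x, low, high):
    """Mean power of FFT coefficients with scaled radius in [low, high)."""
    height, width = x.shape[-2:]
    radius = radial_frequency(height, width)
    power = np.abs(np.fft.fft2(x, axes=(-2, -1))) ** 2 / (height * width)
    band = (radius >= low) & (radius < high)
    return power[..., band].mean()


rng = np.random.default_rng(0)
white = rng.standard_normal((8, 64, 64))
late = colored_noise((8, 64, 64), t=0.2, rng=rng)

# White noise: roughly equal power in the low and high bands.
print(band_energy(white, 0.0, 0.2), band_energy(white, 0.5, 1.0))
# Late-step colored noise: low band suppressed, high band boosted.
print(band_energy(late, 0.0, 0.2), band_energy(late, 0.5, 1.0))
```

In a solver step, `sigma * colored_noise(x.shape, t, rng)` would take the place of the white‑noise term; in PyTorch the same filtering could be written with `torch.fft.fft2` and `torch.fft.ifft2`. The normalization inside `spectral_mask` reflects the energy‑budget idea: the per‑pixel variance stays close to 1, and only its distribution across frequencies changes.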

Results & Findings

Model (ImageNet‑256)   Baseline FID (unguided)   CNS FID (unguided)   Relative reduction
SiT‑XL/2               8.26                      6.27                 24 %
JiT‑B/16               32.39                     26.69                18 %
JiT‑H/16               11.88                     8.31                 30 %

  • Guided sampling: When paired with classifier‑free guidance, CNS still yields consistent FID improvements (e.g., SiT‑XL/2 drops from 6.84 to 5.12).
  • Visual quality: Sampled images show sharper edges and more faithful textures, especially in regions that traditionally suffer from blurriness (hair, foliage).
  • Computation overhead: CNS adds less than 5 % runtime relative to vanilla SDE solvers because the extra spectral filtering is implemented with FFT‑based operations.
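
The relative reductions quoted in this section follow from (baseline − CNS) / baseline. A short check, using only the FID values reported above (the dictionary layout and function name are just for illustration):

```python
# (baseline FID, CNS FID) pairs copied from the results above
results = {
    "SiT-XL/2": (8.26, 6.27),
    "JiT-B/16": (32.39, 26.69),
    "JiT-H/16": (11.88, 8.31),
    "SiT-XL/2 (guided)": (6.84, 5.12),
}


def relative_reduction(baseline, improved):
    """Percentage drop from baseline to improved (lower FID is better)."""
    return 100.0 * (baseline - improved) / baseline


reductions = {name: relative_reduction(b, c) for name, (b, c) in results.items()}
for name, pct in reductions.items():
    print(f"{name}: {pct:.1f} %")
# SiT-XL/2: 24.1 %
# JiT-B/16: 17.6 %
# JiT-H/16: 30.1 %
# SiT-XL/2 (guided): 25.1 %
```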

Practical Implications

  • Better out‑of‑the‑box performance: Developers can upgrade existing diffusion‑based generators (e.g., for content creation, data augmentation, or style transfer) simply by swapping the sampler, gaining higher fidelity without retraining.
  • Resource‑efficient generation: Since CNS allocates noise more intelligently, fewer diffusion steps may be needed to reach a target quality, potentially cutting inference latency and GPU usage.
  • Compatibility with downstream pipelines: The method works with both unconditional and guided generation, making it suitable for text‑to‑image, inpainting, and super‑resolution workflows that already rely on classifier‑free guidance.
  • Open‑source tooling: The provided codebase includes ready‑made wrappers for popular libraries (Diffusers, OpenAI‑CLIP‑guided pipelines), lowering the barrier for integration into production services.

Limitations & Future Work

  • Spectral estimation heuristics: CNS relies on a simple proxy to decide which frequencies are unresolved; more sophisticated, data‑driven estimators could further improve allocation.
  • Assumption of isotropic diffusion: The current formulation assumes the standard isotropic diffusion SDE; extending the approach to anisotropic or latent‑space diffusion models remains an open question.
  • Evaluation scope: Experiments focus on ImageNet‑256; testing on higher‑resolution datasets (e.g., 1024×1024) and non‑visual modalities (audio, video) would validate the generality of the method.
  • Theoretical guarantees: While empirical results are strong, a formal proof of convergence or optimality of the colored‑noise schedule is not provided. Future work could aim to bound the sampling error analytically.

Authors

  • Hadar Davidson
  • Noam Issachar
  • Sagie Benaim

Paper Information

  • arXiv ID: 2605.30332v1
  • Categories: cs.CV
  • Published: May 28, 2026