[Paper] FastFlow: Accelerating The Generative Flow Matching Models with Bandit Inference
Source: arXiv - 2602.11105v1
Overview
FastFlow tackles one of the biggest pain points of modern generative flow‑matching models: they produce stunning images and videos, but the sequential denoising steps make inference painfully slow. By turning the inference process into an adaptive, “skip‑a‑few‑steps” problem, FastFlow delivers more than a 2.6× speed‑up without retraining the underlying model, and it works out‑of‑the‑box for image synthesis, video generation, and editing pipelines.
Key Contributions
- Plug‑and‑play acceleration – FastFlow can be dropped into any existing flow‑matching model (e.g., FFJORD, FlowMatch) without modifying its weights or training procedure.
- Finite‑difference velocity extrapolation – Uses cheap finite‑difference estimates from previously computed velocities to approximate the denoising trajectory for intermediate steps, eliminating the need for a full neural‑network forward pass.
- Bandit‑driven step‑skipping – Formulates “how many steps to skip” as a multi‑armed bandit problem; the bandit learns online which skip lengths keep quality high while maximizing speed.
- Task‑agnostic generalization – Demonstrated on unconditional image generation, text‑to‑image editing, and video synthesis, showing consistent speed‑quality trade‑offs across modalities.
- Open‑source implementation – Full code released (GitHub), enabling reproducibility and easy integration with popular libraries such as PyTorch‑Lightning and Diffusers.
Methodology
- Velocity prediction baseline – A flow‑matching model predicts a velocity field v_t(x) that tells how to move a noisy sample x_t toward the clean data point as the time index t decreases.
- Detect "flat" segments – FastFlow monitors the magnitude of successive velocity updates. When the change between two consecutive predictions falls below a threshold, the step is deemed low‑impact.
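The flat‑segment check can be sketched as follows. This is a minimal illustration: the relative‑norm criterion and the threshold value are assumptions for exposition, not the paper's exact test.

```python
import numpy as np

def is_low_impact(v_prev, v_curr, threshold=0.05):
    """Deem a step low-impact when two consecutive velocity predictions
    barely differ (relative L2 change below a threshold).
    The 0.05 threshold is an illustrative choice, not the paper's."""
    change = np.linalg.norm(v_curr - v_prev) / (np.linalg.norm(v_prev) + 1e-8)
    return change < threshold
```

When this predicate fires, the upcoming steps are candidates for extrapolation instead of full forward passes.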
- Finite‑difference extrapolation – For a low‑impact segment spanning k steps, FastFlow approximates the intermediate states using a simple finite‑difference formula:

$$
x_{t-k} \approx x_t - \sum_{i=0}^{k-1} \Delta t_i \, \hat{v}_{t-i}
$$

where \hat{v}_{t-i} are the previously computed velocities. No neural‑network evaluation is required for the skipped steps.
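The update above is a cached‑velocity Euler accumulation, which can be sketched directly; the function name and array representation are illustrative:

```python
import numpy as np

def extrapolate(x_t, velocities, dts):
    """Advance x_t across k skipped steps using cached velocities:
    x_{t-k} ≈ x_t - sum_i dt_i * v_{t-i}, with no network calls.
    `velocities` and `dts` hold the k cached v-hats and step sizes."""
    x = x_t.copy()
    for v, dt in zip(velocities, dts):
        x = x - dt * v  # one Euler step per cached velocity
    return x
```

Each skipped step is thus a handful of array operations instead of a full model forward pass.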
- Bandit controller – The decision "skip how many steps?" is treated as a multi‑armed bandit whose arms are the possible skip lengths (e.g., 1, 2, 4, 8). After each generation, a lightweight quality proxy (e.g., a learned perceptual image patch similarity score) provides a reward. The bandit updates its policy to favor skips that give high reward (good quality) while minimizing compute.
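A minimal version of such a controller can be written as an epsilon‑greedy bandit over skip lengths; the paper may use a different bandit algorithm and reward, so treat this as a sketch of the idea only:

```python
import random

class SkipBandit:
    """Epsilon-greedy bandit over candidate skip lengths. The reward is
    a quality-proxy score (e.g., 1 - LPIPS); the arm set, epsilon, and
    update rule here are illustrative assumptions."""

    def __init__(self, arms=(1, 2, 4, 8), epsilon=0.1):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.counts = {a: 0 for a in arms}   # pulls per arm
        self.values = {a: 0.0 for a in arms} # running mean reward per arm

    def choose(self):
        # Explore with probability epsilon, otherwise exploit the best arm.
        if random.random() < self.epsilon:
            return random.choice(self.arms)
        return max(self.arms, key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental mean of observed rewards for this arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

With epsilon set to zero the controller becomes purely greedy, which is useful for deployment once the policy has converged.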
- Adaptive loop – During inference, FastFlow repeatedly: (a) runs the full model for the current step, (b) asks the bandit for a skip length, (c) extrapolates the next few states, and (d) repeats until the final timestep is reached.
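The steps above can be tied together in a short loop. This is a sketch under stated assumptions: `model`, `bandit`, and `quality_proxy` are hypothetical interfaces, and for brevity the extrapolation reuses the latest velocity rather than the full finite‑difference history of cached velocities:

```python
def fastflow_sample(model, x, timesteps, bandit, quality_proxy):
    """Minimal sketch of the adaptive inference loop. All names are
    illustrative, not the paper's actual API."""
    skips_used = []
    i = 0
    while i < len(timesteps) - 1:
        v = model(x, timesteps[i])                           # (a) full forward pass
        skip = min(bandit.choose(), len(timesteps) - 1 - i)  # (b) skip length
        for j in range(skip):                                # (c) cheap extrapolation
            dt = timesteps[i + j] - timesteps[i + j + 1]
            x = x - dt * v
        skips_used.append(skip)
        i += skip                                            # (d) repeat to the end
    reward = quality_proxy(x)  # per-generation quality signal
    for s in skips_used:
        bandit.update(s, reward)
    return x
```

Note that the first extrapolated step in each segment uses the freshly computed velocity, so it coincides with a genuine Euler step; only the subsequent skipped steps are approximations.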
Results & Findings
| Task | Baseline (steps) | FastFlow (effective steps) | Speed‑up | FID / PSNR (quality) |
|---|---|---|---|---|
| Image synthesis (CIFAR‑10) | 1000 | ~380 | 2.6× | 3.1 % ↓ FID (≈ unchanged) |
| Text‑guided editing (COCO‑Captions) | 800 | ~300 | 2.7× | No perceptible drop in CLIP‑Score |
| Video generation (UCF‑101) | 1200 | ~420 | 2.8× | 0.02 dB PSNR loss (within noise) |
- The bandit quickly converges (≈ 200 generations) to a stable skip policy that balances speed and fidelity.
- Qualitative inspection shows that the extrapolated frames preserve fine‑grained details and temporal consistency, confirming that the finite‑difference approximation is sufficient for “smooth” portions of the trajectory.
- FastFlow’s gains are consistent across different model sizes (small 30 M‑parameter to large 200 M‑parameter flow‑matching networks), indicating that the approach scales.
Practical Implications
- Faster prototyping – Developers can iterate on generative applications (e.g., UI mock‑ups, video effects) with near‑real‑time feedback, dramatically reducing the latency from minutes to seconds per sample.
- Cost reduction in production – Cloud inference costs are directly proportional to GPU time; a 2.6× speed‑up translates to comparable savings for services that serve millions of generated assets daily.
- Edge deployment – Since FastFlow skips many heavy forward passes, the memory footprint and compute demand drop, making flow‑matching models viable on consumer‑grade GPUs or even high‑end mobile SoCs.
- Compatibility with existing pipelines – No retraining means teams can adopt FastFlow on top of already‑trained diffusion/flow models, preserving their investment in data and fine‑tuning.
- Potential for hybrid pipelines – FastFlow’s bandit controller could be combined with other acceleration tricks (e.g., model quantization, early‑exit classifiers) for even larger speed gains.
Limitations & Future Work
- Quality proxy dependence – The bandit relies on a fast, differentiable quality estimator; if the proxy mis‑aligns with human perception, the skip policy may become overly aggressive.
- Highly non‑linear trajectories – In scenarios with abrupt changes (e.g., sudden scene cuts in video or strong conditioning shifts), the finite‑difference extrapolation can introduce artifacts, requiring more frequent full‑model evaluations.
- Bandit warm‑up cost – The first few hundred generations are needed for the bandit to learn an effective policy, which may be a hurdle for one‑off generation tasks.
- Future directions suggested by the authors:
- Learning a more expressive extrapolation model (e.g., lightweight recurrent networks) to handle sharper dynamics.
- Extending the bandit framework to jointly optimize other resources such as memory bandwidth.
- Exploring curriculum‑style training where the model is explicitly taught to produce smoother velocity fields that are easier to skip.
FastFlow demonstrates that inference for flow‑matching generative models doesn’t have to be a bottleneck. By turning step‑skipping into an online learning problem, it opens the door for high‑fidelity generation at speeds that are practical for real‑world products.
Authors
- Divya Jyoti Bajpai
- Dhruv Bhardwaj
- Soumya Roy
- Tejas Duseja
- Harsh Agarwal
- Aashay Sandansing
- Manjesh Kumar Hanawal
Paper Information
- arXiv ID: 2602.11105v1
- Categories: cs.CV
- Published: February 11, 2026