[Paper] FastFlow: Accelerating The Generative Flow Matching Models with Bandit Inference
Source: arXiv - 2602.11105v1
Overview
FastFlow tackles one of the biggest pain points of modern generative flow‑matching models: they produce stunning images and videos, but the sequential denoising steps make inference painfully slow. By turning the inference process into an adaptive, “skip‑a‑few‑steps” problem, FastFlow delivers more than a 2.6× speed‑up without retraining the underlying model, and it works out‑of‑the‑box for image synthesis, video generation, and editing pipelines.
Key Contributions
- Plug‑and‑play acceleration – FastFlow can be dropped into any existing flow‑matching model (e.g., FFJORD, FlowMatch) without modifying its weights or training procedure.
- Finite‑difference velocity extrapolation – Uses cheap finite‑difference estimates from previously computed velocities to approximate the denoising trajectory for intermediate steps, eliminating the need for a full neural‑network forward pass.
- Bandit‑driven step‑skipping – Formulates “how many steps to skip” as a multi‑armed bandit problem; the bandit learns online which skip lengths keep quality high while maximizing speed.
- Task‑agnostic generalization – Demonstrated on unconditional image generation, text‑to‑image editing, and video synthesis, showing consistent speed‑quality trade‑offs across modalities.
- Open‑source implementation – Full code released (GitHub), enabling reproducibility and easy integration with popular libraries such as PyTorch‑Lightning and Diffusers.
Methodology
- Velocity prediction baseline – A flow‑matching model predicts a velocity field v_t(x) that tells how to move a noisy sample x_t toward the clean data point as the time index t decreases.
- Detect "flat" segments – FastFlow monitors the magnitude of successive velocity updates. When the change between two consecutive predictions falls below a threshold, the step is deemed low‑impact.
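The flat‑segment check can be sketched as follows. This is a minimal illustration: the relative‑norm criterion and the threshold value are assumptions for exposition, not the paper's exact test.

```python
import numpy as np

def is_low_impact(v_prev, v_curr, threshold=0.05):
    """Deem a step low-impact when two consecutive velocity predictions
    barely differ (relative L2 change below a threshold).
    The 0.05 threshold is an illustrative choice, not the paper's."""
    change = np.linalg.norm(v_curr - v_prev) / (np.linalg.norm(v_prev) + 1e-8)
    return change < threshold
```

When this predicate fires, the upcoming steps are candidates for extrapolation instead of full forward passes.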
- Finite‑difference extrapolation – For a low‑impact segment spanning k steps, FastFlow approximates the intermediate states using a simple finite‑difference formula:

$$
x_{t-k} \approx x_t - \sum_{i=0}^{k-1} \Delta t_i \, \hat{v}_{t-i}
$$

where \hat{v}_{t-i} are the previously computed velocities. No neural‑network evaluation is required for the skipped steps.
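The update above is a cached‑velocity Euler accumulation, which can be sketched directly; the function name and array representation are illustrative:

```python
import numpy as np

def extrapolate(x_t, velocities, dts):
    """Advance x_t across k skipped steps using cached velocities:
    x_{t-k} ≈ x_t - sum_i dt_i * v_{t-i}, with no network calls.
    `velocities` and `dts` hold the k cached v-hats and step sizes."""
    x = x_t.copy()
    for v, dt in zip(velocities, dts):
        x = x - dt * v  # one Euler step per cached velocity
    return x
```

Each skipped step is thus a handful of array operations instead of a full model forward pass.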
- Bandit controller – The decision "skip how many steps?" is treated as a multi‑armed bandit whose arms are the possible skip lengths (e.g., 1, 2, 4, 8). After each generation, a lightweight quality proxy (e.g., a learned perceptual image patch similarity score) provides a reward. The bandit updates its policy to favor skips that give high reward (good quality) while minimizing compute.
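A minimal version of such a controller can be written as an epsilon‑greedy bandit over skip lengths; the paper may use a different bandit algorithm and reward, so treat this as a sketch of the idea only:

```python
import random

class SkipBandit:
    """Epsilon-greedy bandit over candidate skip lengths. The reward is
    a quality-proxy score (e.g., 1 - LPIPS); the arm set, epsilon, and
    update rule here are illustrative assumptions."""

    def __init__(self, arms=(1, 2, 4, 8), epsilon=0.1):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.counts = {a: 0 for a in arms}   # pulls per arm
        self.values = {a: 0.0 for a in arms} # running mean reward per arm

    def choose(self):
        # Explore with probability epsilon, otherwise exploit the best arm.
        if random.random() < self.epsilon:
            return random.choice(self.arms)
        return max(self.arms, key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental mean of observed rewards for this arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

With epsilon set to zero the controller becomes purely greedy, which is useful for deployment once the policy has converged.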
- Adaptive loop – During inference, FastFlow repeatedly: (a) runs the full model for the current step, (b) asks the bandit for a skip length, (c) extrapolates the next few states, and (d) repeats until the final timestep is reached.
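The steps above can be tied together in a short loop. This is a sketch under stated assumptions: `model`, `bandit`, and `quality_proxy` are hypothetical interfaces, and for brevity the extrapolation reuses the latest velocity rather than the full finite‑difference history of cached velocities:

```python
def fastflow_sample(model, x, timesteps, bandit, quality_proxy):
    """Minimal sketch of the adaptive inference loop. All names are
    illustrative, not the paper's actual API."""
    skips_used = []
    i = 0
    while i < len(timesteps) - 1:
        v = model(x, timesteps[i])                           # (a) full forward pass
        skip = min(bandit.choose(), len(timesteps) - 1 - i)  # (b) skip length
        for j in range(skip):                                # (c) cheap extrapolation
            dt = timesteps[i + j] - timesteps[i + j + 1]
            x = x - dt * v
        skips_used.append(skip)
        i += skip                                            # (d) repeat to the end
    reward = quality_proxy(x)  # per-generation quality signal
    for s in skips_used:
        bandit.update(s, reward)
    return x
```

Note that the first extrapolated step in each segment uses the freshly computed velocity, so it coincides with a genuine Euler step; only the subsequent skipped steps are approximations.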
Results & Findings
| Task | Baseline (steps) | FastFlow (effective steps) | Speed‑up | FID / PSNR (quality) |
|---|---|---|---|---|
| Image synthesis (CIFAR‑10) | 1000 | ~380 | 2.6× | 3.1 % ↓ FID (≈ unchanged) |
| Text‑guided editing (COCO‑Captions) | 800 | ~300 | 2.7× | No perceptible drop in CLIP‑Score |
| Video generation (UCF‑101) | 1200 | ~420 | 2.8× | 0.02 dB PSNR loss (within noise) |
- The bandit quickly converges (≈ 200 generations) to a stable skip policy that balances speed and fidelity.
- Qualitative inspection shows that the extrapolated frames preserve fine‑grained details and temporal consistency, confirming that the finite‑difference approximation is sufficient for “smooth” portions of the trajectory.
- FastFlow’s gains are consistent across different model sizes (small 30 M‑parameter to large 200 M‑parameter flow‑matching networks), indicating that the approach scales.
Practical Implications
- Faster prototyping – Developers can iterate on generative applications (e.g., UI mock‑ups, video effects) with near‑real‑time feedback, dramatically reducing the latency from minutes to seconds per sample.
- Cost reduction in production – Cloud inference costs are directly proportional to GPU time; a 2.6× speed‑up translates to comparable savings for services that serve millions of generated assets daily.
- Edge deployment – Since FastFlow skips many heavy forward passes, the memory footprint and compute demand drop, making flow‑matching models viable on consumer‑grade GPUs or even high‑end mobile SoCs.
- Compatibility with existing pipelines – No retraining means teams can adopt FastFlow on top of already‑trained diffusion/flow models, preserving their investment in data and fine‑tuning.
- Potential for hybrid pipelines – FastFlow’s bandit controller could be combined with other acceleration tricks (e.g., model quantization, early‑exit classifiers) for even larger speed gains.
Limitations & Future Work
- Quality proxy dependence – The bandit relies on a fast, differentiable quality estimator; if the proxy mis‑aligns with human perception, the skip policy may become overly aggressive.
- Highly non‑linear trajectories – In scenarios with abrupt changes (e.g., sudden scene cuts in video or strong conditioning shifts), the finite‑difference extrapolation can introduce artifacts, requiring more frequent full‑model evaluations.
- Bandit warm‑up cost – The first few hundred generations are needed for the bandit to learn an effective policy, which may be a hurdle for one‑off generation tasks.
- Future directions suggested by the authors:
- Learning a more expressive extrapolation model (e.g., lightweight recurrent networks) to handle sharper dynamics.
- Extending the bandit framework to jointly optimize other resources such as memory bandwidth.
- Exploring curriculum‑style training where the model is explicitly taught to produce smoother velocity fields that are easier to skip.
FastFlow demonstrates that inference for flow‑matching generative models doesn’t have to be a bottleneck. By turning step‑skipping into an online learning problem, it opens the door for high‑fidelity generation at speeds that are practical for real‑world products.
Authors
- Divya Jyoti Bajpai
- Dhruv Bhardwaj
- Soumya Roy
- Tejas Duseja
- Harsh Agarwal
- Aashay Sandansing
- Manjesh Kumar Hanawal
Paper Information
- arXiv ID: 2602.11105v1
- Categories: cs.CV
- Published: February 11, 2026