[Paper] SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching
Source: arXiv - 2602.24208v1
Overview
Diffusion models have become the go‑to technique for high‑fidelity video generation, but their inference cost is still prohibitive because they require hundreds of sequential denoising steps. SenCache introduces a principled, sensitivity‑driven caching strategy that decides when and what intermediate results to reuse, cutting down computation without sacrificing visual quality.
Key Contributions
- Sensitivity‑aware error analysis: Derives a formal link between a model’s output sensitivity (to noisy latents and timesteps) and the error introduced by caching.
- Dynamic per‑sample caching policy (SenCache): Uses the sensitivity metric to pick cache/reuse timesteps on the fly, rather than relying on static, hand‑tuned heuristics.
- Theoretical justification for existing heuristics: Shows why earlier rule‑based methods sometimes work and how they can be systematically improved.
- Empirical validation on three state‑of‑the‑art video diffusion models (Wan 2.1, CogVideoX, LTX‑Video): Demonstrates superior visual quality at comparable FLOP budgets.
Methodology
- Model‑output sensitivity definition – For a diffusion step, the authors treat the denoising function f as a mapping from the noisy latent zₜ and timestep t to the next latent. They compute the gradient of f w.r.t. both inputs, yielding a scalar sensitivity score S(zₜ, t) that quantifies how much a small perturbation would change the output.
- Caching error bound – By linearizing f around the cached point, they prove that the expected error when reusing a cached output grows proportionally to S(zₜ, t).
- Adaptive selection rule – During inference, SenCache evaluates S for the current step. If the score is below a user‑defined threshold, the step is skipped and the cached output from the nearest earlier step is reused; otherwise, the model is executed normally and the result is stored for future reuse.
- Per‑sample decision making – Because S is computed for each video sample, the caching schedule naturally adapts to content complexity (e.g., fast motion vs. static scenes).
- Implementation details – The sensitivity computation adds negligible overhead (a few extra matrix‑vector products) and can be fused with existing inference pipelines.
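The loop described above can be sketched as follows. This is a minimal illustration, not the paper's reference implementation: the function names (`estimate_sensitivity`, `sencache_inference`) are invented here, and the sensitivity score is approximated with a cheap finite-difference probe rather than the exact gradient-based S(zₜ, t) the authors derive.

```python
import numpy as np

def estimate_sensitivity(f, z, t, eps=1e-3):
    """Finite-difference proxy for S(z_t, t): how strongly a small random
    perturbation of the noisy latent changes the denoiser output."""
    d = np.random.randn(*z.shape)
    d *= eps / (np.linalg.norm(d) + 1e-12)  # perturbation of norm eps
    return np.linalg.norm(f(z + d, t) - f(z, t)) / eps

def sencache_inference(f, z, timesteps, threshold):
    """Run denoiser f over timesteps, reusing cached outputs for
    low-sensitivity steps (the adaptive selection rule above)."""
    cached = None
    for t in timesteps:
        if cached is not None and estimate_sensitivity(f, z, t) < threshold:
            out = cached          # skip the expensive model call
        else:
            out = f(z, t)         # run the model and refresh the cache
            cached = out
        z = out                   # advance to the next latent
    return z
```

Setting `threshold = 0` recovers the full baseline (every step executed), while a very large threshold caches aggressively; real deployments would sit between the two.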
Results & Findings
| Model | Baseline (full steps) | Prior caching (heuristic) | SenCache |
|---|---|---|---|
| Wan 2.1 | 30.2 dB PSNR | 28.7 dB (‑15 % FLOPs) | 29.4 dB (‑15 % FLOPs) |
| CogVideoX | 28.9 dB | 27.5 dB (‑12 % FLOPs) | 28.3 dB (‑12 % FLOPs) |
| LTX‑Video | 31.0 dB | 29.8 dB (‑18 % FLOPs) | 30.5 dB (‑18 % FLOPs) |
- Visual quality: User studies reported a 22 % higher preference for SenCache outputs over prior caching methods at the same speed‑up.
- Computation savings: FLOP reduction matches that of the best heuristic methods; the extra sensitivity check costs < 1 % of total inference time.
- Robustness: The adaptive policy automatically reduces caching for high‑motion clips where sensitivity is high, preventing noticeable artifacts.
Practical Implications
- Faster video generation services: Cloud providers can shave off up to 15 % of GPU time per video without noticeable quality loss, translating to lower operational costs.
- Edge deployment: Mobile or embedded devices with limited compute can run diffusion‑based video synthesis in near‑real time by aggressively caching low‑sensitivity steps.
- Tooling integration: SenCache’s sensitivity metric can be exposed as a simple API (e.g., `should_cache(step, latent, t)`), making it easy to plug into existing diffusion libraries (e.g., Diffusers, OpenAI’s video‑gen SDK).
- Dynamic quality‑vs‑speed trade‑off: Developers can tune the sensitivity threshold at runtime to meet latency SLAs, offering a graceful degradation path rather than a binary “full vs. fast” switch.
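A possible shape for that API is sketched below. The class name, constructor, and threshold handling are assumptions for illustration, not the paper's published interface; only the `should_cache(step, latent, t)` signature comes from the text above.

```python
import numpy as np

class SenCachePolicy:
    """Hypothetical wrapper exposing the caching decision as one call."""

    def __init__(self, denoiser, threshold=0.5):
        self.denoiser = denoiser
        self.threshold = threshold  # tunable at runtime to meet latency SLAs
        self._cache = None          # last executed denoiser output

    def should_cache(self, step, latent, t, eps=1e-3):
        """Return True when the cached output may be reused for this step."""
        if self._cache is None:
            return False  # nothing cached yet: the model must run
        # Cheap finite-difference stand-in for the sensitivity score S.
        d = np.random.randn(*latent.shape)
        d *= eps / (np.linalg.norm(d) + 1e-12)
        sensitivity = np.linalg.norm(
            self.denoiser(latent + d, t) - self.denoiser(latent, t)) / eps
        return bool(sensitivity < self.threshold)
```

Raising `threshold` at runtime trades quality for speed smoothly, which is the graceful-degradation behavior described above.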
Limitations & Future Work
- Sensitivity threshold selection still requires a small validation sweep; fully automated threshold learning (e.g., via reinforcement learning) is an open direction.
- The current analysis assumes locally linear behavior of the denoiser; highly non‑linear regions (e.g., abrupt scene cuts) may still incur larger caching errors.
- Experiments focus on three video diffusion models; extending the study to image diffusion, text‑to‑video, or multimodal pipelines would strengthen generality.
- Integration with training‑aware acceleration (e.g., distillation) could yield even larger speed‑ups, a promising avenue for follow‑up work.
Authors
- Yasaman Haghighi
- Alexandre Alahi
Paper Information
- arXiv ID: 2602.24208v1
- Categories: cs.CV, cs.LG
- Published: February 27, 2026