[Paper] DiffBench Meets DiffAgent: End-to-End LLM-Driven Diffusion Acceleration Code Generation
Source: arXiv - 2601.03178v1
Overview
Diffusion models are the backbone of today’s high‑fidelity image and video generators, but their multi‑step inference pipelines make them painfully slow for production use. The paper DiffBench Meets DiffAgent tackles this bottleneck by marrying two trends:
- a systematic benchmark (DiffBench) that measures how well different acceleration tricks work together, and
- an LLM‑powered “agent” (DiffAgent) that automatically writes, tests, and refines code to speed up any diffusion model.
The result is a reproducible, end‑to‑end pipeline that can turn a vanilla diffusion model into a production‑ready, low‑latency service with minimal human effort.
Key Contributions
- DiffBench: A unified benchmark covering a wide range of diffusion architectures (e.g., UNet, Transformer‑based), hardware back‑ends (GPU, CPU, edge accelerators), and acceleration techniques (pruning, quantization, knowledge distillation, scheduler tweaks). It provides a three‑stage automated evaluation pipeline:
- code generation,
- functional correctness testing, and
- performance profiling.
- DiffAgent: An LLM‑driven autonomous agent that iteratively proposes acceleration strategies, generates the corresponding Python/C++ code, runs it, and uses a genetic‑algorithm‑style feedback loop to evolve better solutions. The agent consists of four components (a minimal interface sketch follows this contribution list):
- Planner – selects promising technique combinations based on model metadata.
- Code Generator – prompts a large language model (e.g., GPT‑4) to emit implementation snippets.
- Debugger – parses runtime errors and feeds them back to the planner.
- Genetic Optimizer – treats each generated script as an individual, mutates/recombines them, and selects the highest‑throughput candidates.
- Closed‑Loop Evaluation: The entire workflow runs without manual intervention, enabling rapid prototyping of acceleration pipelines for new diffusion models.
- Empirical Validation: Across 12 diffusion models and 7 hardware setups, DiffAgent consistently outperforms baseline LLM prompts and hand‑crafted acceleration scripts, achieving up to 3.2× speed‑up with < 1 % quality degradation.
Methodology
1. Benchmark Construction (DiffBench)
- Curated a dataset of 12 open‑source diffusion models spanning text‑to‑image, video, and super‑resolution tasks.
- Implemented wrappers for 9 popular acceleration primitives (e.g., TensorRT INT8, ONNX Runtime, weight pruning).
- Defined three evaluation stages:
- Correctness: Verify that the accelerated model produces outputs within a preset PSNR/LPIPS tolerance.
- Performance: Measure latency, throughput, and memory footprint on each target device.
- Robustness: Run a stress test with varied batch sizes and random seeds.
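As a rough illustration of the correctness and performance stages, the sketch below compares an accelerated model against the reference within a PSNR tolerance and measures median latency. The threshold, helper names, and the assumption that outputs are normalized to [0, 1] are mine; the paper uses PSNR/LPIPS tolerances but the exact values are not fixed here.

```python
# Illustrative correctness and performance checks in the spirit of DiffBench.
# The PSNR threshold, warm-up counts, and the [0, 1] output range are assumptions.
import math
import time

import torch


def psnr(ref: torch.Tensor, out: torch.Tensor) -> float:
    """Peak signal-to-noise ratio, assuming pixel values in [0, 1]."""
    mse = torch.mean((ref - out) ** 2).item()
    return float("inf") if mse == 0 else -10.0 * math.log10(mse)


def check_correctness(ref_model, fast_model, latents, psnr_min: float = 35.0) -> bool:
    """Correctness stage: accelerated outputs must stay within a preset tolerance."""
    with torch.no_grad():
        ref, out = ref_model(latents), fast_model(latents)
    return psnr(ref, out) >= psnr_min


def profile_latency_ms(model, latents, warmup: int = 3, iters: int = 10) -> float:
    """Performance stage: median wall-clock latency in milliseconds on the target device."""
    with torch.no_grad():
        for _ in range(warmup):
            model(latents)
        if torch.cuda.is_available():
            torch.cuda.synchronize()
        times = []
        for _ in range(iters):
            start = time.perf_counter()
            model(latents)
            if torch.cuda.is_available():
                torch.cuda.synchronize()  # finish GPU work before stopping the clock
            times.append((time.perf_counter() - start) * 1000.0)
    return sorted(times)[len(times) // 2]
```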
2. Agent Design (DiffAgent)
- Planning: The agent extracts model characteristics (layer types, parameter counts) and consults a knowledge base of technique compatibilities.
- Code Generation: It crafts a prompt that includes the model’s API, desired speed‑up target, and hardware constraints, then feeds this to an LLM. The LLM returns a self‑contained script (often a mix of PyTorch, TorchScript, and custom CUDA kernels).
- Debugging & Feedback: Execution logs are parsed for errors (e.g., missing operators, shape mismatches). The debugger rewrites the prompt with corrective hints.
- Genetic Optimization: Each script is treated as a genome; mutation operators randomly toggle techniques (e.g., switch from FP16 to INT8). A fitness function combines latency gain and quality loss. Over several generations, the agent converges on a high‑performing solution.
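The sketch below illustrates that genetic loop in miniature: genomes are encoded here as sets of enabled techniques (one way to realize "each script as an individual"), mutation toggles techniques such as FP16 versus INT8, and fitness rewards latency gain while penalizing quality loss. The technique list, mutation rate, and weight are illustrative assumptions, not values from the paper.

```python
# Toy version of the genetic feedback loop: a genome is the set of enabled
# techniques, mutation toggles techniques, and fitness trades latency gain
# against quality loss. Technique names, rates, and weights are assumptions.
import random

TECHNIQUES = ["fp16", "int8", "operator_fusion", "weight_pruning", "scheduler_tweak"]


def fitness(latency_gain: float, quality_loss: float, alpha: float = 5.0) -> float:
    """Reward speed-up, penalize quality degradation (e.g., an LPIPS increase)."""
    return latency_gain - alpha * quality_loss


def mutate(genome: set, rate: float = 0.3) -> set:
    """Randomly toggle techniques, e.g. switching from FP16 to INT8 quantization."""
    child = set(genome)
    for tech in TECHNIQUES:
        if random.random() < rate:
            child.symmetric_difference_update({tech})
    if {"fp16", "int8"} <= child:          # keep at most one precision mode
        child.discard(random.choice(["fp16", "int8"]))
    return child


def crossover(a: set, b: set) -> set:
    """Uniform crossover: each technique is inherited from either parent."""
    return {t for t in TECHNIQUES if t in random.choice([a, b])}


def evolve(population: list, scores: list, keep: int = 4) -> list:
    """Keep the highest-fitness genomes, then refill the population with offspring."""
    ranked = [g for g, _ in sorted(zip(population, scores), key=lambda pair: -pair[1])]
    survivors = ranked[:keep]
    children = [mutate(crossover(*random.sample(survivors, 2)))
                for _ in range(len(population) - keep)]
    return survivors + children
```

In the paper's evaluation, candidates combining operator fusion, mixed precision, and pruning tended to dominate after several generations (see Results & Findings below).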
3. Evaluation Loop
- The generated code is automatically compiled, loaded, and benchmarked via DiffBench.
- Results are fed back to the genetic optimizer, which decides whether to keep, discard, or mutate the candidate.
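Putting the pieces together, one closed-loop iteration might look like the following, assuming a DiffBench-style harness that returns a speed-up, quality delta, and any runtime error for a candidate script. The `bench`/`result` attribute names and the single debug retry are hypothetical.

```python
# One closed-loop iteration, assuming a DiffBench-style harness object; the
# bench/result attribute names and the single debug retry are assumptions.
def closed_loop_step(population, codegen, debugger, optimizer, bench, model_meta):
    scores = []
    for techniques in population:
        script = codegen.generate(model_meta, techniques)
        result = bench.evaluate(script)                  # compile, load, and benchmark
        if result.error:                                 # debugging feedback: retry once with hints
            script = codegen.generate(model_meta, techniques,
                                      hints=debugger.diagnose(result.error))
            result = bench.evaluate(script)
        # Same trade-off as the fitness sketch above: speed-up minus weighted quality loss.
        score = (float("-inf") if result.error
                 else result.speedup - 5.0 * max(result.quality_delta, 0.0))
        scores.append(score)
    return optimizer.evolve(population, scores)          # keep, discard, or mutate candidates
```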
Results & Findings
| Model (Task) | Baseline Latency (ms) | DiffAgent Latency (ms) | Speed‑up | Quality Δ (LPIPS) |
|---|---|---|---|---|
| StableDiffusion‑v1.5 (text‑to‑image) | 1200 | 380 | 3.2× | +0.006 |
| VideoDiffusion‑2 (16‑frame video) | 5400 | 1700 | 3.2× | +0.009 |
| Real‑ESRGAN (super‑resolution) | 850 | 280 | 3.0× | +0.004 |
- Higher‑order combos win: The best scripts combined operator fusion + mixed‑precision + kernel‑level pruning.
- Genetic feedback matters: Pure LLM prompting without the evolutionary loop plateaued at ~1.5× speed‑up.
- Hardware‑aware tuning: On edge GPUs (e.g., Jetson Nano), the agent learned to favor INT8 quantization and aggressive kernel tiling, achieving a 2.4× gain while staying within the device’s memory budget.
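One plausible way such a memory budget enters the search is as a hard constraint in the fitness function; the snippet below is an illustrative guess, not the paper's actual mechanism.

```python
# Illustrative guess at how a device memory budget could act as a hard
# constraint during selection; the paper's actual mechanism is not described here.
def hardware_aware_fitness(speedup: float, quality_loss: float,
                           peak_mem_mb: float, mem_budget_mb: float,
                           alpha: float = 5.0) -> float:
    """Reject candidates that exceed the target device's memory budget (e.g., an edge GPU)."""
    if peak_mem_mb > mem_budget_mb:
        return float("-inf")   # infeasible on this device, e.g. a Jetson-class board
    return speedup - alpha * quality_loss
```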
Practical Implications
- Rapid Deployment: Teams can feed a new diffusion checkpoint into DiffAgent and obtain a production‑ready, optimized inference script in under an hour—dramatically shortening the “research‑to‑product” cycle.
- Cost Savings: Faster inference translates directly to lower cloud GPU bills. A 3× speed‑up on a typical Stable Diffusion service can cut monthly compute spend by ~30 %.
- Edge AI Enablement: The framework’s hardware‑aware component makes it feasible to run diffusion models on edge devices (mobile, AR/VR headsets) that previously could only host lightweight classifiers.
- Standardized Evaluation: DiffBench can serve as a community reference for comparing new acceleration libraries (e.g., NVIDIA’s FasterTransformer, Intel’s OpenVINO) under identical conditions.
Limitations & Future Work
- LLM Dependency: The quality of generated code hinges on the underlying LLM; older or smaller models may produce non‑compilable scripts, increasing the debugging burden.
- Search Space Explosion: The genetic algorithm explores a combinatorial space of techniques; while effective for the evaluated models, scaling to dozens of techniques may require more sophisticated search heuristics (e.g., reinforcement learning).
- Quality Metric Scope: The paper focuses on LPIPS/PSNR; other downstream metrics (e.g., CLIP similarity for text‑to‑image) were not evaluated, which could affect perceived quality in some applications.
- Security & Safety: Automatically generated CUDA kernels could inadvertently introduce memory‑safety bugs; future versions should integrate static analysis or sandboxed execution.
Overall, DiffBench and DiffAgent illustrate a compelling direction: using LLMs not just for code completion, but for end‑to‑end system optimization, turning the once‑manual art of diffusion acceleration into an automated, reproducible workflow.
Authors
- Jiajun Jiao
- Haowei Zhu
- Puyuan Yang
- Jianghui Wang
- Ji Liu
- Ziqiong Liu
- Dong Li
- Yuejian Fang
- Junhai Yong
- Bin Wang
- Emad Barsoum
Paper Information
- arXiv ID: 2601.03178v1
- Categories: cs.CV
- Published: January 6, 2026