[Paper] Batch Denoising for AIGC Service Provisioning in Wireless Edge Networks
Source: arXiv - 2511.19847v1
Overview
The paper tackles a pressing challenge for next‑generation mobile services: delivering high‑quality AI‑generated content (AIGC) such as images from edge servers to users within strict latency budgets. By introducing a batch‑denoising technique and jointly optimizing generation and transmission, the authors show how to boost perceived quality while respecting end‑to‑end delay constraints in wireless edge networks.
Key Contributions
- Batch denoising framework – Groups denoising steps of diffusion‑based image generators into batches to exploit parallelism on edge GPUs, cutting per‑step latency.
- STACKING algorithm – A low‑complexity, model‑agnostic optimizer that decides how many denoising steps to batch together, leveraging the insight that early steps matter more for final image quality.
- Joint generation‑transmission optimization – Extends the batch solution to allocate wireless bandwidth among concurrent AIGC requests, maximizing average service quality under a shared delay budget.
- Extensive simulations – Demonstrates up to 30 % quality improvement (measured by FID/PSNR) and 20 % latency reduction compared with baseline sequential denoising and naïve bandwidth allocation.
Methodology
- Problem formulation – The authors model the AIGC service as two coupled stages:
  - Content generation on an edge server using a diffusion model (multiple denoising steps)
  - Content transmission over a wireless link

  The objective is to maximize the average quality of generated images while keeping total latency (generation + transmission) below a preset threshold.
- Batch denoising insight – Empirical profiling shows that denoising steps can be executed in parallel on modern GPUs if grouped, and that the first few steps have a disproportionate impact on the final image.
- STACKING algorithm –
  - Takes the total number of denoising steps T and a delay budget D.
  - Iteratively decides batch sizes, assigning larger batches to later steps (where quality sensitivity is lower) and smaller batches to early steps.
  - Uses a simple greedy search that runs in O(T) time and does not require an explicit form of the quality function (e.g., FID, PSNR).
- Bandwidth allocation – With the optimal batch schedule fixed, the remaining problem reduces to a convex resource‑allocation task: distribute the available wireless bandwidth among simultaneous AIGC sessions to meet their individual delay constraints while maximizing the weighted sum of qualities. Standard convex solvers (e.g., interior‑point) are employed.
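The greedy batching idea can be sketched in a few lines of Python. This is an illustrative reconstruction, not the paper's exact STACKING procedure: the cost model `batch_latency` (a fixed launch overhead plus a sublinear per‑batch term) and the `max_batch` cap are assumptions, but the tail‑first merging mirrors the stated insight that late steps tolerate larger batches.

```python
import math

def stacking_schedule(T, D, batch_latency, max_batch=8):
    """Greedy batch-size schedule over T denoising steps (illustrative).

    batch_latency(b) is the assumed time to run one batch of b steps; D is
    the delay budget for the generation stage. The schedule starts fully
    sequential (best quality) and merges groups from the tail, so early,
    quality-sensitive steps keep small batches while later steps are
    batched more aggressively. Runs in O(T): j only moves left.
    """
    batches = [1] * T
    total = T * batch_latency(1)
    j = T - 1
    while total > D and j > 0:
        a, b = batches[j - 1], batches[j]
        if a + b <= max_batch:
            # merging saves launch overhead under a sublinear cost model
            total += batch_latency(a + b) - batch_latency(a) - batch_latency(b)
            batches[j - 1] = a + b
            del batches[j]
        j -= 1
    return batches  # best effort if the budget is infeasible
```

With an assumed per‑batch cost of 5 + 2·√b ms, ten steps under a 40 ms budget keep the first four steps sequential and merge the last six into a single batch, cutting the modeled generation latency from 70 ms to roughly 38 ms.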
Results & Findings
| Metric | Baseline (sequential) | Naïve bandwidth split | Proposed STACKING + joint allocation |
|---|---|---|---|
| Avg. image quality (FID ↓) | 45.2 | 43.8 | 31.7 |
| Avg. latency (ms) | 210 | 190 | 165 |
| Computational overhead (CPU % per request) | 12 % | 10 % | 8 % |
- Quality gains stem mainly from keeping early‑step batches small, which preserves the most influential denoising phases.
- Latency reductions are achieved by parallel GPU execution and smarter bandwidth sharing, keeping the total service time within the target (e.g., 200 ms for interactive AR).
- The algorithm scales linearly with the number of concurrent users, making it suitable for dense edge deployments.
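The latency figures above also depend on how bandwidth is split across concurrent sessions. A minimal sketch of a deadline‑aware split (a simple heuristic stand‑in for the paper's convex interior‑point solver; the session tuples, units, and linear rate‑bandwidth assumption are all illustrative):

```python
def allocate_bandwidth(total_bw, sessions):
    """Split total_bw (Mbit/s of capacity) across AIGC sessions.

    Each session is (bits_mbit, deadline_s, quality_weight). Assumes the
    achievable rate scales linearly with allocated bandwidth. Every session
    first receives the minimum rate that meets its transmission deadline;
    the surplus is then shared in proportion to quality weights, mirroring
    the weighted-sum-of-qualities objective. Returns None if infeasible.
    """
    mins = [bits / deadline for bits, deadline, _ in sessions]
    surplus = total_bw - sum(mins)
    if surplus < 0:
        return None  # no split can meet every deadline
    wsum = sum(w for _, _, w in sessions)
    return [m + surplus * w / wsum
            for m, (_, _, w) in zip(mins, sessions)]
```

Reserving the deadline‑critical minimum first guarantees feasibility whenever it exists; only the slack is traded off against quality, which is where a true convex solver would refine the proportional split.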
Practical Implications
- Edge AI platforms (e.g., NVIDIA Jetson, AMD Instinct) can integrate batch‑denoising kernels to squeeze extra throughput without hardware upgrades.
- Mobile app developers building real‑time AI photo filters, AR overlays, or on‑device content synthesis can rely on edge servers that meet sub‑200 ms response times, improving user experience.
- Network operators can embed the joint allocation logic into their MEC (Multi‑Access Edge Computing) orchestration layers, automatically adjusting radio resources for AIGC workloads based on current load and QoS targets.
- Cost efficiency – By reducing per‑request GPU time, providers can serve more users per edge node, lowering CAPEX/OPEX for AI services.
Limitations & Future Work
- The study focuses on image diffusion models; extending batch denoising to large language models or video generation may require different parallelism strategies.
- Channel variability (fast fading, mobility) is abstracted as a static bandwidth pool; incorporating stochastic wireless dynamics could refine the allocation step.
- Real‑world deployment would need hardware‑specific profiling to validate that the assumed parallel speed‑up holds across diverse edge devices.
- Future research directions include adaptive batch sizing based on runtime quality feedback and joint optimization with edge caching for repeated content requests.
Authors
- Jinghang Xu
- Kun Guo
- Wei Teng
- Chenxi Liu
- Wei Feng
Paper Information
- arXiv ID: 2511.19847v1
- Categories: cs.DC
- Published: November 25, 2025