[Paper] DirectAudioEdit: Inversion-Free Text-Guided Audio Editing via Diffusion Prediction Contrast
Source: arXiv - 2606.07356v1
Overview
Text-guided audio editing aims to modify the language-specified acoustic content while preserving edit-irrelevant source components. Existing training-free methods typically rely on inversion-based editing. While inversion-free editing is appealing as it decreases computational overhead and reconstruction errors, it remains largely unexplored for audio editing. The key challenge is to construct a source-to-target editing path through diffusion denoising dynamics. In this paper, we introduce DirectAudioEdit, the first attempt to develop a training-free and inversion-free method for audio editing. Experiments on music and event-level benchmarks across two backbones show that DirectAudioEdit reduces macro-averaged FAD and KL by 15.9% and 15.8% compared with DDPM inversion, while achieving up to 64.5% editing speedup.
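To make the headline numbers concrete, the sketch below shows one common way to compute a macro-averaged relative reduction. The metric is first averaged with equal weight across evaluation settings (for example, each benchmark and backbone pair), and the relative change against the baseline is then taken. The setting names and every numeric value are hypothetical placeholders, not results from the paper, and the paper may define its macro-average differently.

```python
from statistics import mean


def macro_average(scores):
    """Unweighted mean of a metric over evaluation settings."""
    return mean(scores.values())


def relative_reduction(baseline, method):
    """Percent by which `method` lowers the macro-averaged metric vs. `baseline`."""
    base = macro_average(baseline)
    return 100.0 * (base - macro_average(method)) / base


# Hypothetical placeholder scores (lower is better); NOT values from the paper.
baseline_fad = {
    "music/backbone_a": 2.0,
    "music/backbone_b": 4.0,
    "events/backbone_a": 3.0,
    "events/backbone_b": 3.0,
}
method_fad = {
    "music/backbone_a": 1.5,
    "music/backbone_b": 3.5,
    "events/backbone_a": 2.5,
    "events/backbone_b": 2.5,
}

print(f"Macro-averaged reduction: {relative_reduction(baseline_fad, method_fad):.1f}%")
```

Because each setting is weighted equally, a large benchmark cannot dominate the average. Note that averaging the per-setting reductions instead would generally give a different number.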
Key Contributions
This paper presents research in the following arXiv subject areas:
- cs.SD (Sound)
- cs.CL (Computation and Language)
Methodology
Please refer to the full paper for detailed methodology.
Practical Implications
This research contributes to the advancement of sound processing (cs.SD), specifically training-free, text-guided audio editing.
Authors
- Zhengkun Ge
- Xiaoqian Liu
- Haoran Zhang
- Yuan Ge
- Junxiang Zhang
- Zhengtao Yu
- Jingbo Zhu
- Tong Xiao
Paper Information
- arXiv ID: 2606.07356v1
- Categories: cs.SD, cs.CL
- Published: June 5, 2026