[Paper] Why Diffusion Language Models Struggle with Truly Parallel (Non-Autoregressive) Decoding?
Diffusion Language Models (DLMs) are often advertised as enabling parallel token generation, yet practical fast DLMs frequently converge to left-to-right, autor...