[Paper] Communication-Efficient Serving for Video Diffusion Models with Latent Parallelism
Video diffusion models (VDMs) compute attention over the 3D spatio-temporal domain. Compared to large language models (LLMs) processing 1D sequences...
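
To give a rough sense of scale for attention over a 3D spatio-temporal domain, the sketch below counts tokens and query-key pairs. It assumes full self-attention over a latent grid flattened to one token per (frame, row, column) position, which the text here does not confirm. The function names, the grid size, and the 1D sequence length are hypothetical values chosen for illustration and are not taken from the paper.

```python
def attention_tokens(frames: int, height: int, width: int) -> int:
    """Token count when a frames x height x width latent grid is flattened.

    Assumption: one token per latent position (no further patching).
    """
    return frames * height * width


def attention_pairs(num_tokens: int) -> int:
    """Query-key pairs scored by full self-attention over num_tokens tokens."""
    return num_tokens * num_tokens


# Hypothetical sizes, for illustration only.
video_tokens = attention_tokens(frames=16, height=32, width=32)
text_tokens = 2048  # hypothetical 1D sequence length

print("video tokens:", video_tokens)
print("pair-count ratio (video / text):",
      attention_pairs(video_tokens) // attention_pairs(text_tokens))
```

Because the pair count grows with the square of the token count, a modest increase along each of the three axes compounds quickly under these assumptions.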