[Paper] UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation
Recent video generation models demonstrate impressive synthesis capabilities but remain limited by single-modality conditioning, constraining their holistic wor...