[Paper] Improved Mean Flows: On the Challenges of Fastforward Generative Models
MeanFlow (MF) has recently been established as a framework for one-step generative modeling. However, its ``fastforward'' nature introduces key challenges in bo...
MeanFlow (MF) has recently been established as a framework for one-step generative modeling. However, its ``fastforward'' nature introduces key challenges in bo...
The field of 360-degree omnidirectional understanding has been receiving increasing attention for advancing spatial intelligence. However, the lack of large-sca...
Multi-view camera systems enable rich observations of complex real-world scenes, and understanding dynamic objects in multi-view settings has become central to ...
We introduce Audio-Visual Affordance Grounding (AV-AG), a new task that segments object interaction regions from action sounds. Unlike existing approaches that ...
Autonomous driving policies are typically trained via open-loop behavior cloning of human demonstrations. However, such policies suffer from covariate shift whe...
GUI grounding aims to align natural language instructions with precise regions in complex user interfaces. Advanced multimodal large language models show strong...
Handling static images that lack inherent temporal dynamics remains a fundamental challenge for spiking neural networks (SNNs). In directly trained SNNs, static...
Reasoning over dynamic visual content remains a central challenge for multimodal large language models. Recent thinking models generate explicit reasoning trace...
Recent multimodal large language models (MLLMs) have advanced video understanding, yet most still 'think about videos' ie once a video is encoded, reasoning unf...
Recently, multi-person video generation has started to gain prominence. While a few preliminary works have explored audio-driven multi-person talking video gene...
Large Vision Language Models (VLMs) effectively bridge the modality gap through extensive pretraining, acquiring sophisticated visual representations aligned wi...
Deep learning approaches to object detection have achieved reliable detection of specific object classes in images. However, extending a model's detection capab...
Inverse heat problems refer to the estimation of material thermophysical properties given observed or known heat diffusion behaviour. Inverse heat problems have...
Recent advances in generative world models have enabled remarkable progress in creating open-ended game environments, evolving from static scene synthesis towar...
Recent advances in text-to-video (T2V) and image-to-video (I2V) models, have enabled the creation of visually compelling and dynamic videos from simple textual ...
Underwater object tracking is challenging due to wavelength dependent attenuation and scattering, which severely distort appearance across depths and water cond...
Unifying multimodal understanding, generation and reconstruction representation in a single tokenizer remains a key challenge in building unified models. Previo...
Modern large language models become multimodal, analyzing various data formats like text and images. While fine-tuning is effective for adapting these multimoda...
Large-scale Vision Language Models (LVLMs) exhibit advanced capabilities in tasks that require visual information, including object detection. These capabilitie...
While modern diffusion models excel at generating high-quality and diverse images, they still struggle with high-fidelity compositional and multimodal control, ...
Learning new robot tasks on new platforms and in new scenes from only a handful of demonstrations remains challenging. While videos of other embodiments - human...
Vision-Language Models (VLMs) still lack robustness in spatial intelligence, demonstrating poor performance on spatial understanding and reasoning tasks. We att...
Can one perceive a video's content without seeing its pixels, just from the camera trajectory-the path it carves through space? This paper is the first to syste...
Gliomas are brain tumor types that have a high mortality rate which means early and accurate diagnosis is important for therapeutic intervention for the tumors....