[Paper] Native and Compact Structured Latents for 3D Generation
Recent advancements in 3D generative modeling have significantly improved the generation realism, yet the field is still hampered by existing representations, w...
Video foundation models generate visually realistic and temporally coherent content, but their reliability as world simulators depends on whether they capture p...
We propose VASA-3D, an audio-driven, single-shot 3D head avatar generator. This research tackles two major challenges: capturing the subtle expression details p...
We introduce ART, Articulated Reconstruction Transformer -- a category-agnostic, feed-forward model that reconstructs complete 3D articulated objects from only ...
Achieving truly adaptive embodied intelligence requires agents that learn not just by imitating static demonstrations, but by continuously improving through env...
Visual Sentiment Analysis (VSA) is a challenging task due to the vast diversity of emotionally salient images and the inherent difficulty of acquiring sufficien...
Timely and accurate lymphoma diagnosis is essential for guiding cancer treatment. Standard diagnostic practice combines hematoxylin and eosin (HE)-stained whole...
This paper introduces JMMMU-Pro, an image-based Japanese Multi-discipline Multimodal Understanding Benchmark, and Vibe Benchmark Construction, a scalable constr...
Article URL: https://alpr.watch/ Comments URL: https://news.ycombinator.com/item?id=46290916 Points: 224 Comments: 114...
Fresh off releasing the latest version of its Olmo foundation model, the Allen Institute for AI (Ai2) launched its open-source video model, Molmo 2, on Tuesday, a...
AlphaFlow provides a smoother training schedule for MeanFlow image models, reducing the conflict between its two objectives and accelerating learning. Overview...
Video diffusion models have revolutionized generative video synthesis, but they are imprecise, slow, and can be opaque during generation -- keeping users in the...