[Paper] Native and Compact Structured Latents for 3D Generation
Recent advancements in 3D generative modeling have significantly improved the generation realism, yet the field is still hampered by existing representations, w...
Recent advancements in 3D generative modeling have significantly improved the generation realism, yet the field is still hampered by existing representations, w...
Video foundation models generate visually realistic and temporally coherent content, but their reliability as world simulators depends on whether they capture p...
Recent progress in humanoid robots has unlocked agile locomotion skills, including backflipping, running, and crawling. Yet it remains challenging for a humanoi...
Recent audio language models can follow long conversations. However, research on emotion-aware or spoken dialogue summarization is constrained by the lack of da...
Stochastic optimization is fundamental to modern machine learning. Recent research has extended the study of stochastic first-order methods (SFOMs) from light-t...
Hospitals lack automated systems to harness the growing volume of heterogeneous clinical and operational data to effectively forecast critical events. Early ide...
Multi-token generation has emerged as a promising paradigm for accelerating transformer-based large model inference. Recent efforts primarily explore diffusion ...
We propose VASA-3D, an audio-driven, single-shot 3D head avatar generator. This research tackles two major challenges: capturing the subtle expression details p...
Contemporary reservoir computing relies heavily on smooth, globally Lipschitz continuous activation functions, limiting applications in defense, disaster respon...
LLM based chatbots have become central interfaces in technical, educational, and analytical domains, supporting tasks such as code reasoning, problem solving, a...
We introduce ART, Articulated Reconstruction Transformer -- a category-agnostic, feed-forward model that reconstructs complete 3D articulated objects from only ...
Achieving truly adaptive embodied intelligence requires agents that learn not just by imitating static demonstrations, but by continuously improving through env...
Visual Sentiment Analysis (VSA) is a challenging task due to the vast diversity of emotionally salient images and the inherent difficulty of acquiring sufficien...
We introduce gridfm-datakit-v1, a Python library for generating realistic and diverse Power Flow (PF) and Optimal Power Flow (OPF) datasets for training Machine...
We address the fundamental incompatibility of attention-based encoder-decoder (AED) models with long-form acoustic encodings. AED models trained on segmented ut...
Today, a lot of research on language models is focused on large, general-purpose models. However, many NLP pipelines only require models with a well-defined, sm...
Timely and accurate lymphoma diagnosis is essential for guiding cancer treatment. Standard diagnostic practice combines hematoxylin and eosin (HE)-stained whole...
Music editing plays a vital role in modern music production, with applications in film, broadcasting, and game development. Recent advances in music generation ...
Inter-node communication bandwidth increasingly constrains distributed training at scale on multi-node GPU clusters. While compact models are the ultimate deplo...
This paper introduces JMMMU-Pro, an image-based Japanese Multi-discipline Multimodal Understanding Benchmark, and Vibe Benchmark Construction, a scalable constr...
Graph Transformers (GTs) have emerged as a promising graph learning tool, leveraging their all-pair connected property to effectively capture global information...
Many practical decision-making problems involve tasks whose success depends on the entire system history, rather than on achieving a state with desired properti...
The integration of cloud computing and the Internet of Things (IoT) is essential for scalable, intelligent systems. However, developing cloud-of-things (CoT) ap...
Nepali, a low-resource language spoken by over 32 million people, continues to face challenges in natural language processing (NLP) due to its complex grammar, ...