[Paper] RELIC: Interactive Video World Model with Long-Horizon Memory
A truly interactive world model requires three key ingredients: real-time long-horizon streaming, consistent spatial memory, and precise user control. However, ...
A truly interactive world model requires three key ingredients: real-time long-horizon streaming, consistent spatial memory, and precise user control. However, ...
This thesis presents novel contributions in two primary areas: advancing the efficiency of generative models, particularly normalizing flows, and applying gener...
Why do state-of-the-art OOD detection methods exhibit catastrophic failure when models are trained on single-domain datasets? We provide the first theoretical e...
We present Jina-VLM, a 2.4B parameter vision-language model that achieves state-of-the-art multilingual visual question answering among open 2B-scale VLMs. The ...
This work investigates whether large language models (LLMs) offer advantages over traditional neural networks for astronomical data processing, in regimes with ...
Attention mechanisms are the core of foundation models, but their quadratic complexity remains a critical bottleneck for scaling. This challenge has driven the ...
Quantum key distribution (QKD) security fundamentally relies on the ability to distinguish genuine quantum correlations from classical eavesdropper simulations,...
Training with differential privacy (DP) provides a guarantee to members in a dataset that they cannot be identified by users of the released model. However, tho...
Sketches are simple human hand-drawn abstractions of complex scenes and real-world objects. Although the field of sketch representation learning has advanced si...
Tokenizer adaptation plays an important role in transferring pre-trained language models to new domains or languages. In this work, we address two complementary...
While recent developments in large language models have improved bias detection and classification, sensitive subjects like religion still present challenges be...
Mixture-of-Experts (MoE), while offering significant advantages as a Large Language Model (LLM) architecture, faces substantial challenges when deployed on low-...