[Paper] FOCUS: DLLMs Know How to Tame Their Compute Bound
Diffusion Large Language Models (DLLMs) offer a compelling alternative to Auto-Regressive models, but their deployment is constrained by high decoding cost. In ...
Diffusion Large Language Models (DLLMs) offer a compelling alternative to Auto-Regressive models, but their deployment is constrained by high decoding cost. In ...
Astronomical imaging remains noise-limited under practical observing constraints, while standard calibration pipelines mainly remove structured artifacts and le...
Prompt agents have recently emerged as a promising paradigm for automated prompt optimization, framing refinement as a sequential decision-making problem over a...
This paper proposes a novel inverse reinforcement learning framework using a diffusion-based adaptive lookahead planner (IRL-DAL) for autonomous vehicles. Train...
Despite rapid advances in autonomous AI scientists powered by language models, generating publication-ready illustrations remains a labor-intensive bottleneck i...
We introduce a guided stochastic sampling method that augments sampling from diffusion models with physics-based guidance derived from partial differential equa...
The Muon optimizer has demonstrated strong empirical performance in pre-training large language models by performing matrix-level gradient (or momentum) orthogo...
Recent works on language identification and generation have established tight statistical rates at which these tasks can be achieved. These works typically oper...
Software issue resolution in large repositories is a long-range decision process: choices made during localization shape the space of viable edits, and missteps...
Large audio-language models increasingly operate on raw speech inputs, enabling more seamless integration across domains such as voice assistants, education, an...
Repository-level code completion remains challenging for large language models (LLMs) due to cross-file dependencies and limited context windows. Prior work add...
Vision-language models suffer performance degradation under domain shift, limiting real-world applicability. Existing test-time adaptation methods are computati...
Model comparison and calibrated uncertainty quantification often require integrating over parameters, but scalable inference can be challenging for complex, mul...
Vision-language models (VLMs) demonstrate impressive performance on standard video understanding benchmarks yet fail systematically on simple reasoning tasks th...
We propose a variational framework that interprets transformer layers as iterations of an optimization algorithm acting on token embeddings. In this view, self-...
In recent years, large language models (LLMs) have made rapid progress in information retrieval, yet existing research has mainly focused on text or static mult...
Markov decision processes (MDPs) are a fundamental model in sequential decision making. Robust MDPs (RMDPs) extend this framework by allowing uncertainty in tra...
While multiagent systems have shown promise for tackling complex tasks via specialization, finetuning multiple agents simultaneously faces two key challenges: (...
Existing multimodal large language models for long-video understanding predominantly rely on uniform sampling and single-turn inference, limiting their ability ...
Language models (LMs) are trained over sequences of tokens, whereas users interact with LMs via text. This mismatch gives rise to the partial token problem, whi...
While dense pixel-wise annotations remain the gold standard for medical image segmentation, they are costly to obtain and limit scalability. In contrast, many d...
Despite recent Multimodal Large Language Models (MLLMs)' linguistic prowess in medical diagnosis, we find even state-of-the-art MLLMs suffer from a critical per...
Deep search agents powered by large language models have demonstrated strong capabilities in multi-step retrieval, reasoning, and long-horizon task execution. H...
While Chain-of-Thought (CoT) significantly enhances the performance of Large Language Models (LLMs), explicit reasoning chains introduce substantial computation...