[Paper] DySCO: Dynamic Attention-Scaling Decoding for Long-Context LMs
Understanding and reasoning over long contexts is a crucial capability for language models (LMs). Although recent models support increasingly long context windo...
5644 posts from this source
Understanding and reasoning over long contexts is a crucial capability for language models (LMs). Although recent models support increasingly long context windo...
Mixed-Integer Programs (MIPs) are NP-hard optimization models that arise in a broad range of decision-making applications, including finance, logistics, energy ...
Arbitrary-Scale SR (ASISR) remains fundamentally limited by cross-scale distribution shift: once the inference scale leaves the training range, noise, blur, and...
Checkpointing is essential for fault tolerance in training large language models (LLMs). However, existing methods, regardless of their I/O strategies, periodic...
The inability of Large Language Models (LLMs) to modulate their personality expression in response to evolving dialogue dynamics hinders their performance in co...
Most contemporary neural learning systems rely on epoch-based optimization and repeated access to historical data, implicitly assuming reversible computation. I...
Unified conditional image generation remains difficult because different tasks depend on fundamentally different internal representations. Some require conceptu...
Cardiovascular disease (CVD) remains one of the leading global health challenges, accounting for more than 19 million deaths worldwide. To address this, several...
Reinforcement Learning from Human Feedback (RLHF) plays a significant role in aligning Large Language Models (LLMs) with human preferences. While RLHF with expe...
Large Language Models (LLMs) are increasingly used to ``professionalize'' workplace communication, often at the cost of linguistic identity. We introduce 'Cultu...
Object hallucination is a critical issue in Large Vision-Language Models (LVLMs), where outputs include objects that do not appear in the input image. A natural...
Medical vision-language pretraining increasingly relies on medical reports as large-scale supervisory signals; however, raw reports often exhibit substantial st...
Recent advances in Multimodal Large Language Models have greatly improved visual understanding and reasoning, yet their quadratic attention and offline training...
Deep neural networks (DNNs) are essential for performing advanced tasks on edge or mobile devices, yet their deployment is often hindered by severe resource con...
We study the basic task of mean estimation in the presence of mean-shift contamination. In the mean-shift contamination model, an adversary is allowed to replac...
Instruction-following benchmarks remain predominantly English-centric, leaving a critical evaluation gap for the hundreds of millions of Indic language speakers...
Small language models (SLMs) offer compelling advantages in cost, latency, and adaptability, but have so far lagged behind larger models on long-horizon softwar...
Understanding the geometry of learned distributions is fundamental to improving and interpreting diffusion models, yet systematic tools for exploring their land...
Despite the extensive literature on training loss functions, the evaluation of generalization on the validation set remains underexplored. In this work, we cond...
The increasing complexity and diversity of hardware accelerators in modern computing systems demand flexible, low-overhead program analysis tools. We present PA...
This paper explains the Visual Milestone Planning (VMP) method using an agile vocabulary to facilitate its adoption by agile practitioners as a front end for a ...
Integrated Development Environments shape developers' daily experience, yet the empirical study of their usability and user experience (UX) remains limited. Thi...
As the complexity of the HPC storage stack rapidly grows, domain scientists face increasing challenges in effectively utilizing HPC storage systems to achieve t...
Automated test generation is essential for software quality assurance, with coverage rate serving as a key metric to ensure thorough testing. Recent advancement...
The optimization of complex medical appointment scheduling remains a significant operational challenge in multi-center healthcare environments, where clinical s...
In this paper, we investigate a problem of minimizing total energy consumption for secure federated learning (FL) over wireless edge networks. To address the hi...
Heterogeneous nodes that combine multi-core CPUs with diverse accelerators are rapidly becoming the norm in both high-performance computing (HPC) and AI infrast...
Differentially private federated learning faces a fundamental tension: privacy protection mechanisms that safeguard client data simultaneously create quantifiab...
Large language models (LLMs) are increasingly used for automated code refactoring tasks. Although these models can quickly refactor code, the quality may exhibi...
LLM agents have been widely adopted in real-world applications, relying on agent frameworks for workflow execution and multi-agent coordination. As these system...
The rapid advancement of large language models (LLMs) has led to a significant increase in automated tools in the software engineering, capable of performing va...
Scaling long-context capabilities is crucial for Multimodal Large Language Models (MLLMs). However, real-world multimodal datasets are extremely heterogeneous. ...
Neural routing solvers (NRSs) that leverage deep learning to tackle vehicle routing problems have demonstrated notable potential for practical applications. By ...
Prototyping plays a critical role in the development of machine learning (ML) solutions, yet existing tools often provide limited support for effective collabor...
Lamport's 1978 paper introduced the happens-before relation and logical clocks, freeing distributed systems from dependence on synchronized physical clocks. Thi...
Large language models (LLMs) for code editing have achieved remarkable progress, yet recent empirical studies reveal a fundamental disconnect between technical ...
Eliminating undefined behaviors (UBs) in Rust programs requires a deep semantic understanding to enable accurate and reliable repair. While existing studies hav...
Choreographies describe distributed protocols from a global viewpoint, enabling correct-by-construction synthesis of local behaviours. We develop a policy-param...
Can an LLM learn how an optimizer behaves -- and use that knowledge to control it? We extend Code World Models (CWMs), LLM-synthesized Python programs that pred...
Test-time training (TTT) with KV binding as sequence modeling layer is commonly interpreted as a form of online meta-learning that memorizes a key-value mapping...
Visual reinforcement learning is appealing for robotics but expensive -- off-policy methods are sample-efficient yet slow; on-policy methods parallelize well bu...
We study efficient multi-vector retrieval for late interaction in any modality. Late interaction has emerged as a dominant paradigm for information retrieval in...
We report the performance of Aletheia (Feng et al., 2026b), a mathematics research agent powered by Gemini 3 Deep Think, on the inaugural FirstProof challenge. ...
Embodied LLMs endow robots with high-level task reasoning, but they cannot reflect on what went wrong or why, turning deployment into a sequence of independent ...
Efficiently processing long sequences with Transformer models usually requires splitting the computations across accelerators via context parallelism. The domin...
Cryo-electron tomography (cryo-ET) enables high resolution, three-dimensional reconstruction of biological structures, including membranes and membrane proteins...
Despite rapid recent progress in the terminal capabilities of large language models, the training data strategies behind state-of-the-art terminal agents remain...
We study the complexity of smoothed agnostic learning, recently introduced by~cite{CKKMS24}, in which the learner competes with the best classifier in a target ...