[Paper] From Model Scaling to System Scaling: Scaling the Harness in Agentic AI
This paper studies the next major bottleneck in agentic AI as system scaling, not only model scaling: the design of auditable, persistent, modular, and verifiab...
This paper studies the next major bottleneck in agentic AI as system scaling, not only model scaling: the design of auditable, persistent, modular, and verifiab...
Subject-driven image generation aims to synthesize new images that preserve the identity of the given subject while following textual instructions. Existing app...
Multimodal Large Language Models (MLLMs) achieve versatility by reformulating diverse tasks into a unified instruction-following framework via instruction tunin...
Current video-to-4D methods struggle with complex topology changes, transparent materials, thin structures, and inner surfaces. We present Helix4D, a dynamic me...
Recent advances in few-step diffusion distillation have enabled efficient image generation, yet aligning these models with human preferences remains challenging...
Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models for language modeling, yet the effective design of transformer a...
Autoregressive video generators are attractive for streaming, long-horizon, and interactive applications, but distilling strong black-box teachers into causal s...
Fine-tuning MLLMs for Video Temporal Grounding (VTG) often improves in-domain performance but degrades sharply under domain shift. In this work, we find that th...
Structure-from-Motion -- the process of simultaneously estimating camera poses and 3D scene structure from a collection of images -- remains a central challenge...
In this paper, we introduce InstructSAM, a unified and streamlined framework designed for multi-instance segmentation under arbitrary instructions. We formulate...
Code review is a critical practice in software engineering, yet the growing scale and frequency of code patches in modern projects, together with the widespread...
Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with context length. To h...
Models trained on a new task typically degrade on prior tasks, a phenomenon known as forgetting. Traditionally, mitigating forgetting has required replaying sto...
Bayesian optimal experimental design (BOED) selects experiments to maximize information gain about model parameters. However, in decision-critical settings, red...
The deployment of Large Language Models (LLMs) and Vision Transformers (ViTs) on edge devices is significantly constrained by memory limitations and the critica...
We present Channel-wise Vector Quantization (CVQ), a novel image tokenization paradigm that replaces patch-wise tokens with channel-wise tokens. Unlike conventi...
Frontier LLMs now perform strongly across a wide range of physics evaluations, but it is hard to disentangle genuine reasoning from recall of established scienc...
Large language model agents are increasingly envisioned as always-on personal assistants with access to anything relevant in the user's digital world. Yet curre...
Deep research agents face vast, interdependent, and pervasively uncertain information. Existing systems explore what evolving intermediate representations shoul...
Modern AI benchmarks operate at a complexity that outpaces traditional verification methods. Tasks authored by domain experts often contain implicit assumptions...
Wasserstein policy gradient (WPG) is a policy optimization method for reinforcement learning (RL) that exploits the optimal-transport geometry of action distrib...
Existing financial NLP benchmarks often rely on labels supplied by outside observers, measuring how language is perceived rather than what speakers have committ...
Efficient learning of user preferences is crucial for many modern decision making systems but typically requires costly labeled data. Active learning reduces th...
Annotating speaker attributes from text is inherently ambiguous, particularly in multilingual settings where demographic and social cues are implicit and cultur...
Customizing an LLM judge to a specific task or domain often involves optimizing its prompt across multiple evaluation criteria simultaneously. Textual gradient ...
Activation oracles aim to make the activations of other models legible to humans and yield promising results compared to white-box interpretability techniques. ...
We test the standard RLVR tool-use recipe -- GRPO on Qwen2.5-7B-Instruct -- on a deliberately minimal knowledge-graph tool API: four Freebase navigation verbs o...
We introduce CausaLab, a scalable environment for evaluating interactive causal discovery by LLM agents. Unlike prior evaluations, CausaLab evaluates both wheth...
Generating code from natural-language requirements has become a primary route for LLM-assisted software development. Although LLMs can successfully complete sma...
Modern atomistic spin simulations combine long stochastic trajectories, thermodynamic sampling, static optimization and multi-image transition-path workflows, a...
Log parsing is a fundamental step for automated log analysis, which transforms raw log messages into structured formats. Existing syntax-based parsers struggle ...
Generative AI tools are rapidly transforming software development practice, prompting unprecedented research interest. However, existing studies have predominan...
Federated edge learning (FEEL) has recently emerged as a promising paradigm for achieving edge intelligence (EI) via enabling collaborative model training acros...
Agentic AI coding assistants can edit files, run commands, and access the internet on behalf of developers. However, their reliance on unvetted external artifac...
Validators on generic Proof of Stake chains earn the same fees whether they handle attestation work correctly or selectively censor it. For chains whose main ac...
Dynamic multi-objective optimization with a changing number of objectives has recently attracted increasing attention due to its relevance to real-world problem...
Retrieval-Augmented Generation (RAG) empowers LLMs with external knowledge, making cross-institutional domain-specific knowledge base integration a highly promi...
Large language models (LLMs) can serve as the semantic-matching engine of a content-based publish/subscribe broker for agentic AI across the edge-cloud computin...
Distributing Transformer inference across embedded edge devices can alleviate individual memory and compute constraints, yet practical benefits on real hardware...
AI-native software development is often evaluated at the level of individual models, prompts, or generated artifacts. This framing is insufficient for productio...
Large language model (LLM) inference is limited by high computational cost and memory bandwidth demands, making deployment on heterogeneous many-core processors...
Multi-agent systems powered by large foundation models (LFMs) are increasingly deployed to control industrial robots through natural language, creating deployme...
We present the first end-to-end demonstration of fine-tuning and serving Google's Gemma 4 31B model on TPU hardware, providing an empirical comparison of TPU an...
Diffusion-based generation is increasingly powering production content pipelines; however, deploying these models at scale remains a significant challenge. Mode...
Context. Large language models (LLMs) are increasingly applied to code-generating tasks (CGTs) in software engineering. While reported results are promising, th...
Responsive Layout Failures (RLFs) typically arise from CSS properties that hinder proper layout behavior in different screen sizes. To find an accurate and effe...
The rapid evolution of large language models (LLMs) has made geographically distributed training necessary due to GPU scarcity within a single cloud region. In ...
We study the symmetric polynomial prod_{αin A_{n,d}}bigl(1+α_1 x_1+cdots+α_n x_nbigr) where A_{n,d}:={αinmathbb{Z}_{ge 0}^n:|α|=d}, which is the total Chern cla...