[Paper] Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device
Unified multimodal models can both understand and generate visual content within a single architecture. Existing models, however, remain data-hungry and too hea...
5750 posts from this source
Unified multimodal models can both understand and generate visual content within a single architecture. Existing models, however, remain data-hungry and too hea...
We propose tttLRM, a novel large 3D reconstruction model that leverages a Test-Time Training (TTT) layer to enable long-context, autoregressive 3D reconstructio...
Rapid progress in video models has largely focused on visual quality, leaving their reasoning capabilities underexplored. Video reasoning grounds intelligence i...
Current feed-forward 3D/4D reconstruction systems rely on dense geometry and pose supervision -- expensive to obtain at scale and particularly scarce for dynami...
LLM agents are evolving rapidly, powered by code execution, tools, and the recently introduced agent skills feature. Skills allow users to extend LLM applicatio...
We study post-calibration uncertainty for trained ensembles of classifiers. Specifically, we consider both aleatoric (label noise) and epistemic (model) uncerta...
Inspired by behavioral science, we propose Behavior Learning (BL), a novel general-purpose machine learning framework that learns interpretable and identifiable...
Conformal risk control is an extension of conformal prediction for controlling risk functions beyond miscoverage. The original algorithm controls the expected v...
Estimating simulation-ready scenes from real-world observations is crucial for downstream planning and policy learning tasks. Regretfully, existing methods stru...
We present AgentOptics, an agentic AI framework for high-fidelity, autonomous optical system control built on the Model Context Protocol (MCP). AgentOptics inte...
Mean Field Games (MFGs) provide a principled framework for modeling interactions in large population models: at scale, population dynamics become deterministic,...
Data visualization rules-derived from decades of research in design and perception-ensure trustworthy chart communication. While prior work has shown that large...
With the rise of large language models (LLMs), they have become instrumental in applications such as Retrieval-Augmented Generation (RAG). Yet evaluating these ...
Epidemiological models increasingly rely on self-reported behavioral data such as vaccination status, mask usage, and social distancing adherence to forecast di...
The paradigm of automated program generation is shifting from one-shot generation to inference-time search, where Large Language Models (LLMs) function as seman...
Current reinforcement learning objectives for large-model reasoning primarily focus on maximizing expected rewards. This paradigm can lead to overfitting to dom...
Objective: To improve the efficiency of medical question answering (MedQA) with large language models (LLMs) by avoiding unnecessary reasoning while maintaining...
Diffusion language models (DLMs) have recently emerged as a promising alternative to autoregressive (AR) approaches, enabling parallel token generation beyond a...
How do large language models (LLMs) know what they know? Answering this question has been difficult because pre-training data is often a 'black box' -- unknown ...
Solving long-horizon tasks requires robots to integrate high-level semantic reasoning with low-level physical interaction. While vision-language models (VLMs) a...
Reinforcement learning with verifiable rewards (RLVR) has emerged as a promising approach for training reasoning language models (RLMs) by leveraging supervisio...
Research in machine unlearning (MU) has gained strong momentum: MU is now widely regarded as a critical capability for building safe and fair AI. In parallel, r...
We study online learning in the adversarial injection model introduced by [Goel et al. 2017], where a stream of labeled examples is predominantly drawn i.i.d. f...
The dependence on expert annotation has long constituted the primary rate-limiting step in the application of artificial intelligence to biomedicine. While supe...
Error-bounded lossy compression has been regarded as a promising way to address the ever-increasing amount of scientific data in today's high-performance comput...
BabyLM aims to dissolve the boundaries between cognitive modeling and language modeling. We call for both workshop papers and for researchers to join the 4th Ba...
Retrieval-augmented generation (RAG) enhances large language models (LLMs) by conditioning generation on retrieved external documents, but the effect of retriev...
Edge-based representations are fundamental cues for visual understanding, a principle rooted in early vision research and still central today. We extend this pr...
Large Language Models (LLMs) play a critical role in how humans access information. While their core use relies on comprehending written requests, our understan...
In this study, the output of large language models (LLM) is considered an information source generating an unlimited sequence of symbols drawn from a finite alp...
Modern code intelligence agents operate in contexts exceeding 1 million tokens--far beyond the scale where humans manually locate relevant files. Yet agents con...
Large language models are being deployed in complex socio-technical systems, which exposes limits in current alignment practice. We take the position that the d...
Large language models (LLMs) offer substantial promise for automating clinical text summarization, yet maintaining factual consistency remains challenging due t...
Industrial IoT ecosystems bring together sensors, machines and smart devices operating collaboratively across industrial environments. These systems generate la...
LLM-enabled applications are rapidly reshaping the software ecosystem by using large language models as core reasoning components for complex task execution. Th...
As LLM-based Multi-Agent Systems (MAS) are increasingly deployed for complex tasks, ensuring their reliability has become a pressing challenge. Since MAS coordi...
We introduce a diagonalization-based optimization for Linear Echo State Networks (ESNs) that reduces the per-step computational complexity of reservoir state up...
As Operational Technology increasingly integrates with Information Technology, the need for Intrusion Detection Systems becomes more important. This paper explo...
Wildfire monitoring demands timely data collection and processing for early detection and rapid response. UAV-assisted edge computing is a promising approach, b...
The rapid adoption of Generative AI (GenAI) in the software development life cycle (SDLC) increases computational demand, which can raise the carbon footprint o...
Git is widely used for collaborative software development, but it can be challenging for newcomers. While most learning tools focus on individual workflows, Git...
Gaussian processes (GPs) are a widely used regression tool, but the cubic complexity of exact solvers limits their scalability. To address this challenge, we ex...
Watching training videos passively leads to superficial learning. Adding gamification can increase engagement. We study how software engineering students and in...
The adoption of large language models in safety-critical system engineering is constrained by trustworthiness, traceability, and alignment with established veri...
Traditional database fuzzing techniques primarily focus on syntactic correctness and general SQL structures, leaving critical yet obscure DBMS features, such as...
The open-source software (OSS) community has historically been dominated by English as the primary language for code, documentation, and developer interactions....
Autonomous coding agents increasingly contribute to software development by submitting pull requests on GitHub; yet, little is known about how these contributio...
iCloud Drive presents a filesystem interface but implements cloud synchronization semantics that diverge from POSIX in fundamental ways. This divergence is not ...