[Paper] Controllable Reasoning Models Are Private Thinkers
AI agents powered by reasoning models require access to sensitive user data. However, their reasoning traces are difficult to control, which can result in the u...
5644 posts from this source
AI agents powered by reasoning models require access to sensitive user data. However, their reasoning traces are difficult to control, which can result in the u...
Diffusion models achieve state-of-the-art video generation quality, but their inference remains expensive due to the large number of sequential denoising steps....
Despite their capabilities, Multimodal Large Language Models (MLLMs) may produce plausible but erroneous outputs, hindering reliable deployment. Accurate uncert...
We present a scalable methodology for evaluating language models in multi-turn interactions, using a suite of collaborative games that require effective communi...
Small language models (SLMs) have emerged as efficient alternatives to large language models for task-specific applications. However, they are often employed in...
Argumentative LLMs (ArgLLMs) are an existing approach leveraging Large Language Models (LLMs) and computational argumentation for decision-making, with the aim ...
Scaling the number of qubits available across multiple quantum devices is an active area of research within distributed quantum computing (DQC). This includes q...
Mobile Agents can autonomously execute user instructions, which requires hybrid-capabilities reasoning, including screen summary, subtask planning, action decis...
The expansion of retrieval-augmented generation (RAG) into multimodal domains has intensified the challenge for processing complex visual documents, such as fin...
Functional testing is essential for verifying that the business logic of mobile applications aligns with user requirements, serving as the primary methodology f...
This paper proposes CIRCLE, a six-stage, lifecycle-based framework to bridge the reality gap between model-centric performance metrics and AI's materialized out...
Large Language Model (LLM) adapters enable low-cost model specialization, but introduce complex caching and scheduling challenges in distributed serving systems...
Property checking of RTL designs is a central task in formal verification. Among available engines, IC3/PDR is a widely used backbone whose performance critical...
Background. Automated test execution is an important activity to gather information about the quality of a software project. So-called flaky tests, however, neg...
Serverless computing simplifies cloud deployment but introduces new challenges in managing service latency and carbon emissions. Reducing cold-start latency req...
We present a multiparty session type (MST) framework with asynchronous mixed choice (MC). We propose a core construct for MC that allows transient inconsistenci...
Microservice architectures are an emergent technology that builds business logic into a suite of small services. Each microservice runs in its process and the c...
AI coding agents allow software developers to generate code quickly, which raises a practical question for project managers and open source maintainers: can vib...
Software engineering agents (SWE) are improving rapidly, with recent gains largely driven by reinforcement learning (RL). However, RL training is constrained by...
Machine unlearning for large language models often faces a privacy dilemma in which strict constraints prohibit sharing either the server's parameters or the cl...
Modern cloud servers routinely co-locate multiple latency-sensitive microservice instances to improve resource efficiency. However, the diversity of microservic...
PoCo is a technique that aims to enhance modern coverage-based seed selection (CSS) techniques (such as afl-cmin) by gradually removing obstacle conditional sta...
LLM-powered Multi-Agent Systems (MAS) have demonstrated remarkable capabilities in complex domains but suffer from inherent fragility and opaque failure mechani...
With the increasing importance of distributed scientific workflows, there is a critical need to ensure Quality of Service (QoS) constraints, such as minimizing ...
For every real number c geq 1 and for all varepsilon > 0, there is a fitness function f : {0,1}^n to mathbb{R} for which the optimal mutation rate for the (1...
Large-scale Graph Neural Networks (GNNs) are typically trained by sampling a vertex's neighbors to a fixed distance. Because large input graphs are distributed,...
Federated Learning (FL) enables a group of clients to collaboratively train a model without sharing individual data, but its performance drops when client data ...
We introduce MediX-R1, an open-ended Reinforcement Learning (RL) framework for medical multimodal large language models (MLLMs) that enables clinically grounded...
We present a scalable 3D reconstruction model that addresses a critical limitation in offline feed-forward methods: their computational and memory requirements ...
Numerous lines of aim to control model disagreement -- the extent to which two machine learning models disagree in their predictions. We adopt a simple and stan...
We identify occlusion reasoning as a fundamental yet overlooked aspect for 3D layout-conditioned generation. It is essential for synthesizing partially occluded...
A dataset server must often distribute the same large payload to many clients, incurring massive communication costs. Since clients frequently operate on divers...
Bio-inspired event cameras have recently attracted significant research due to their asynchronous and low-latency capabilities. These features provide a high dy...
The Platonic Representation Hypothesis posits that neural networks trained on different modalities converge toward a shared statistical model of the world. Rece...
Recent work such as AlphaEvolve has shown that combining LLM-driven optimization with evolutionary search can effectively improve programs, prompts, and algorit...
The lack of reasoning capabilities in Vision-Language Models (VLMs) has remained at the forefront of research discourse. We posit that this behavior stems from ...
Standard mixed-precision training of neural networks requires many bytes of accelerator memory for each model parameter. These bytes reflect not just the parame...
Coarse data arise when learners observe only partial information about samples; namely, a set containing the sample rather than its exact value. This occurs nat...
Open-vocabulary segmentation (OVS) extends the zero-shot recognition capabilities of vision-language models (VLMs) to pixel-level prediction, enabling segmentat...
Recent advances in machine learning have emphasized the integration of structured optimization components into end-to-end differentiable models, enabling richer...
AI-powered scientific research tools are rapidly being integrated into research workflows, yet the field lacks a clear lens into how researchers use these syste...
Neural network accelerators have been widely applied to edge devices for complex tasks like object tracking, image recognition, etc. Previous works have explore...
A growing number of publications address the best practices to use Large Language Models (LLMs) for software engineering in recent years. However, most of this ...
The advancement of large language models (LLMs) has accelerated the development of autonomous financial trading systems. While mainstream approaches deploy mult...
Large language models (LLMs) perform increasingly well on biology benchmarks, but it remains unclear whether they uplift novice users -- i.e., enable humans to ...
Using advanced machine learning techniques, we developed a method for reconstructing precisely the arrival direction and energy of ultra-high-energy cosmic rays...
Self-reflection enables language agents to iteratively refine solutions, yet often produces repetitive outputs that limit reasoning performance. Recent studies ...
Generalized Rapid Action Value Estimation (GRAVE) has been shown to be a strong variant within the Monte-Carlo Tree Search (MCTS) family of algorithms for Gener...