[Paper] Reinforcement Learning for Flow-Matching Policies with Density Transport
We present an online reinforcement learning (RL) algorithm for fine-tuning flow-matching policies in continuous-control problems. Our key insight is to view RL-...
Large Language Models (LLMs) have recently demonstrated impressive potential for time series forecasting. However, existing methods predominantly rely on passiv...
Constructing efficient and reliable policies to assist humans is indispensable for human-AI collaboration. Existing methods mainly follow two lines of work. Mos...
Kubernetes incidents are diagnosed reliably only when a root-cause system's reported gains come from incident evidence rather than scenario-specific shortcuts. ...
Purpose - Quotation error refers to the inconsistency between cited information and its original source. This phenomenon leads to a series of negative impacts, ...
Large language models (LLMs) have shown considerable promise for automated unit test generation, yet their practical effectiveness relative to human-written tes...
Speech emotion recognition (SER) is commonly formulated as utterance-level classification, although conversational emotion depends on a speaker's usual vocal ra...
Large language models frequently fail in a characteristic way: rather than acknowledging ignorance, they produce fluent but incorrect answers to questions that ...
Tensor networks provide efficient representations for compressing large neural networks. By carefully designing shapes and topologies, they can significantly re...
Transformer language models process input provided as subword fragments, but natural language semantics usually rely on word-level concepts. Detokenization is t...
Source code vulnerability detection remains a long-standing challenge due to the increasing scale, structural complexity, and semantic diversity of modern codeb...
I discuss some quantitative representations of Promise Theory for processes involving autonomous agents. Agent models are common in software systems, machine le...
Building Information Modeling (BIM) projects require information requirements to be described as machine-checkable Information Delivery Specification (IDS) file...
Reinforcement learning with verifiable rewards (RLVR) improves large language model reasoning but often suffers from rapid policy-entropy collapse, where the po...
Exploratory manipulation often turns an apparent failed attempt into the key evidence for what to do next. For example, a robot pulls a locked cabinet drawer, f...
AI agents increasingly take consequential actions -- shell commands, cloud operations, and arbitrary tool-calls -- so a trust layer must decide, per action, whe...
Published agent capability scores conflate what a model can do with what its scaffold lets it do, and the magnitude of this elicitation gap is not well characte...
Selective predictors answer on confident inputs and abstain elsewhere; deploying one safely needs a single finite-sample certificate that simultaneously upper-b...
Modern scientific workflows increasingly span diverse computing architectures, yet executing a single computational model across disparate systems often forces ...
As Russia's war against Ukraine extends into generative AI, large language models (LLMs) adapted for local post-Soviet languages are deployed in contested infor...
Reinforcement learning (RL) holds immense promise for enhancing the reasoning capabilities of diffusion large language models (dLLMs). However, progress is fund...
Software engineering agents (SWE agents) increasingly work through tool-mediated trajectories in real repositories, yet their behavior remains difficult to char...
As deep language models (DLMs) are increasingly deployed in high-stakes domains such as healthcare, understanding their decision rationale becomes paramount for...
Enterprise property graphs vary widely in schema structure, internal terminology, domain assumptions, governance constraints, and user interaction patterns. A d...
Context parallelism (CP) is essential for training large-scale, long-context language models, as it partitions sequences to reduce memory overhead. However, exi...
Parsing underpins a vast range of software engineering tasks, from compilers and static analyzers to language servers and fuzz testing tools. Yet most parsers d...
Large Language Models (LLMs) have become powerful tools for code generation, yet they remain prone to hallucinations, producing plausible but incorrect or fabric...
Graduate-level research reading report assessment creates a substantial labor burden for educators. While large language models (LLMs) hold great potential for ...
This paper asks when MR-subset selection is a real mutant-level requirement for minimum complete evidence in metamorphic testing rather than a coarse fault-clas...
Federated learning aims to protect data privacy by collaboratively learning a model without sharing private data among clients. Unlike traditional parameter-bas...
Large Language Models (LLMs) have significantly propelled the advancement of edge intelligence and have been widely deployed across various scenarios, including...
Repository-level code generation with Large Language Models (LLMs) remains challenging, primarily due to complex dependencies and limited context windows. Recen...
Gray-box optimization is an approach for making some problem-specific information available to the algorithm while still relying on fitness information as the m...
Organizational and logical coupling metrics require reliable identification of unique developers. In OSS, commit metadata is limited to names and emails, and th...
Cloud Security Posture Management (CSPM) systems detect known vulnerabilities by maintaining a rule set, distributing it to customers, and evaluating it against...
LLM-agent workflows chain model calls and tool invocations, and spend most of their wall-clock time waiting on upstream operations before downstream ones can st...
Researchers have shown that neural similarity among humans predicts social closeness and cooperative success, whereas innovation often emerges from interactions...
Production decision systems such as ad allocation or content matching involve millions of users and thousands of items, reducing to large-scale linear programs ...
We investigate the probabilistic reasoning capabilities of large language models through a controlled benchmarking study on discrete probability problems. We co...
In this work, we focus on extending SHARP, the popular photorealistic view synthesis method, for universal monocular rendering across a continuum of camera syst...
Humans learn from social life. Simulating this process with LLM-powered agents represents a promising research direction, raising a natural question: whether LL...
Current Vision-Language Models struggle with hours-long videos because processing full-length visual sequences induces prohibitive token explosion and attention...
We introduce StreamForce, a streaming video generation framework that enables physically grounded control through continuous force inputs. Unlike prior video mo...