[Paper] Recursive Multi-Agent Systems
Recursive or looped language models have recently emerged as a new scaling axis by iteratively refining the same model computation over latent states to deepen ...
Real-world data visualization (DV) requires native environmental grounding, cross-platform evolution, and proactive intent alignment. Yet, existing benchmarks o...
Adapting reasoning models to new tasks during post-training with only output-level supervision stalls under reinforcement learning from verifiable rewards (RLVR...
How much does a user's skill with AI shape what AI actually delivers for them? This question is critical for users, AI product builders, and society at large, b...
Identity teacher forcing (ITF) enables stable training of deterministic recurrent surrogates for chaotic dynamical systems and has been highly effective for dyn...
The accelerating adoption of Large Language Models (LLMs) in software engineering (SE) has brought with it a silent crisis: unsustainable computational cost. Wh...
Distributional and neural approaches to natural language semantics have been built almost exclusively on conventional linear algebra: vectors, matrices, tensors...
As LLM applications grow more complex, developers are increasingly adopting multi-agent architectures to decompose workflows into specialized, collaborative com...
Continual offline reinforcement learning (CORL) aims to learn a sequence of tasks from datasets collected over time while preserving performance on previously l...
Contact variability, sensing uncertainty, and external disturbances make grasp execution stochastic. Expected-quality objectives ignore tail outcomes and often ...
Preference-based alignment methods, most prominently Reinforcement Learning with Human Feedback (RLHF), use the judgments of human annotators to shape large lan...
Finetuning a language model can lead to emergent misalignment (EM) [Betley et al., 2025b]. Models trained on a narrow distribution of misaligned behavior genera...
Current deepfake detection models achieve state-of-the-art performance on pristine academic datasets but suffer severe spatial attention drift under real-world ...
Current pedestrian crossing signals operate on fixed timing without adjustment to pedestrian behavior, which can leave vulnerable road users (VRUs) such as the ...
Graph neural networks such as ParticleNet and transformer-based networks on point clouds such as ParticleTransformer achieve state-of-the-art performance on jet...
Quantum computing calibration depends on interpreting experimental data, and calibration plots provide the most universal human-readable representation for this...
Resolution of complex post-production issues in large-scale open-source software (OSS) projects requires significant cognitive effort, as developers need to go ...
Training language models via reinforcement learning often relies on imperfect proxy rewards, since ground truth rewards that precisely define the intended behav...
Large language models (LLMs) are increasingly used in emotionally sensitive human-AI applications, yet little is known about how emotion recognition is internal...
Existing REST API testing tools are typically evaluated using code coverage and crash-based fault metrics. However, recent LLM-based approaches increasingly gen...
Machine-generated text (MGT) detection requires identifying structurally invariant signals across generation models, rather than relying on model-specific finge...
Multimodal large language models (MLLMs) achieve ever-stronger performance on visual-language tasks. Even as traditional visual question answering benchmarks ap...
Traditional loss functions, including cross-entropy, contrastive, triplet, and supervised contrastive losses, used for fine-tuning pre-trained language models ...
Harnesses have become a central determinant of coding-agent performance, shaping how models interact with repositories, tools, and execution environments. Yet a...
Patient simulators are gaining traction in mental health training by providing scalable exposure to complex and sensitive patient interactions. Simulating depre...
Social identity is a concept from psychology that refers to the part of an individual's identity that derives from their group membership(s). In this paper, we ...
In this work, we propose Mutual Forcing, a framework for fast autoregressive audio-video generation with long-horizon audio-video synchronization. Our approach ...
Magnification shift is a major obstacle to robust histopathology classification, because models trained on one imaging scale often generalize poorly to another....
Vision-Language Models (VLMs) exhibit strong performance in instruction following and open-ended vision-language reasoning, yet they frequently generate fluent ...
Microservice-based systems impose significant organizational coordination challenges, yet the role of individual developers in shaping organizational coupling (...
Knowledge distillation (KD) is a well-known technique to effectively compress a large network (teacher) to a smaller network (student) with little sacrifice in ...
Knowledge distillation (KD) represents a vital mechanism to transfer expertise from complex teacher networks to efficient student models. However, in decentrali...
Articulation modeling aims to infer movable parts and their motion parameters for a 3D object, enabling interactive animation, simulation, and shape editing. In...
Source Code Plagiarism Detection (SCPD) plays an important role in maintaining fairness and academic integrity in software engineering education. Code Evaluatio...
Federated inference enhances LLM performance in edge computing through weighted averaging of distributed model predictions. However, autoregressive LLM inferenc...
We present the SCenario Specification Language (SCSL) for automated generation and execution of system-level tests. SCSL targets complex distributed systems (e....
Instructed code editing is a significant challenge for large language models (LLMs). On the EditBench benchmark, 39 of 40 evaluated models obtain a task success...
Source code and its accompanying comments are complementary yet naturally aligned modalities: code encodes structural logic while comments capture developer inte...
Parallel scan primitives compute element-wise inclusive or exclusive prefix sums of input vectors contributed by p consecutively ranked processors under an asso...
Formal models for concurrent and distributed systems describe machines; the people who operate them are either ignored or treated as external environment. Yet k...
The rising share of abundant renewable energy inevitably increases volatility in the electricity production. The concept of sector coupling means that the volat...
Time series classification is an important analytical task across diverse domains. However, its practical application is often hindered by the scarcity of label...
Stopping criteria automatically determine when to stop an evolutionary algorithm, so as not to waste function evaluations on a stagnant population. Although sto...
Efficient GPU execution of convolution operators is governed by memory-access efficiency, on-chip data reuse, and execution mapping rather than arithmetic throu...
Microservice-based cloud applications face changing workloads, evolving request paths, variable network conditions, interference, and failures. These dynamics c...
KV cache restoration has emerged as a dominant bottleneck in serving long-context LLM workloads, including multi-turn conversations, retrieval-augmented generat...
Custom policy-learning pipelines in Spark fail for two coupled systems reasons: rowwise Python execution makes inference impractical, and driver-side candidate ...
The Internet of Everything (IoE) represents an evolution of the Internet of Things (IoT) by integrating people, data, processes, and things into a unified intel...