[Paper] Recursive Multi-Agent Systems
Source: arXiv - 2604.25917v1
Overview
The paper introduces RecursiveMAS, a framework that treats an entire multi‑agent system as a single recursive computation over latent representations. By wiring agents into a closed loop through a lightweight "RecursiveLink" module, the authors show that collaboration itself can be scaled, much like recent "looped" language models that repeatedly refine their own reasoning. The result is a system that reasons faster, uses fewer tokens, and achieves higher accuracy across a wide range of tasks.
Key Contributions
- Recursive multi‑agent formulation – casts heterogeneous agents and their interactions as a unified latent‑space recursion, enabling seamless state transfer between agents.
- RecursiveLink module – a lightweight connector that generates in‑distribution latent “thoughts” and passes them across agents without costly text generation.
- Inner‑outer loop learning algorithm – jointly optimizes all agents and the recursion dynamics via shared gradient‑based credit assignment, eliminating the need for separate fine‑tuning stages.
- Theoretical analysis – proves lower runtime complexity than conventional text‑based multi‑agent pipelines and demonstrates stable gradient flow across recursion rounds.
- Extensive empirical validation – evaluates four common collaboration patterns on nine benchmarks (math, science, medicine, search, code) and reports an average 8.3 % accuracy gain, 1.2‑2.4× inference speedup, and 34.6‑75.6 % token‑usage reduction versus strong baselines.
Methodology
- Latent‑space recursion – Instead of having agents exchange raw text, each agent produces a compact latent vector (its “thought”). These vectors are fed back into the system through the RecursiveLink, forming a closed loop that can be iterated multiple times.
- RecursiveLink design – A small neural module that aligns the output space of one agent with the input space of the next, ensuring the generated latent states stay within the distribution the agents were trained on.
- Learning algorithm
- Inner loop: runs a fixed number of recursion steps, propagating latent states and collecting task‑specific losses.
- Outer loop: back‑propagates through the entire recursion graph, assigning credit to every agent and the RecursiveLink simultaneously.
- Gradient sharing across recursion steps keeps updates stable and avoids exploding/vanishing gradients.
- Collaboration patterns – The authors instantiate RecursiveMAS for four archetypal setups (e.g., planner‑executor, specialist‑generalist, debate, and hierarchical supervision) to demonstrate flexibility.
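The inner‑outer loop above can be sketched with a toy stand‑in. Everything here is an illustrative assumption rather than the paper's actual models: the two "agents" are fixed random linear maps with mismatched latent dimensions, the task loss is a simple quadratic, and a finite‑difference gradient replaces backpropagation so the sketch stays dependency‑light.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for two frozen pretrained agents with mismatched latent
# spaces (8-d and 6-d). All names and sizes are illustrative assumptions.
D_A, D_B = 8, 6
agent_a = rng.normal(size=(D_A, D_A)) / np.sqrt(D_A)
agent_b = rng.normal(size=(D_B, D_B)) / np.sqrt(D_B)
target = rng.normal(size=D_A)        # hypothetical task target
z0 = rng.normal(size=D_A)            # initial latent "thought"

def inner_loop(link_ab, link_ba, steps=4):
    """Inner loop: iterate the closed agent -> link -> agent cycle,
    accumulating a task-specific loss at every recursion step."""
    z, loss = z0, 0.0
    for _ in range(steps):
        z = np.tanh(agent_a @ z)     # agent A emits a latent thought
        h = np.tanh(link_ab @ z)     # RecursiveLink: A-space -> B-space
        h = np.tanh(agent_b @ h)     # agent B refines in its own space
        z = np.tanh(link_ba @ h)     # link maps back, closing the loop
        loss += float(np.sum((z - target) ** 2))
    return loss

def numerical_grad(W, f, eps=1e-5):
    """Finite-difference gradient. The paper backpropagates through the
    whole recursion graph; this stand-in keeps the sketch self-contained."""
    g = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        W[idx] += eps; hi = f()
        W[idx] -= 2 * eps; lo = f()
        W[idx] += eps
        g[idx] = (hi - lo) / (2 * eps)
    return g

# Outer loop: update the RecursiveLink weights against the loss summed over
# all recursion steps at once (shared credit assignment across the loop).
link_ab = 0.1 * rng.normal(size=(D_B, D_A))
link_ba = 0.1 * rng.normal(size=(D_A, D_B))
loss_before = inner_loop(link_ab, link_ba)
for _ in range(30):
    f = lambda: inner_loop(link_ab, link_ba)
    link_ab -= 0.01 * numerical_grad(link_ab, f)
    link_ba -= 0.01 * numerical_grad(link_ba, f)
loss_after = inner_loop(link_ab, link_ba)
```

The key structural point the sketch preserves is that only the link weights are trained, and their gradient flows through every recursion step jointly rather than one agent at a time.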
Results & Findings
| Benchmark Category | Avg. Accuracy Gain | Inference Speedup | Token Reduction |
|---|---|---|---|
| Mathematics | +9.1 % | 2.1× | 68 % |
| Science & Medicine | +7.8 % | 1.8× | 55 % |
| Search & Retrieval | +8.5 % | 1.5× | 42 % |
| Code Generation | +8.2 % | 1.2× | 35 % |
- RecursiveMAS consistently outperformed both single‑agent recursive models and standard multi‑agent pipelines that rely on full text exchanges.
- The recursive loop converges in 3‑5 iterations on average, showing that deepening reasoning does not require many passes.
- Gradient analysis confirms stable norms across recursion depth, validating the inner‑outer loop training scheme.
Practical Implications
- Reduced API costs – By operating in latent space, RecursiveMAS cuts token usage dramatically, which translates directly into lower usage fees from API providers such as OpenAI or Anthropic.
- Faster collaborative assistants – Real‑time tools (e.g., AI pair programmers, research assistants, or medical triage bots) can now coordinate multiple specialized models without the latency of serial text passing.
- Modular system design – Teams can plug in existing pretrained agents (e.g., a code LLM, a retrieval model, a reasoning LLM) and let RecursiveMAS handle the orchestration, simplifying engineering pipelines.
- Scalable reasoning – The recursion mechanism offers a new scaling knob: instead of training ever larger monolithic models, developers can deepen reasoning by adding recursion steps, saving compute and memory.
Limitations & Future Work
- Recursion depth trade‑off – While 3‑5 steps work well for the evaluated tasks, more complex problems may need deeper loops, potentially re‑introducing gradient instability.
- Heterogeneity handling – The current RecursiveLink assumes compatible latent dimensions; extending it to truly disparate model architectures (e.g., vision‑only agents) requires additional alignment mechanisms.
- Benchmark scope – Experiments focus on well‑structured benchmarks; real‑world noisy environments (e.g., open‑domain dialogue) remain to be tested.
- Future directions suggested by the authors include: adaptive recursion schedules (deciding on‑the‑fly how many loops are needed), richer cross‑modal latent bridges, and applying RecursiveMAS to large‑scale distributed systems (e.g., edge‑cloud collaborations).
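One of the suggested directions, adaptive recursion schedules, could in principle be as simple as halting the loop once the latent state stops changing. The sketch below is a guess at that idea, not the authors' method: the fused agent‑plus‑link update `W`, the tolerance, and the contraction rescaling are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 8
# Hypothetical fused agent+link update, rescaled to be 0.5-Lipschitz so the
# loop provably converges (tanh itself is 1-Lipschitz).
W = rng.normal(size=(D, D))
W *= 0.5 / np.linalg.norm(W, 2)

def adaptive_recurse(z, tol=1e-3, max_steps=50):
    """Decide the recursion depth on the fly: stop once the latent
    'thought' vector moves less than `tol` between iterations."""
    for step in range(1, max_steps + 1):
        z_next = np.tanh(W @ z)
        if np.linalg.norm(z_next - z) < tol:
            return z_next, step
        z = z_next
    return z, max_steps

z_final, steps_used = adaptive_recurse(rng.normal(size=D))
```

Under the contraction assumption the loop settles in roughly a dozen steps here, which is the same flavor of early convergence the paper reports (3‑5 iterations) for its fixed schedules.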
Authors
- Xiyuan Yang
- Jiaru Zou
- Rui Pan
- Ruizhong Qiu
- Pan Lu
- Shizhe Diao
- Jindong Jiang
- Hanghang Tong
- Tong Zhang
- Markus J. Buehler
- Jingrui He
- James Zou
Paper Information
- arXiv ID: 2604.25917v1
- Categories: cs.AI, cs.CL, cs.LG
- Published: April 28, 2026