[Paper] Reaching Agreement Among Reasoning LLM Agents

Published: December 23, 2025 at 04:20 AM EST
4 min read

Source: arXiv - 2512.20184v1

Overview

The paper “Reaching Agreement Among Reasoning LLM Agents” tackles a growing pain point in today’s AI‑powered multi‑agent systems: how to coordinate many large language model (LLM) “agents” so they can reason together efficiently without wasting compute or delivering inconsistent answers. By framing the problem as a distributed consensus task—much like the algorithms that keep databases and blockchains in sync—the authors introduce a provably correct protocol that dramatically cuts latency while preserving answer quality.

Key Contributions

  • Formal model of multi‑agent refinement – defines correctness guarantees (safety, liveness) for stochastic reasoning agents.
  • Aegean consensus protocol – a lightweight, quorum‑based algorithm that lets agents stop early once enough of them agree, avoiding “straggler” delays.
  • Aegean‑Serve serving engine – an implementation that detects incremental quorums across concurrent LLM executions and triggers early termination.
  • Empirical validation – experiments on four mathematical reasoning benchmarks show 1.2×–20× speed‑ups with ≤2.5% drop in answer quality, both on local GPUs and commercial API back‑ends.
  • Provable safety & liveness – the system never finalizes an answer that fails the consensus/correctness check, and it keeps running until such an answer can be returned (a rough formalization is sketched below).
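
The summary does not state these guarantees formally. As a rough, hedged reading (the notation below is an assumption, not the paper's; φ stands for the correctness predicate and return(a) for the protocol finalizing answer a):

```latex
% Rough reading of the guarantees; phi is the correctness predicate and
% return(a) means the protocol finalizes answer a. Not the paper's notation.
\begin{align*}
\textbf{Safety:}   &\quad \mathrm{return}(a) \;\Rightarrow\; \phi(a) \\
\textbf{Liveness:} &\quad \text{as long as agents keep producing candidates,} \\
                   &\quad \text{some } a \text{ with } \phi(a) \text{ is eventually returned}
\end{align*}
```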

Methodology

  1. Problem Formalization – The authors model each reasoning LLM as a stochastic node that produces a candidate answer after a variable amount of compute. The goal is to reach a refinement—a shared answer that satisfies a predefined correctness predicate.
  2. Consensus Design – Building on classic distributed consensus (e.g., Paxos, Raft), Aegean introduces a probabilistic quorum: instead of waiting for all agents, it tracks how many have produced the same answer and stops once a configurable confidence threshold is met.
  3. Incremental Quorum Detection – Aegean‑Serve monitors the stream of partial results in real time. As soon as the quorum condition is satisfied, it aborts the remaining slower agents and returns the agreed‑upon answer (see the sketch after this list).
  4. Safety Checks – Before finalizing, the system re‑evaluates the consensus answer against a lightweight verifier (e.g., a smaller LLM or a rule‑based checker) to ensure it meets the correctness predicate.
  5. Evaluation – The protocol is benchmarked on four math‑reasoning tasks (e.g., GSM8K, MATH) using both self‑hosted GPU clusters and external APIs (OpenAI, Anthropic). Latency, compute cost, and answer accuracy are measured against baseline orchestration strategies (fixed‑loop, barrier sync).
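
The paper's own implementation is not reproduced here, but the early‑termination idea in steps 2–4 can be illustrated with a minimal asyncio sketch. Everything below (the `quorum_consensus` function, the `verify` callback, the toy agents) is an assumption for illustration, not Aegean‑Serve's actual code.

```python
import asyncio
from collections import Counter

async def quorum_consensus(agents, prompt, quorum, verify):
    """Run all agents concurrently; return the first answer that reaches the
    quorum and passes the lightweight verifier, cancelling slower agents."""
    tasks = [asyncio.create_task(agent(prompt)) for agent in agents]
    counts = Counter()
    try:
        for finished in asyncio.as_completed(tasks):
            answer = await finished
            counts[answer] += 1
            # Incremental quorum check: stop as soon as enough agents agree
            # and the verifier accepts the answer (the "safety check" step).
            if counts[answer] >= quorum and verify(prompt, answer):
                return answer
        # No early quorum: fall back to the most common verified answer.
        for answer, _ in counts.most_common():
            if verify(prompt, answer):
                return answer
        return None
    finally:
        for t in tasks:
            t.cancel()  # free compute/API quota held by straggler agents

# Toy demo: four "agents" with different latencies standing in for LLM calls.
async def toy_agent(delay, answer, _prompt=None):
    await asyncio.sleep(delay)
    return answer

async def main():
    agents = [
        lambda p, d=d, a=a: toy_agent(d, a, p)
        for d, a in [(0.1, "42"), (0.2, "42"), (0.3, "41"), (2.0, "42")]
    ]
    result = await quorum_consensus(
        agents, "What is 6 * 7?", quorum=2, verify=lambda p, a: a.isdigit()
    )
    print(result)  # prints "42" without waiting for the 2-second straggler

if __name__ == "__main__":
    asyncio.run(main())
```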

Results & Findings

| Setting | Baseline Latency (s) | Aegean Latency (s) | Speed‑up | Answer Quality Δ |
| --- | --- | --- | --- | --- |
| Local GPU (8 agents) | 4.8 | 0.4 – 4.0 | 1.2× – 20× | ≤ 2.5% |
| OpenAI API (4 agents) | 6.2 | 0.5 – 5.1 | 1.2× – 12× | ≤ 2.5% |
| Anthropic API (6 agents) | 7.5 | 0.6 – 6.3 | 1.2× – 13× | ≤ 2.5% |

  • Latency drops dramatically because the protocol stops waiting for the slowest “straggler” agents.
  • Compute cost is reduced proportionally, since aborted agents free up GPU/API quota.
  • Answer quality remains virtually unchanged; the small verification step catches the rare cases where early termination would have produced a wrong answer.
  • The protocol works consistently across different hardware and API providers, demonstrating its platform‑agnostic nature.

Practical Implications

  • Faster AI‑augmented workflows – Teams building chat‑bots, code‑assistants, or decision‑support tools can now orchestrate multiple LLM calls without incurring the typical “wait‑for‑all” penalty.
  • Cost savings – Terminating unnecessary agent runs cuts cloud‑API spend roughly in proportion to the compute saved, which adds up quickly in high‑throughput services.
  • Scalable ensemble reasoning – Developers can safely increase the number of reasoning agents (e.g., diverse prompts, temperature settings) to boost robustness, knowing the system will automatically prune excess compute.
  • Reliability guarantees – The formal safety/liveness proofs give product owners confidence that the system won’t return inconsistent or partially validated answers, a critical requirement for regulated domains (finance, healthcare).
  • Plug‑and‑play serving layer – Aegean‑Serve can be wrapped around existing LLM inference pipelines (e.g., LangChain, LlamaIndex) with minimal code changes, making adoption straightforward.
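
As a sense of what "wrapping an existing pipeline" might look like (hypothetical: the summary does not show Aegean‑Serve's real API, and `call_my_llm` / `make_ensemble` are made‑up names), diverse agents built from an existing inference call can be handed to a quorum routine like the one sketched in the Methodology section:

```python
import asyncio

# Hypothetical stand-in for any existing inference call (LangChain chain,
# vLLM endpoint, hosted API). Name and signature are assumptions.
async def call_my_llm(prompt: str, temperature: float) -> str:
    await asyncio.sleep(0.05)   # replace with the real pipeline call
    return "42"

def make_ensemble(prompt_variants, temperatures):
    """Build diverse agents (prompt x temperature variants) for quorum voting."""
    return [
        (lambda q, p=p, t=t: call_my_llm(f"{p}\n\n{q}", t))
        for p in prompt_variants
        for t in temperatures
    ]

# Usage with the quorum_consensus sketch from the Methodology section:
#   agents = make_ensemble(["Solve step by step.", "Answer concisely."],
#                          [0.2, 0.8])
#   answer = asyncio.run(quorum_consensus(agents, "What is 6 * 7?",
#                                         quorum=3,
#                                         verify=lambda p, a: a.isdigit()))
```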

Limitations & Future Work

  • Verification overhead – The lightweight correctness check adds a small constant cost; in ultra‑low‑latency scenarios (sub‑100 ms) this could become noticeable.
  • Assumption of independent stochastic agents – The model presumes agents operate independently; tightly coupled agents (e.g., shared memory) may need a different consensus strategy.
  • Domain‑specific predicates – The current experiments focus on mathematical reasoning; extending the protocol to open‑ended generation (creative writing, code synthesis) will require richer, possibly learned, correctness predicates.
  • Dynamic quorum tuning – Future work could explore adaptive quorum thresholds that react to observed agent variance, further optimizing the trade‑off between speed and answer fidelity.

Bottom line: By borrowing rigor from distributed systems and applying it to LLM ensembles, the authors deliver a practical, provably correct orchestration layer that slashes latency and cost while keeping answers reliable—a win for any developer looking to scale reasoning‑heavy AI services.

Authors

  • Chaoyi Ruan
  • Yiliang Wang
  • Ziji Shi
  • Jialin Li

Paper Information

  • arXiv ID: 2512.20184v1
  • Categories: cs.DC
  • Published: December 23, 2025