[Paper] CommCP: Efficient Multi-Agent Coordination via LLM-Based Communication with Conformal Prediction
Source: arXiv - 2602.06038v1
Overview
The paper “CommCP: Efficient Multi‑Agent Coordination via LLM‑Based Communication with Conformal Prediction” tackles a practical problem that many robotics teams face today: how can a fleet of heterogeneous robots work together to answer questions about a physical environment and act on natural‑language commands without stepping on each other’s toes? By extending the classic Embodied Question Answering (EQA) task to a multi‑agent, multi‑task setting (MM‑EQA) and introducing a lightweight, LLM‑driven communication protocol, the authors show a clear path toward more reliable, scalable robot teams in real homes and factories.
Key Contributions
- MM‑EQA formulation – a new benchmark that combines embodied navigation, visual question answering, and cooperative manipulation across multiple heterogeneous agents.
- CommCP framework – a decentralized communication architecture that lets each robot query a large language model (LLM) for message generation while using conformal prediction to bound the uncertainty of those messages.
- Message calibration – conformal prediction produces a confidence set for each generated message, allowing receivers to filter out high‑uncertainty (potentially distracting) communications.
- Open‑source benchmark & code – a photo‑realistic household dataset with diverse tasks, plus a public repo and demo videos for reproducibility.
- Empirical gains – up to +18 % task success rate and +22 % exploration efficiency compared with prior decentralized baselines.
Methodology
- Problem setup – Each robot receives a natural‑language assignment (e.g., “Find the red mug on the kitchen counter”). The team must explore, ask clarifying questions, and manipulate objects. The environment is simulated with high‑fidelity 3D scenes.
- LLM‑based message generation – When an agent needs to share information (e.g., “I see a blue cup on the table”), it sends a prompt to a pre‑trained LLM (such as GPT‑4) that returns a concise textual message.
- Conformal prediction layer – Before broadcasting, the system runs a lightweight conformal predictor on the LLM’s output distribution. This yields a prediction set with a user‑specified coverage probability (e.g., 95 %). If the set is too large (high uncertainty), the message is either pruned or re‑phrased until it meets the confidence budget.
- Decentralized execution – No central controller; each robot runs the same pipeline locally, listening only to messages that pass the confidence filter. This reduces bandwidth and avoids the “message overload” that can confuse agents.
- Training & evaluation – Agents are trained with reinforcement learning (RL), with a reward that combines task completion, communication cost, and a conformal penalty. The benchmark includes 10k episodes across 30 household layouts.
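The confidence-filtering step above can be sketched as a standard split-conformal procedure. The function names, the candidate-probability dictionary, and the set-size budget below are illustrative assumptions for this summary, not the paper's actual implementation:

```python
import math

def conformal_threshold(cal_scores, alpha=0.05):
    """Split-conformal quantile over calibration nonconformity scores
    (e.g., 1 - p assigned to the correct message); alpha = 1 - coverage."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    k = min(k, n)  # clamp when (n+1)(1-alpha) exceeds n
    return sorted(cal_scores)[k - 1]

def prediction_set(candidate_probs, qhat):
    """Keep every candidate message whose nonconformity 1 - p
    falls within the calibrated threshold."""
    return {m for m, p in candidate_probs.items() if 1.0 - p <= qhat}

def should_broadcast(candidate_probs, qhat, max_set_size=1):
    """Broadcast only when the conformal set is small (low ambiguity);
    a large set signals an uncertain message that should be
    pruned or re-phrased instead."""
    s = prediction_set(candidate_probs, qhat)
    return 0 < len(s) <= max_set_size, s

# Hypothetical usage: calibrate once, then gate each outgoing message.
qhat = conformal_threshold([0.05, 0.1, 0.2, 0.3, 0.4], alpha=0.2)
ok, kept = should_broadcast(
    {"red mug on counter": 0.9, "blue cup on table": 0.5}, qhat
)
```

With these toy numbers, only the high-probability candidate survives the filter, so the message is sent; if two candidates both landed in the set, the agent would hold the message back rather than risk distracting its teammates.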
Results & Findings
| Metric | Baseline (No LLM) | Decentralized LLM (no CP) | CommCP (LLM + CP) |
|---|---|---|---|
| Task success rate | 62 % | 71 % | 79 % |
| Exploration steps per episode (lower is better) | 145 | 122 | 112 |
| Avg. messages per episode | 8.3 | 12.7 | 9.1 |
| Communication‑induced error (mis‑directed actions) | 14 % | 9 % | 4 % |
- Higher success stems from more accurate, less noisy information sharing.
- Fewer steps indicate that agents can prune irrelevant regions faster thanks to calibrated messages.
- Reduced error rate shows that conformal prediction effectively filters out ambiguous or misleading LLM outputs.
Qualitative video demos illustrate agents dynamically re‑asking clarifying questions only when needed, and seamlessly handing off manipulation tasks to the robot best equipped for the object.
Practical Implications
- Scalable robot fleets – CommCP’s decentralized design means you can add more robots without redesigning a central scheduler; each node only processes high‑confidence messages.
- Bandwidth‑aware deployments – In real‑world Wi‑Fi or 5G constrained settings, the confidence filter cuts down on unnecessary chatter, saving network resources.
- Safety‑critical domains – By guaranteeing a statistical bound on message reliability, developers can embed CommCP in applications where miscommunication could cause damage (e.g., warehouse pick‑and‑place, home assistance for seniors).
- Plug‑and‑play LLM integration – The framework treats the LLM as a black‑box service, making it straightforward to swap in newer models (Claude, Gemini) as they become available.
- Rapid prototyping – The open‑source benchmark provides a ready‑made testbed for evaluating new coordination algorithms, sensor suites, or hardware platforms.
Limitations & Future Work
- Simulation‑first – Experiments are limited to photo‑realistic simulators; real‑world noise (sensor drift, network latency) may affect conformal calibration.
- LLM latency – Relying on cloud LLM APIs introduces variable response times; edge‑optimized LLMs are needed for truly real‑time coordination.
- Fixed confidence level – The current system uses a static coverage probability; adaptive confidence thresholds based on task urgency could further improve efficiency.
- Heterogeneity scope – The benchmark includes a few robot morphologies; extending to aerial drones or legged platforms will test the generality of the approach.
The authors suggest exploring online conformal learning to adapt to changing environments, and integrating multimodal LLMs (vision‑language) so that agents can exchange richer perceptual cues without exploding message size.
Authors
- Xiaopan Zhang
- Zejin Wang
- Zhixu Li
- Jianpeng Yao
- Jiachen Li
Paper Information
- arXiv ID: 2602.06038v1
- Categories: cs.RO, cs.AI, cs.CV, cs.LG, cs.MA
- Published: February 5, 2026