[Paper] BAMAS: Structuring Budget-Aware Multi-Agent Systems
Source: arXiv - 2511.21572v1
Overview
Large‑language‑model (LLM) powered multi‑agent systems are proving capable of tackling intricate, multi‑step problems, but their operational costs can quickly become prohibitive. The paper “BAMAS: Structuring Budget‑Aware Multi‑Agent Systems” introduces a systematic way to design such systems while staying inside a predefined budget, striking a balance between performance and expense.
Key Contributions
- Budget‑driven agent selection: Formulates the choice of LLMs as an Integer Linear Programming (ILP) problem that jointly optimizes task performance and monetary cost.
- Topology‑aware collaboration: Uses reinforcement learning (RL) to discover an interaction graph (who talks to whom) that maximizes efficiency under the chosen budget.
- End‑to‑end pipeline: Provides a practical workflow—select → structure → instantiate—that can be applied to any LLM‑based multi‑agent application.
- Empirical validation: Demonstrates up to 86 % cost reduction on three benchmark tasks while keeping accuracy on par with state‑of‑the‑art (SOTA) baselines.
Methodology
- Define the budget and candidate LLM pool – Each candidate model (e.g., GPT‑3.5, Claude‑1, LLaMA‑2) is annotated with its per‑token price and an estimated performance score for the target task.
- ILP‑based selection – The system solves an integer linear program that picks a subset of models whose total cost ≤ budget while maximizing a weighted sum of their performance scores.
- RL‑driven topology search – With the selected agents fixed, a reinforcement‑learning agent proposes edges in a directed graph (e.g., “Agent A sends its output to Agent B”). The reward combines task success (e.g., accuracy, completion rate) and the marginal cost of extra communication.
- Instantiation & execution – The final graph is materialized: each node runs its assigned LLM, exchanges messages according to the learned topology, and produces the overall solution.
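The selection step above can be sketched in a few lines. This is a minimal illustration of the budget-constrained objective, not the paper's implementation: the model names, prices, and scores are made up, and for a small candidate pool the ILP's optimum can be recovered by brute-force enumeration instead of calling a solver.

```python
from itertools import combinations

# Hypothetical candidate pool: (name, cost per query in $, performance score).
# All names and numbers here are illustrative, not taken from the paper.
CANDIDATES = [
    ("cheap-llm", 0.02, 0.61),
    ("mid-llm", 0.10, 0.74),
    ("premium-llm", 0.40, 0.88),
]

def select_agents(candidates, budget):
    """Pick the subset with maximal total performance whose cost fits the budget.

    BAMAS formulates this as an Integer Linear Program; with only a handful
    of candidate models the same objective can be checked exhaustively.
    """
    best, best_score = (), -1.0
    for r in range(1, len(candidates) + 1):
        for subset in combinations(candidates, r):
            cost = sum(c for _, c, _ in subset)
            score = sum(s for _, _, s in subset)
            if cost <= budget and score > best_score:
                best, best_score = subset, score
    return [name for name, _, _ in best], best_score
```

With a $0.15-per-query budget, this picks the cheap and mid-tier models (total cost $0.12) and leaves the premium model out; raising the budget admits all three. For realistic pools, the enumeration would be replaced by an ILP solver such as PuLP or OR-Tools.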
The approach is deliberately modular: you can swap the ILP solver, replace the RL algorithm, or plug in a different cost model without redesigning the whole pipeline.
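To make the topology-search reward concrete, here is a toy sketch under stated assumptions: three hypothetical agent roles, a reward that combines task success with a per-message communication penalty (mirroring the paper's reward design), and exhaustive scoring of every edge set in place of the RL search the paper actually uses. The `success_rate` callback is a stand-in for running the agents on the task.

```python
from itertools import combinations

AGENTS = ["planner", "solver", "verifier"]  # illustrative roles, not from the paper
POSSIBLE_EDGES = [(a, b) for a in AGENTS for b in AGENTS if a != b]

def reward(edges, success_rate, cost_per_message, lam=0.5):
    """Reward in the paper's spirit: task success minus marginal communication cost."""
    return success_rate(edges) - lam * cost_per_message * len(edges)

def best_topology(success_rate, cost_per_message=0.1):
    """Score every directed edge set and keep the best.

    With 3 agents there are only 2**6 edge sets, so exhaustive search is
    feasible; BAMAS uses reinforcement learning to explore this space at scale.
    """
    best_edges, best_r = (), float("-inf")
    for r in range(len(POSSIBLE_EDGES) + 1):
        for edges in combinations(POSSIBLE_EDGES, r):
            rwd = reward(edges, success_rate, cost_per_message)
            if rwd > best_r:
                best_edges, best_r = edges, rwd
    return list(best_edges), best_r
```

Because every edge carries a cost, the optimizer keeps only edges that raise success enough to pay for themselves, which is exactly why the learned topologies reported in the paper come out sparse.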
Results & Findings
| Task (benchmark) | Baseline (SOTA) Cost | BAMAS Cost | Cost Reduction | Performance Δ |
|---|---|---|---|---|
| Complex reasoning (Chain‑of‑Thought) | $1.20 per query | $0.17 per query | 86 % | ±0.2 % |
| Multi‑turn planning | $0.95 per query | $0.28 per query | 71 % | +0.1 % |
| Knowledge‑intensive QA | $0.78 per query | $0.32 per query | 59 % | –0.3 % |
Key takeaways
- Cost savings are achieved without sacrificing accuracy – the performance gap is within statistical noise for all three tasks.
- Hybrid agent mixes outperform single‑model baselines – e.g., pairing a cheap, fast model for preprocessing with a premium model for final verification yields the best trade‑off.
- Learned topologies are often sparse, confirming that many interactions are unnecessary and can be pruned to save API calls.
Practical Implications
- Product teams can set a hard budget (e.g., $0.05 per user request) and let BAMAS automatically configure the cheapest viable agent ensemble, removing the need for manual trial‑and‑error.
- Serverless deployments become more feasible: by minimizing token usage, developers can run LLM‑driven assistants on low‑cost cloud functions, or fall back to on‑device inference for edge cases.
- Dynamic scaling – BAMAS can be re‑run when pricing changes (e.g., new model releases) to instantly re‑optimize the agent pool, ensuring continuous cost‑effectiveness.
- Explainability for cost – The ILP formulation provides a clear audit trail of why a particular model was chosen, useful for compliance and budgeting reports.
Limitations & Future Work
- Static budget assumption: The current pipeline optimizes for a single, fixed budget per deployment; handling fluctuating budgets (e.g., burst traffic) requires extensions.
- Performance estimation reliance: The ILP needs accurate prior performance scores for each candidate LLM, which may be noisy for novel tasks.
- Scalability of RL topology search: While effective for up to ~10 agents, the search space grows combinatorially; future work could explore graph‑neural‑network‑based topology predictors.
- Broader evaluation: The authors test three tasks; applying BAMAS to domains like autonomous robotics or real‑time gaming would further validate its generality.
Authors
- Liming Yang
- Junyu Luo
- Xuanzhe Liu
- Yiling Lou
- Zhenpeng Chen
Paper Information
- arXiv ID: 2511.21572v1
- Categories: cs.MA, cs.AI
- Published: November 26, 2025