[Paper] Graph-Based Self-Healing Tool Routing for Cost-Efficient LLM Agents

Published: March 2, 2026 at 02:21 AM EST
4 min read
Source: arXiv


Overview

Neeraj Bholani’s paper introduces Self‑Healing Router, a new orchestration layer for large‑language‑model (LLM) agents that use external tools. By treating most control‑flow choices as graph‑based routing instead of LLM reasoning, the system dramatically cuts inference cost while automatically recovering from tool outages—something static workflow graphs struggle with.

Key Contributions

  • Fault‑tolerant routing architecture that separates “routing” (graph traversal) from “reasoning” (LLM calls).
  • Parallel health monitors that continuously score tool availability, latency, and risk, feeding these scores into edge weights.
  • Cost‑weighted tool graph where Dijkstra’s algorithm finds the cheapest viable execution path in real time.
  • Deterministic self‑healing: when a tool fails, its edges are re‑weighted to infinity and the shortest path is recomputed without involving the LLM.
  • Binary observability: every failure is either logged as a reroute or escalated to the LLM—no silent skips.
  • Empirical validation across 19 heterogeneous scenarios, showing a 93 % reduction in control‑plane LLM calls (9 vs. 123) while matching the correctness of the ReAct baseline.
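How health-monitor output might feed into edge weights can be sketched in a few lines. This is a minimal illustration, not the paper's API: the `edge_weight` function, the health fields, and the penalty coefficients are all assumptions.

```python
import math

def edge_weight(base_cost, health):
    """Combine a static base cost with a dynamic health penalty.

    health is a dict like {"up": bool, "latency_ms": float, "error_rate": float}.
    A downed tool gets weight infinity, which removes it from routing
    (the "deterministic self-healing" behavior described above).
    Coefficients below are illustrative, not from the paper.
    """
    if not health["up"]:
        return math.inf
    return base_cost + 0.01 * health["latency_ms"] + 50.0 * health["error_rate"]

# Two candidate transitions out of a hypothetical "fetch" tool:
weights = {
    ("fetch", "parse"): edge_weight(1.0, {"up": True, "latency_ms": 80, "error_rate": 0.01}),
    ("fetch", "parse_backup"): edge_weight(3.0, {"up": True, "latency_ms": 200, "error_rate": 0.0}),
}
```

Because the health penalty is additive, a healthy but expensive backup stays reachable while a failed primary drops out entirely, without any LLM involvement.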

Methodology

  1. Tool Graph Construction – Developers encode the agent’s workflow as a directed graph (linear pipelines, DAGs, or fan‑out structures). Each edge represents a possible transition between tools and carries a base cost (e.g., latency, monetary price).
  2. Health Monitoring Layer – Independent watchdog processes poll each tool’s health metrics (HTTP status, response time, error rates, domain‑specific risk signals). They output a priority score that is added to the edge’s base cost.
  3. Dynamic Edge Re‑weighting – If a monitor detects a failure, the corresponding edge weight is set to ∞, effectively removing it from the graph.
  4. Shortest‑Path Routing – Whenever the agent needs to decide the next tool, the router runs Dijkstra’s algorithm on the current weighted graph. The resulting path is deterministic and cost‑optimal given the observed health state.
  5. LLM Escalation Policy – If the graph has no feasible path (all routes blocked), the router falls back to the LLM, which can either demote the goal, propose an alternative strategy, or raise an exception.
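The five steps above can be condensed into a small routing loop. The following is a sketch under stated assumptions: the graph shape, tool names, and the escalation sentinel are illustrative, and the escalation itself is stubbed rather than calling an LLM.

```python
import heapq
import math

def dijkstra(graph, start, goal):
    """graph: {node: {neighbor: weight}}. Returns (cost, path) or (inf, None)."""
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        if node == goal:
            # Reconstruct the path by walking predecessors back to start.
            path = [node]
            while node != start:
                node = prev[node]
                path.append(node)
            return d, path[::-1]
        for nbr, w in graph.get(node, {}).items():
            if math.isinf(w):
                continue  # failed tool: edge effectively removed (step 3)
            nd = d + w
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    return math.inf, None

def route(graph, start, goal):
    """Deterministic, cost-optimal routing; LLM fallback only when no path exists."""
    cost, path = dijkstra(graph, start, goal)
    if path is None:
        return "escalate_to_llm"  # step 5: no feasible path, hand off to the LLM
    return path

graph = {
    "start": {"tool_a": 1.0, "tool_b": 2.0},
    "tool_a": {"goal": 1.0},
    "tool_b": {"goal": 1.0},
}
print(route(graph, "start", "goal"))  # cheapest path, via tool_a
graph["start"]["tool_a"] = math.inf   # tool_a fails: re-weight its edge to infinity
print(route(graph, "start", "goal"))  # deterministic reroute via tool_b, no LLM call
```

The failure-recovery path here is a pure graph recomputation, which is what keeps the control-plane LLM call count so low.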

The approach is deliberately lightweight: the routing step is a few milliseconds of graph traversal, while LLM inference is reserved for truly ambiguous or novel situations.

Results & Findings

| Metric | Self‑Healing Router | ReAct (LLM‑only) | Static Workflow Baseline |
| --- | --- | --- | --- |
| Correctness (goal completion) | ≈ 98 % (matches ReAct) | ≈ 98 % | 85 % (drops under compound failures) |
| Control‑plane LLM calls (aggregate) | 9 | 123 | 0 (but silent failures) |
| Silent‑failure incidents | 0 | 0 | 7 (undetected skips) |
| Average routing latency | 4 ms | 120 ms (LLM inference) | 3 ms |

Key takeaways: the router preserves the high success rate of LLM‑driven agents while slashing the number of expensive LLM invocations. Moreover, the deterministic routing eliminates the “silent‑failure” mode that plagued static pipelines when multiple tools failed simultaneously.

Practical Implications

  • Cost Savings – For production LLM agents (e.g., code assistants, data‑pipeline orchestrators), the 93 % reduction in LLM calls translates directly into lower API bills and reduced GPU usage.
  • Improved Latency – Routing decisions happen in milliseconds, enabling near‑real‑time responsiveness for interactive applications such as chat‑bots or IDE plugins.
  • Robust Deployments – Enterprises can rely on a single, maintainable graph definition while still gaining automatic recovery from service outages, network glitches, or rate‑limit throttling.
  • Simplified Observability – Binary outcomes (reroute vs. escalation) make logging and alerting straightforward, aiding SRE teams in diagnosing failures without combing through ambiguous LLM outputs.
  • Modular Architecture – The health‑monitoring layer can be extended with custom risk models (e.g., compliance checks, cost caps), allowing teams to enforce policy without rewriting the LLM prompt logic.
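A custom risk model can be layered onto the health score without touching prompt logic. The sketch below assumes a hypothetical `CostCapPolicy` class and a simple additive scoring scheme; neither name comes from the paper.

```python
import math

class CostCapPolicy:
    """Illustrative policy plug-in: veto any tool whose per-call price
    exceeds a budget cap by contributing an infinite risk score."""
    def __init__(self, max_price):
        self.max_price = max_price

    def risk(self, tool):
        return math.inf if tool["price"] > self.max_price else 0.0

def priority_score(tool, policies):
    """Base health score (latency-derived, coefficients illustrative)
    plus every registered policy's risk contribution."""
    score = 0.01 * tool["latency_ms"]
    for policy in policies:
        score += policy.risk(tool)
    return score

policies = [CostCapPolicy(max_price=0.05)]
cheap = {"price": 0.01, "latency_ms": 100}
pricey = {"price": 0.50, "latency_ms": 10}
print(priority_score(cheap, policies))   # finite: tool remains routable
print(priority_score(pricey, policies))  # infinite: excluded by policy
```

Because policies compose additively with the health score, a compliance rule or cost cap removes a tool from routing the same way an outage does: its edges go to infinity and Dijkstra routes around it.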

Overall, developers can build more cost‑efficient and resilient LLM agents by adopting the Self‑Healing Router pattern instead of relying on heavyweight “think‑every‑step” designs.

Limitations & Future Work

  • Graph Design Overhead – The system assumes a well‑structured tool graph; creating and maintaining this graph for highly dynamic domains may require additional engineering effort.
  • Scalability of Monitors – Parallel health monitors add runtime overhead; scaling to thousands of tools could strain monitoring infrastructure.
  • LLM Fallback Quality – When no path exists, the LLM must handle a broader set of edge cases, potentially reducing correctness if the fallback prompt is not carefully crafted.
  • Future Directions – The authors suggest exploring adaptive graph augmentation (auto‑adding edges based on observed successful reroutes), integrating probabilistic edge weights for stochastic tool performance, and extending the architecture to multi‑agent coordination scenarios.

Authors

  • Neeraj Bholani

Paper Information

  • arXiv ID: 2603.01548v1
  • Categories: cs.AI, cs.SE
  • Published: March 2, 2026