[Paper] Agentic Memory Enhanced Recursive Reasoning for Root Cause Localization in Microservices
Source: arXiv - 2601.02732v1
Overview
Microservice architectures now power many of today’s large‑scale applications, but their scale and dense inter‑service dependencies make failures hard to diagnose. This paper presents AMER‑RCL, a framework that combines recursive reasoning with an “agentic memory” to let large language models (LLMs) think more like seasoned Site Reliability Engineers (SREs). The authors show that the approach yields higher root‑cause localization accuracy while cutting inference latency.
Key Contributions
- Empirical SRE study – Interviews across several organizations uncovered three hallmarks of expert troubleshooting: recursive refinement, multi‑dimensional expansion, and cross‑modal reasoning.
- Recursive Reasoning Engine – A multi‑agent LLM system that iteratively narrows down candidate root causes for each alert, mimicking the step‑by‑step deduction SREs perform.
- Agentic Memory layer – A lightweight, time‑windowed store that captures reasoning traces from previously handled alerts and re‑uses them to avoid duplicated work.
- Comprehensive evaluation – Benchmarks on real‑world microservice failure datasets demonstrate consistent gains over prior graph‑based, deep‑learning, and LLM‑only baselines in accuracy (up to +12 % F1) and latency (‑30 % average inference time relative to the LLM‑only baseline).
- Open‑source prototype – The authors release a minimal implementation and a set of reproducible scripts, encouraging community adoption and further research.
Methodology
- Data collection & labeling – The team gathered alert logs, trace spans, and configuration snapshots from production microservice clusters, then had SREs annotate the true root causes.
- Agentic Memory design – A key‑value store indexed by alert signatures (e.g., service name, error pattern) retains the most recent reasoning steps (LLM prompts, intermediate hypotheses, and final verdict). The memory is refreshed every T minutes to keep context fresh.
- Recursive Reasoning loop (a minimal sketch follows this list):
  1. Initialize with the raw alert.
  2. Generate hypotheses using an LLM (e.g., GPT‑4) prompted to consider service dependencies, recent deployments, and known failure modes.
  3. Validate each hypothesis by querying observability data (metrics, logs) via tool‑specific adapters.
  4. Prune low‑confidence candidates and feed the survivors back into the LLM for the next recursion round.
  5. Terminate when confidence exceeds a threshold or the maximum recursion depth is reached.
- Cross‑alert reuse – Before processing a new alert, the system checks Agentic Memory for similar past alerts; if a match is found, it injects the prior reasoning trace into the prompt, letting the LLM “stand on the shoulders” of earlier work.
- Training & fine‑tuning – The LLM is kept frozen; only prompt templates and few‑shot examples are tuned on the annotated dataset to keep the system lightweight and portable.
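To make the loop concrete, here is a minimal Python sketch of how the time‑windowed memory and the hypothesize → validate → prune → recurse cycle could fit together. The signature fields (service name, error pattern), the T‑minute eviction window, the confidence threshold, and the recursion‑depth cap all come from the paper’s description; everything else, including `generate_hypotheses`, `validate`, and all default values, is an illustrative stand‑in, not the authors’ implementation.

```python
import time
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    trace: list        # prior reasoning steps (prompts, hypotheses, verdict)
    created_at: float  # insertion timestamp, used for window-based eviction


class AgenticMemory:
    """Time-windowed key-value store keyed by alert signature."""

    def __init__(self, window_minutes=30.0):  # "T minutes"; 30 is an assumed default
        self.window = window_minutes * 60
        self.store = {}

    def _evict_stale(self):
        now = time.time()
        self.store = {k: v for k, v in self.store.items()
                      if now - v.created_at <= self.window}

    def lookup(self, signature):
        self._evict_stale()
        entry = self.store.get(signature)
        return entry.trace if entry else None

    def save(self, signature, trace):
        self.store[signature] = MemoryEntry(trace, time.time())


def generate_hypotheses(alert, candidates, prior_trace):
    """Stub: in the paper this is an LLM call given the alert, surviving
    candidates, and any reasoning trace injected from memory."""
    return [{"cause": alert["service"] + "-upstream-dependency", "confidence": 0.6}]


def validate(hypothesis):
    """Stub: would query metrics/logs via tool-specific adapters and
    return an updated confidence score."""
    return min(1.0, hypothesis["confidence"] + 0.35)


def localize_root_cause(alert, memory, threshold=0.9, max_depth=5):
    signature = (alert["service"], alert["error_pattern"])
    prior = memory.lookup(signature)            # cross-alert reuse
    candidates = []
    trace = list(prior) if prior else []
    for depth in range(max_depth):
        candidates = generate_hypotheses(alert, candidates, prior)
        scored = [(h, validate(h)) for h in candidates]
        candidates = [h for h, c in scored if c >= 0.4]   # prune weak candidates
        trace.append({"depth": depth, "scored": scored})
        best_h, best_c = max(scored, key=lambda hc: hc[1], default=(None, 0.0))
        if best_c >= threshold:                 # confident enough: stop recursing
            memory.save(signature, trace)
            return best_h
    memory.save(signature, trace)
    return None


memory = AgenticMemory(window_minutes=30)
alert = {"service": "checkout", "error_pattern": "HTTP 503"}
print(localize_root_cause(alert, memory))
```

The design choice worth noting is that the memory is consulted and updated around the loop, not inside it: a hit seeds the prompt once, and the full trace is written back on termination, which is what lets repeated alerts skip redundant LLM calls.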
Results & Findings
| Metric | Graph‑Based Baseline | Deep‑Learning (GNN) | LLM‑Only | AMER‑RCL |
|---|---|---|---|---|
| F1‑Score (root cause) | 0.71 | 0.78 | 0.81 | 0.89 |
| Top‑3 Accuracy | 0.84 | 0.88 | 0.90 | 0.95 |
| Avg. Inference Latency (ms) | 420 | 350 | 610 | 430 |
| Redundant Reasoning (repeat prompts per alert) | – | – | 1.8× | 0.9× |
- Accuracy boost stems from the recursive refinement that eliminates spurious hypotheses early.
- Latency reduction is mainly due to Agentic Memory re‑using reasoning traces, cutting the number of LLM calls per alert by ~30 %.
- Ablation studies show that removing either the recursion or the memory component drops performance back to baseline levels, confirming their complementary roles.
Practical Implications
- Faster MTTR (Mean Time to Recovery) – By delivering more precise root‑cause suggestions quickly, SRE teams can remediate incidents with fewer manual investigations.
- Scalable observability pipelines – The memory layer works as a cheap cache; it can be integrated into existing alert‑routing tools (e.g., PagerDuty, Prometheus Alertmanager) without heavy compute overhead.
- Cross‑team knowledge sharing – The stored reasoning traces act as a living knowledge base, helping junior engineers learn from past incidents and reducing “tribal knowledge” loss.
- Vendor‑agnostic deployment – Since the LLM is accessed via API and the framework only needs adapters for metrics/logs, it can be dropped into any cloud‑native stack (Kubernetes, Service Meshes, etc.).
- Potential for automated remediation – With high‑confidence root causes, downstream automation (e.g., rollback, circuit‑breaker activation) can be safely triggered, moving from detection to self‑healing.
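As a hedged illustration of that last point, the sketch below gates an assumed remediation hook on the localization confidence. Both `trigger_rollback` and the 0.95 threshold are hypothetical choices for illustration, not part of the paper.

```python
ROLLBACK_THRESHOLD = 0.95  # assumed value; the paper does not specify one


def trigger_rollback(cause):
    """Hypothetical hook into deployment tooling (e.g., a Kubernetes rollback)."""
    print(f"rolling back the deployment implicated by: {cause}")


def maybe_remediate(verdict):
    """Act automatically only on very high-confidence root causes;
    otherwise keep a human in the loop."""
    if verdict is None or verdict.get("confidence", 0.0) < ROLLBACK_THRESHOLD:
        return "page-oncall"          # low confidence: escalate to an SRE
    trigger_rollback(verdict["cause"])  # high confidence: self-healing path
    return "auto-remediated"
```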
Limitations & Future Work
- Memory freshness trade‑off – The time window for Agentic Memory must balance relevance against storage cost; dynamic window sizing is left for future exploration.
- LLM dependency – The approach inherits the latency and cost characteristics of the underlying LLM service; offline fine‑tuning or distilled models could mitigate this.
- Generalization to non‑microservice domains – While the authors argue the methodology is transferable, validation on monolithic or edge‑computing environments remains open.
- Explainability – The recursive prompts generate intermediate hypotheses, but presenting them in a developer‑friendly UI is not covered. Future work could integrate visual reasoning traces.
Overall, AMER‑RCL bridges the gap between human‑like SRE reasoning and automated LLM inference, offering a practical path toward more reliable microservice operations.
Authors
- Lingzhe Zhang
- Tong Jia
- Yunpeng Zhai
- Leyi Pan
- Chiming Duan
- Minghua He
- Mengxi Jia
- Ying Li
Paper Information
- arXiv ID: 2601.02732v1
- Categories: cs.SE, cs.AI
- Published: January 6, 2026