[Paper] FC-ADL: Efficient Microservice Anomaly Detection and Localisation Through Functional Connectivity
Source: arXiv - 2512.00844v1
Overview
Microservice‑based systems are everywhere, from cloud‑native apps to massive e‑commerce platforms, but their distributed nature makes failures hard to detect and pinpoint quickly. The paper FC‑ADL: Efficient Microservice Anomaly Detection and Localisation Through Functional Connectivity introduces a low‑overhead technique that analyses the relationships between service metrics the way neuroscientists analyse functional connectivity between brain regions, enabling fast detection and root‑cause suggestion even in very large deployments.
Key Contributions
- Functional‑Connectivity‑Based Model: Adapts a neuroscience concept to capture time‑varying inter‑service metric dependencies without expensive causal inference.
- End‑to‑End Detection & Localisation Pipeline (FC‑ADL): Simultaneously flags anomalous behavior and produces a ranked list of likely faulty services.
- Scalability Demonstrated on Real‑World Scale: Tested on Alibaba’s massive microservice fabric (tens of thousands of services) with linear‑time performance.
- Empirical Superiority: Beats state‑of‑the‑art anomaly detectors and fault‑localisers across a broad set of synthetic and real fault scenarios.
- Open‑Source‑Ready Design: Uses only standard metric streams (CPU, latency, request counts) and lightweight graph‑based computations, making integration into existing observability stacks straightforward.
Methodology
- Metric Collection – Continuous streams of per‑service telemetry (e.g., latency percentiles, error rates) are ingested.
- Sliding‑Window Correlation – For each window, the algorithm computes pairwise Pearson correlations between all service metrics, forming a functional connectivity matrix that reflects how services move together over time (a minimal sketch of this and the change‑point step follows the list).
- Change‑Point Detection – The matrix is compared to a baseline (e.g., exponentially weighted moving average). Significant deviations trigger an anomaly flag.
- Root‑Cause Scoring – When an anomaly is detected, the method evaluates which nodes (services) contributed most to the matrix change using a simple influence score derived from the magnitude of correlation shifts; see the ranking sketch at the end of this section.
- Ranking & Alerting – Services are ranked by influence score; the top‑k are presented to operators as root‑cause candidates.
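As a concrete illustration of the sliding‑window and change‑point steps, the sketch below builds the per‑window connectivity matrix and compares it to an exponentially weighted baseline. It is a minimal reconstruction from the description above, not the authors' implementation; the decay factor `alpha`, the deviation threshold, and the warm‑up handling are illustrative assumptions.

```python
# Minimal sketch of the windowed functional-connectivity and change-point steps.
# Not the authors' code: alpha, the threshold, and the warm-up handling are assumptions.
import numpy as np


def fc_matrix(window: np.ndarray) -> np.ndarray:
    """Pairwise Pearson correlations for a (samples x services) metric window."""
    fc = np.corrcoef(window, rowvar=False)
    return np.nan_to_num(fc)  # constant series produce NaN correlations; treat as 0


def detect(windows, alpha: float = 0.2, threshold: float = 3.0):
    """Yield (window_index, is_anomalous, delta) for a stream of metric windows.

    delta is the element-wise shift between the current connectivity matrix and
    the EWMA baseline; it feeds the root-cause scoring step.
    """
    baseline, dev_scale = None, None
    for t, window in enumerate(windows):
        fc = fc_matrix(window)
        if baseline is None:                      # first window initialises the baseline
            baseline = fc
            yield t, False, np.zeros_like(fc)
            continue
        delta = fc - baseline
        score = float(np.linalg.norm(delta))      # magnitude of the connectivity shift
        if dev_scale is None:
            dev_scale = score + 1e-9              # warm-up: first shift sets the scale
            is_anomalous = False
        else:
            is_anomalous = score > threshold * dev_scale
        if not is_anomalous:                      # update baselines on normal windows only
            baseline = (1 - alpha) * baseline + alpha * fc
            dev_scale = (1 - alpha) * dev_scale + alpha * score
        yield t, is_anomalous, delta
```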
All steps rely on linear‑time operations (matrix updates are incremental) and avoid combinatorial causal searches, keeping CPU and memory footprints low enough for production use.
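The paper describes the influence score only as being derived from the magnitude of correlation shifts, so the ranking sketch below uses one plausible instantiation: each service is scored by the total absolute shift along its row of the delta matrix, and the top‑k services are surfaced as candidates. FC‑ADL's exact formula may differ.

```python
# One plausible influence score (sum of absolute per-row correlation shifts);
# the paper's exact formula may differ.
import numpy as np


def influence_scores(delta: np.ndarray) -> np.ndarray:
    """Per-service influence: total absolute correlation shift involving that service."""
    return np.abs(delta).sum(axis=1)


def top_k_candidates(delta: np.ndarray, service_names, k: int = 3):
    """Rank services by influence score and return the k most suspicious."""
    scores = influence_scores(delta)
    order = np.argsort(scores)[::-1][:k]
    return [(service_names[i], float(scores[i])) for i in order]


# Tying the pipeline together (windows and service_names are hypothetical inputs):
# for t, anomalous, delta in detect(windows):
#     if anomalous:
#         print(f"window {t}: suspects = {top_k_candidates(delta, service_names)}")
```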
Results & Findings
| Evaluation | Metric | FC‑ADL | Best Prior Art |
|---|---|---|---|
| Synthetic fault injection (10‑100 services) | Detection F1‑score | 0.93 | 0.78 |
| Real‑world Alibaba trace (≈ 30 k services) | Localization Top‑3 accuracy | 0.87 | 0.61 |
| Throughput impact | CPU overhead per 1 k services | < 2 % | 5‑12 % |
| Latency to raise an alert | Median detection latency | ≈ 30 s | 120 s |
Key Takeaways
- The functional‑connectivity signal captures subtle, system‑wide drifts that single‑metric thresholds miss.
- Localization quality remains high even when multiple services are simultaneously degraded.
- The approach scales linearly; adding more services does not explode computation time.
Practical Implications
- Plug‑and‑Play Anomaly Service – Teams can drop FC‑ADL into existing Prometheus/Grafana pipelines, leveraging already‑collected metrics (a hedged integration sketch follows this list).
- Faster MTTR – A ranked list of suspect services, delivered within seconds, lets on‑call engineers triage incidents more efficiently and reduces mean time to resolution.
- Cost‑Effective Observability – No need for heavyweight tracing or distributed causal inference engines, which often require additional instrumentation and storage.
- Proactive Capacity Planning – Continuous functional‑connectivity maps can reveal emerging coupling patterns, helping architects refactor overly tight service dependencies before they cause outages.
- Vendor‑Neutral – Works with any cloud provider or orchestration platform (Kubernetes, Nomad, etc.) as long as metric streams are available.
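For the Prometheus/Grafana case above, ingestion can be as simple as polling the range‑query HTTP API and stacking the per‑service series into the windows the detector expects. The sketch below is a hedged illustration: the endpoint URL, the PromQL expression, and the `service` label are placeholders, not part of FC‑ADL.

```python
# Hedged integration sketch: pull per-service series from Prometheus' range API.
# The URL, PromQL query, and the "service" label are illustrative placeholders.
import time
import numpy as np
import requests

PROM_URL = "http://prometheus.example:9090"  # hypothetical Prometheus endpoint


def fetch_series(query: str, minutes: int = 10, step: str = "15s") -> dict:
    """Return {service_label: np.ndarray of samples} for a PromQL range query."""
    end = time.time()
    resp = requests.get(
        f"{PROM_URL}/api/v1/query_range",
        params={"query": query, "start": end - 60 * minutes, "end": end, "step": step},
        timeout=10,
    )
    resp.raise_for_status()
    series = {}
    for result in resp.json()["data"]["result"]:
        label = result["metric"].get("service", "unknown")  # label name is an assumption
        series[label] = np.array([float(v) for _, v in result["values"]])
    return series


# Example: mean request latency per service (PromQL shown is illustrative only).
# series = fetch_series(
#     "rate(http_request_duration_seconds_sum[1m])"
#     " / rate(http_request_duration_seconds_count[1m])"
# )
# window = np.column_stack([series[s] for s in sorted(series)])  # samples x services
```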
Limitations & Future Work
- Metric Diversity – The current implementation focuses on scalar performance metrics; richer logs or traces are not directly incorporated.
- Assumption of Linear Correlation – Pearson correlation may miss non‑linear relationships; future extensions could explore mutual information or kernel‑based measures (a mutual‑information variant is sketched at the end of this section).
- Cold‑Start Baseline – Accurate baselines need a stable observation period; highly volatile workloads may require adaptive baseline strategies.
- Root‑Cause Granularity – While FC‑ADL surfaces candidate services, pinpointing the exact code path still needs complementary debugging tools.
The authors suggest exploring hybrid models that fuse functional connectivity with lightweight causal graphs, and extending the framework to handle multi‑tenant environments where metric isolation is a concern.
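To make the suggested non‑linear extension concrete, the connectivity step could swap Pearson correlation for pairwise mutual information, for example with scikit‑learn's estimator as sketched below. Estimating a full pairwise MI matrix is far more expensive than correlation, so this is an illustration of the direction rather than a drop‑in replacement that preserves FC‑ADL's low overhead.

```python
# Sketch of a mutual-information connectivity matrix as a non-linear alternative
# to Pearson correlation; cost grows quadratically with the number of services.
import numpy as np
from sklearn.feature_selection import mutual_info_regression


def mi_connectivity(window: np.ndarray, random_state: int = 0) -> np.ndarray:
    """Pairwise mutual-information matrix for a (samples x services) window."""
    n_services = window.shape[1]
    mi = np.zeros((n_services, n_services))
    for j in range(n_services):
        # MI between service j's series and every service's series (including itself).
        mi[:, j] = mutual_info_regression(window, window[:, j], random_state=random_state)
    return (mi + mi.T) / 2.0  # symmetrise; the k-NN estimator is not exactly symmetric
```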
Authors
- Giles Winchester
- George Parisis
- Luc Berthouze
Paper Information
- arXiv ID: 2512.00844v1
- Categories: cs.SE, cs.DC, cs.LG
- Published: November 30, 2025