[Paper] From Monolith to Microservices: A Comparative Evaluation of Decomposition Frameworks
Source: arXiv - 2601.23141v1
Overview
The paper “From Monolith to Microservices: A Comparative Evaluation of Decomposition Frameworks” tackles one of the most painful steps in modernizing legacy systems: automatically carving a monolithic codebase into well‑defined microservices. By rigorously benchmarking a wide range of static, dynamic, and hybrid decomposition tools on common open‑source applications, the authors provide the first head‑to‑head comparison that developers can actually trust when choosing a migration strategy.
Key Contributions
- Unified evaluation pipeline – a reproducible metric‑computation framework that normalizes results across disparate studies.
- Comprehensive benchmark suite – four widely‑used reference applications (JPetStore, AcmeAir, DayTrader, Plants) covering different domains and code complexities.
- Multi‑dimensional quality metrics – Structural Modularity (SM), Interface Number (IFN), Inter‑partition Communication (ICP), Non‑Extreme Distribution (NED), plus derived indicators for balance and coupling.
- Empirical ranking of state‑of‑the‑art techniques – static analysis, runtime tracing, and hybrid approaches are all evaluated side‑by‑side.
- Practical recommendation – hierarchical clustering, especially the HDBSCAN algorithm, consistently yields the most balanced service partitions.
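To make two of the metrics concrete, here is a minimal sketch, assuming simplified definitions that are illustrative rather than the paper's exact formulas: ICP as the fraction of call edges crossing service boundaries, and IFN as the average number of classes per service that are invoked from outside it. The class names and call graph are invented toy data.

```python
from collections import defaultdict

def icp(calls, partition):
    """Fraction of call edges that cross partition boundaries (lower = better)."""
    if not calls:
        return 0.0
    cross = sum(1 for src, dst in calls if partition[src] != partition[dst])
    return cross / len(calls)

def ifn(calls, partition):
    """Average number of classes per service that are called from another service."""
    interfaces = defaultdict(set)  # service -> classes exposed to other services
    for src, dst in calls:
        if partition[src] != partition[dst]:
            interfaces[partition[dst]].add(dst)
    services = set(partition.values())
    return sum(len(interfaces[s]) for s in services) / len(services)

# Toy monolith: five classes, candidate split into two services (0 and 1)
partition = {"A": 0, "B": 0, "C": 1, "D": 1, "E": 1}
calls = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
print(icp(calls, partition))  # 0.4 — 2 of 5 calls cross the boundary
print(ifn(calls, partition))  # 0.5 — one interface class over two services
```

Even on this toy graph the trade-off is visible: merging C into service 0 would lower ICP for these calls but change the size balance that NED rewards.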
Methodology
1. Tool selection – The authors gathered all publicly available microservice decomposition frameworks, which fall into three categories:
   - Static: rely solely on source‑code structure (e.g., dependency graphs).
   - Dynamic: use runtime traces (e.g., method call logs).
   - Hybrid: combine static and dynamic information.
2. Benchmark preparation – Each of the four applications was containerized and instrumented to collect the necessary static and dynamic artefacts.
3. Metric pipeline – A custom script ingests the raw output of each framework and computes the five core quality metrics, eliminating the “apples‑to‑oranges” problem that has plagued prior comparisons.
4. Reproduction & augmentation – Where prior papers reported results, the tools were re‑run from their original replication packages to verify the published numbers and fill gaps.
5. Statistical analysis – Pairwise comparisons and effect‑size calculations identify which techniques are significantly better on each metric.
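The normalization step of such a pipeline might look like the sketch below. The input formats, the canonical `{class: service}` mapping, and the NED size band are assumptions for illustration, not the paper's actual replication package.

```python
from collections import Counter

def normalize(raw):
    """Accept either {service: [classes]} or {class: service} output from a
    decomposition tool and return a canonical {class: service} mapping."""
    if raw and isinstance(next(iter(raw.values())), list):
        return {cls: svc for svc, classes in raw.items() for cls in classes}
    return dict(raw)

def ned(partition, lo=5, hi=20):
    """Share of classes living in 'extreme' (too small or too large) services.
    The [lo, hi] size band is a placeholder convention."""
    sizes = Counter(partition.values())
    extreme = sum(n for n in sizes.values() if not lo <= n <= hi)
    return extreme / len(partition)

# Two tools emitting different raw formats for the same 12-class app
tool_outputs = {
    "toolA": {"s1": [f"C{i}" for i in range(6)],
              "s2": [f"C{i}" for i in range(6, 12)]},
    "toolB": {f"C{i}": (0 if i < 2 else 1) for i in range(12)},
}
for name, raw in tool_outputs.items():
    print(name, round(ned(normalize(raw)), 2))
# toolA 0.0  — two services of 6 classes each, none extreme
# toolB 0.17 — a 2-class "tiny" service pushes 2 of 12 classes into the extreme band
```

Running every tool's output through the same `normalize` step is what makes the downstream metric values directly comparable.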
Results & Findings
| Technique | SM (higher = better) | IFN (lower = better) | ICP (lower = better) | NED (closer to 0.5 = balanced) |
|---|---|---|---|---|
| HDBSCAN (hierarchical clustering) | ★★★★★ | ★★★★☆ | ★★★★☆ | ★★★★★ |
| Other hierarchical methods (e.g., Agglomerative) | ★★★★☆ | ★★★★☆ | ★★★★☆ | ★★★★☆ |
| Pure static graph‑based | ★★★☆☆ | ★★☆☆☆ | ★★☆☆☆ | ★★☆☆☆ |
| Pure dynamic trace‑based | ★★★☆☆ | ★★☆☆☆ | ★★☆☆☆ | ★★☆☆☆ |
| Hybrid (simple fusion) | ★★★★☆ | ★★★☆☆ | ★★★☆☆ | ★★★★☆ |
- Balanced partitions: HDBSCAN consistently produced service groups with similar sizes, avoiding “tiny” or “monster” services.
- Modularity vs. communication trade‑off: While static‑only methods achieved decent modularity, they suffered from high inter‑service call volume (ICP).
- Interface overhead: Hierarchical clustering kept the number of exposed interfaces low, simplifying API contracts.
In short, the data show that hierarchical clustering—particularly density‑based HDBSCAN—delivers the best overall trade‑off across the evaluated benchmarks.
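To illustrate the hierarchical family the paper favors: HDBSCAN itself needs an external library, so this sketch uses plain single‑linkage agglomerative clustering over an assumed class‑to‑class distance matrix (toy classes and distances, not the paper's benchmarks).

```python
import itertools

def single_linkage(dist, k):
    """Repeatedly merge the two closest clusters until k clusters remain.
    `dist` maps sorted (class, class) tuples to a dissimilarity score."""
    clusters = [{c} for c in {x for pair in dist for x in pair}]

    def d(a, b):  # single linkage: distance of the closest pair across clusters
        return min(dist[tuple(sorted((x, y)))] for x in a for y in b)

    while len(clusters) > k:
        a, b = min(itertools.combinations(clusters, 2), key=lambda p: d(*p))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
    return clusters

# Toy distances: Auth/User and Cart/Order are tightly coupled; Report stands apart
dist = {
    ("Auth", "Cart"): 0.9, ("Auth", "Order"): 0.8, ("Auth", "Report"): 0.7,
    ("Auth", "User"): 0.1, ("Cart", "Order"): 0.2, ("Cart", "Report"): 0.8,
    ("Cart", "User"): 0.9, ("Order", "Report"): 0.9, ("Order", "User"): 0.8,
    ("Report", "User"): 0.7,
}
print(sorted(sorted(c) for c in single_linkage(dist, 3)))
# [['Auth', 'User'], ['Cart', 'Order'], ['Report']]
```

In practice the distance matrix would be derived from the static or dynamic artefacts (e.g., inverse co-call frequency), and HDBSCAN would additionally infer the number of services rather than taking `k` as input.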
Practical Implications
- Tool selection: Teams planning a monolith‑to‑microservice migration can prioritize frameworks that implement HDBSCAN or similar density‑based clustering, expecting fewer cross‑service calls and cleaner APIs.
- Cost estimation: Lower ICP and IFN translate directly into reduced network latency, fewer integration tests, and simpler DevOps pipelines.
- Incremental migration: Because HDBScan yields balanced service sizes, developers can adopt a phased rollout (e.g., “one service per sprint”) without hitting bottlenecks caused by oversized services.
- Automation confidence: The unified metric pipeline can be repurposed as an internal quality gate—run after each decomposition iteration to verify that modularity and communication metrics stay within target thresholds.
- Vendor evaluation: When evaluating commercial microservice extraction platforms, ask for evidence of hierarchical clustering under the hood; the paper provides a concrete benchmark to compare against.
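The quality-gate idea above can be sketched as a small CI check. The metric names and threshold values here are placeholders for a team's own targets, not numbers from the paper.

```python
# Hypothetical per-iteration gate: fail the build if a decomposition
# regresses past agreed metric thresholds (all values are placeholders).
THRESHOLDS = {"icp_max": 0.30, "ifn_max": 3.0, "ned_max": 0.20}

def quality_gate(metrics, thresholds=THRESHOLDS):
    """Return (passed, violations) for one decomposition iteration."""
    violations = []
    for key, limit in thresholds.items():
        name = key.rsplit("_", 1)[0]  # "icp_max" -> "icp"
        if metrics[name] > limit:
            violations.append(f"{name} = {metrics[name]:.2f} exceeds {limit:.2f}")
    return (not violations, violations)

ok, why = quality_gate({"icp": 0.25, "ifn": 4.5, "ned": 0.10})
print(ok)   # False
print(why)  # ['ifn = 4.50 exceeds 3.00']
```

Wired after the metric pipeline, such a gate turns the paper's evaluation criteria into a regression guard for each extraction iteration.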
Limitations & Future Work
- Benchmark scope: Only four open‑source applications were used; industrial codebases with millions of lines and heterogeneous tech stacks may exhibit different behavior.
- Metric completeness: The chosen metrics capture structural quality but not runtime performance (e.g., latency under load) or operational concerns like data consistency.
- Tool ecosystem: Some newer decomposition frameworks lacked publicly available replication packages, so they were omitted.
- Future directions: Extending the benchmark to include large‑scale enterprise systems, adding performance‑centric metrics (e.g., request latency, scaling cost), and exploring AI‑driven hybrid approaches are natural next steps.
Authors
- Mineth Weerasinghe
- Himindu Kularathne
- Methmini Madhushika
- Danuka Lakshan
- Nisansa de Silva
- Adeesha Wijayasiri
- Srinath Perera
Paper Information
- arXiv ID: 2601.23141v1
- Categories: cs.SE
- Published: January 30, 2026