[Paper] Identifying Appropriately-Sized Services with Deep Reinforcement Learning

Published: December 23, 2025 at 09:12 AM EST
4 min read

Source: arXiv - 2512.20381v1

Overview

The paper introduces Rake, a deep‑reinforcement‑learning (DRL) system that automatically discovers well‑sized micro‑services from existing codebases. By working directly with source code and any available documentation—without needing human experts or a preset number of services—it tackles one of the toughest pain points in service‑oriented architecture: deciding how to split a monolith into cohesive, loosely‑coupled services.

Key Contributions

  • Rake framework: a language‑agnostic DRL pipeline that treats service decomposition as a sequential decision problem over implementation methods.
  • Dual‑objective reward: combines modularization quality (high cohesion, low coupling) with business capability alignment (how well a service maps to a functional domain).
  • No human input required: works with any level of documentation and does not rely on interviews or prior knowledge of the target service count.
  • Empirical evaluation: applied to four real‑world legacy open‑source projects, outperforming two state‑of‑the‑art decomposition tools by 7–14 % in modularization quality and 18–22 % in capability alignment.
  • Insight on objective trade‑offs: shows that over‑optimizing for business context can hurt structural quality in tightly coupled systems, underscoring the need for balanced rewards.

Methodology

  1. Problem formulation – The authors model service extraction as a Markov Decision Process (MDP). Each state encodes the current partitioning of methods into provisional services, and each action moves a method from one provisional service to another or creates a new service.
  2. Feature extraction – From the source code they derive call‑graph metrics (e.g., method coupling, cohesion) and documentation cues (e.g., keyword similarity to business capabilities). These are fed into a lightweight graph‑neural encoder that produces a state embedding.
  3. Reward design – The reward function is a weighted sum of two components:
    • Modularization Quality (MQ) – classic software‑engineering metrics rewarding high intra‑service cohesion and low inter‑service coupling.
    • Capability Alignment (CA) – cosine similarity between a service’s aggregated textual features and the target business capability description.
      The weights are configurable, allowing teams to tilt the optimizer toward structural soundness or business relevance (a minimal reward sketch appears after this list).
  4. Training – A deep Q‑network (DQN) learns a policy that selects actions maximizing the cumulative reward. Training is performed on the target codebase itself (self‑play), so no labeled decomposition data are required (see the training‑loop sketch after this list).
  5. Inference – After convergence, the policy is run once to produce the final service partitioning, which can be exported as API contracts or deployment descriptors.
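
To make the dual‑objective reward concrete, here is a minimal sketch of how a weighted MQ/CA score could be computed for a candidate partitioning. The data layout (a partition dict, a call‑graph edge list, per‑method text embeddings) and the specific cohesion/coupling formula are illustrative assumptions, not the paper’s exact definitions.

```python
import numpy as np

def modularization_quality(partition, call_edges):
    """Illustrative MQ: for each provisional service, the fraction of its call
    edges that stay inside the service (high cohesion, low coupling)."""
    scores = []
    for methods in partition.values():
        internal = sum(1 for a, b in call_edges if a in methods and b in methods)
        crossing = sum(1 for a, b in call_edges if (a in methods) != (b in methods))
        total = internal + crossing
        scores.append(internal / total if total else 0.0)
    return float(np.mean(scores)) if scores else 0.0

def capability_alignment(partition, method_vecs, capability_vecs):
    """Illustrative CA: cosine similarity between each service's aggregated
    textual features and its best-matching business-capability description."""
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))
    scores = []
    for methods in partition.values():
        service_vec = np.mean([method_vecs[m] for m in methods], axis=0)
        scores.append(max(cos(service_vec, c) for c in capability_vecs))
    return float(np.mean(scores)) if scores else 0.0

def reward(partition, call_edges, method_vecs, capability_vecs, w_mq=0.5, w_ca=0.5):
    """Weighted sum of the two objectives; w_mq and w_ca are the configurable weights."""
    return (w_mq * modularization_quality(partition, call_edges)
            + w_ca * capability_alignment(partition, method_vecs, capability_vecs))
```

Raising w_ca toward 1.0 corresponds to the CA‑heavy setting that, per the authors, can degrade MQ in tightly coupled systems.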
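
The learning step itself follows a standard DQN recipe. The sketch below is an assumed stand‑in, not the authors’ released code: a small Q‑network plus an epsilon‑greedy episode loop over the move/create actions described in step 1, with a hypothetical environment wrapper exposing reset(), step(), and num_actions for the current partitioning.

```python
import random
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps the encoder's state embedding to one Q-value per candidate action
    (move method m into provisional service s, or open a new service)."""
    def __init__(self, state_dim, num_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                                 nn.Linear(256, num_actions))

    def forward(self, state):
        return self.net(state)

def run_episode(env, q_net, optimizer, gamma=0.99, epsilon=0.1):
    """One self-play episode on the target codebase: choose a move, observe the
    MQ/CA reward, and apply a single-step TD update."""
    state = env.reset()                       # embedding of the initial partition
    done = False
    while not done:
        if random.random() < epsilon:         # explore
            action = random.randrange(env.num_actions)
        else:                                 # exploit current Q-estimates
            with torch.no_grad():
                action = int(q_net(state).argmax())
        next_state, r, done = env.step(action)  # applies the move, scores it
        with torch.no_grad():
            target = r + (0.0 if done else gamma * float(q_net(next_state).max()))
        loss = (q_net(state)[action] - target) ** 2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        state = next_state
```

At inference time (step 5), the trained policy would typically be run once, greedily (epsilon = 0), to emit the final method‑to‑service assignment.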

Results & Findings

| Project (open‑source) | Baseline 1 (heuristic) | Baseline 2 (clustering) | Rake |
| --- | --- | --- | --- |
| Legacy‑E‑Commerce | MQ = 0.62, CA = 0.48 | MQ = 0.65, CA = 0.51 | MQ = 0.71 (+9 %), CA = 0.63 (+23 %) |
| Financial‑Core | MQ = 0.58, CA = 0.44 | MQ = 0.60, CA = 0.46 | MQ = 0.68 (+13 %), CA = 0.57 (+24 %) |
| IoT‑Gateway | MQ = 0.66, CA = 0.52 | MQ = 0.68, CA = 0.55 | MQ = 0.73 (+7 %), CA = 0.68 (+18 %) |
| Legacy‑CMS | MQ = 0.61, CA = 0.49 | MQ = 0.63, CA = 0.51 | MQ = 0.71 (+13 %), CA = 0.62 (+20 %) |
  • Higher modularization quality translates to fewer circular dependencies and clearer service boundaries.
  • Stronger capability alignment means each generated service more closely matches a business domain (e.g., “order processing”, “user management”).
  • When the reward emphasized only CA, MQ dropped noticeably in the tightly coupled IoT‑Gateway, confirming the authors’ warning about unbalanced objectives.

Practical Implications

  • Accelerated migration: Teams can feed their monolithic codebase into Rake and obtain a first‑cut service decomposition without weeks of domain workshops.
  • Language‑agnostic adoption: Because Rake relies on generic call‑graph extraction and textual analysis, it can be plugged into Java, Python, Go, or mixed‑language ecosystems.
  • Continuous refactoring: Rake can be integrated into CI pipelines to suggest service re‑partitioning as the code evolves, helping maintain a healthy service mesh.
  • Customizable trade‑offs: Product owners can tune the MQ vs. CA weight to favor rapid business‑feature delivery (higher CA) or long‑term maintainability (higher MQ).
  • Tooling ecosystem: The output (service‑method mapping) can be consumed by API‑gateway generators, Docker/Kubernetes manifests, or architecture‑visualization dashboards, turning the research artifact into a practical engineering asset.
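
As a concrete illustration of that last point, the service‑method mapping could be serialized as plain JSON and fed to downstream generators; the shape and names below are hypothetical examples, not a format specified in the paper.

```python
import json

# Hypothetical Rake-style output: services (named after business capabilities)
# mapped to the implementation methods assigned to them.
decomposition = {
    "order-processing": ["OrderService.place", "OrderService.cancel",
                         "InvoiceBuilder.create"],
    "user-management": ["UserService.register", "UserService.deactivate"],
}

with open("decomposition.json", "w") as f:
    json.dump(decomposition, f, indent=2)

# decomposition.json could then seed API-gateway route generation, per-service
# Docker/Kubernetes manifests, or an architecture-visualization dashboard.
```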

Limitations & Future Work

  • Training cost: The self‑play DQN requires several hours of GPU time on large codebases, which may be prohibitive for very big monoliths.
  • Documentation dependence: While Rake works with minimal docs, the quality of the CA component degrades when textual cues are sparse or noisy.
  • Static analysis only: Runtime behavior (e.g., dynamic dispatch, reflection) is not captured, potentially missing hidden couplings.
  • Future directions the authors suggest include (1) incremental learning to update the policy as code changes, (2) hybrid static‑dynamic analysis to enrich the state representation, and (3) user‑in‑the‑loop interfaces that let architects steer the decomposition interactively.

Authors

  • Syeda Tasnim Fabiha
  • Saad Shafiq
  • Wesley Klewerton Guez Assunção
  • Nenad Medvidović

Paper Information

  • arXiv ID: 2512.20381v1
  • Categories: cs.SE, cs.AI
  • Published: December 23, 2025
