[Paper] Automated Classification of Source Code Changes Based on Metrics Clustering in the Software Development Process

Published: February 16, 2026
4 min read

Source: arXiv - 2602.14591v1

Overview

The paper introduces an automated pipeline that groups individual source‑code changes into meaningful categories by clustering metric vectors derived from each change. By letting a simple k‑means step handle the heavy lifting, developers can cut down the manual effort required for code‑review classification while still keeping expert oversight for the final mapping to business‑relevant change types.

Key Contributions

  • Metric‑driven change representation: Defines an 11‑dimensional vector (LOC, cyclomatic complexity, file count, interface modifications, structural alterations, etc.) for every commit.
  • Two‑phase classification workflow:
    1. Automatic clustering of change vectors using k‑means with cosine similarity.
    2. Expert‑guided labeling of clusters to pre‑defined change classes (e.g., bug‑fix, feature, refactor).
  • Empirical validation on five real‑world systems (including Subversion and NHibernate) showing a purity of 0.75 ± 0.05 and entropy of 0.37 ± 0.06, significant at the α = 0.05 level.
  • Demonstrated time savings in the code‑review process by automating the distribution step.

Methodology

  1. Metric Extraction: For each commit, the authors compute eleven static‑analysis metrics that capture size, complexity, and structural impact.
  2. Vector Construction: The metrics are normalized and concatenated into a single feature vector representing the change.
  3. Clustering:
    • Algorithm: Standard k‑means.
    • Distance Measure: Cosine similarity, which emphasizes the direction of change vectors rather than absolute magnitude—useful when commits vary widely in size.
    • Number of Clusters (k): Determined experimentally per project to balance granularity and interpretability.
  4. Expert Mapping: A domain expert inspects each cluster’s centroid and assigns it to a high‑level change class (e.g., “bug fix”, “performance tweak”). This step is performed once per project, after which new commits are automatically placed into the existing clusters.
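Steps 2–3 above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: normalizing each metric vector to unit length makes plain Euclidean k‑means assignment equivalent to clustering by cosine similarity (sometimes called spherical k‑means), which matches the paper's choice of distance measure. The function name and the toy initialization scheme are assumptions.

```python
import numpy as np

def cluster_changes(vectors, k, iters=100, seed=0):
    """Cluster change-metric vectors with k-means on unit-normalized
    vectors, so assignment is driven by cosine similarity (direction),
    not by the absolute magnitude of a commit."""
    rng = np.random.default_rng(seed)
    X = np.asarray(vectors, dtype=float)
    # Normalize each metric vector to unit length: only direction matters.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Initialize centroids from k random data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Cosine similarity of every vector to every centroid;
        # assign each change to its most similar centroid.
        sims = X @ centroids.T
        labels = sims.argmax(axis=1)
        # Recompute centroids as (re-normalized) cluster means.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        new_centroids /= np.linalg.norm(new_centroids, axis=1, keepdims=True)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

In practice the eleven normalized metrics would replace the toy 2‑D vectors, and k would be tuned per project as described in step 3.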

Results & Findings

| Project | Purity (P₍C₎) | Entropy (E₍C₎) |
|---|---|---|
| Subversion | 0.78 | 0.34 |
| NHibernate | 0.73 | 0.39 |
| … (3 other systems, mean) | 0.74 ± 0.05 | 0.37 ± 0.06 |
  • Purity ≈ 0.75 indicates that three‑quarters of the commits in a cluster belong to the same expert‑defined class.
  • Entropy ≈ 0.37 shows moderate disorder; lower values would mean tighter clusters, but the achieved level is acceptable for a semi‑automated workflow.
  • Statistical significance (α = 0.05) indicates the clustering results are unlikely to be due to chance.
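The two quality measures above follow the standard cluster-evaluation definitions; the paper does not reproduce its exact formulas here, so the sketch below uses the common size-weighted versions, with entropy normalized by log₂ of the class count so it lies in [0, 1] (an assumption on my part).

```python
import math
from collections import Counter

def purity_entropy(cluster_labels, true_classes):
    """Size-weighted purity and normalized entropy of a clustering,
    given each commit's cluster id and its expert-assigned class."""
    n = len(cluster_labels)
    clusters = {}
    for c, t in zip(cluster_labels, true_classes):
        clusters.setdefault(c, []).append(t)
    num_classes = len(set(true_classes))
    purity = 0.0
    entropy = 0.0
    for members in clusters.values():
        counts = Counter(members)
        size = len(members)
        # Purity: fraction of all commits that sit in their cluster's
        # majority class.
        purity += counts.most_common(1)[0][1] / n
        # Shannon entropy of the class mix inside this cluster.
        h = -sum((c / size) * math.log2(c / size) for c in counts.values())
        # Normalize by log2(#classes) and weight by cluster size.
        entropy += (size / n) * (h / math.log2(num_classes) if num_classes > 1 else 0.0)
    return purity, entropy
```

For example, two clusters of three commits each, where one cluster contains a single mislabeled commit, yield a purity of 5/6 ≈ 0.83 and a nonzero entropy contributed entirely by the mixed cluster.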

The authors also report a 30‑40 % reduction in manual effort for change classification during code‑review sessions.

Practical Implications

  • Faster Code Review: Teams can automatically route incoming commits to the appropriate review queue (bug‑fix vs. feature vs. refactor) without manually inspecting every change.
  • Improved Metrics Dashboards: By aggregating classified changes, product managers gain clearer insight into development velocity, technical debt growth, and maintenance effort.
  • Continuous Integration (CI) Hooks: The clustering step can be embedded as a lightweight pre‑commit or post‑commit hook, tagging PRs with a change‑type label that downstream tools (e.g., static analysis, test selection) can consume.
  • Scalable Knowledge Transfer: New team members inherit the expert‑defined mapping without needing to learn the full taxonomy from scratch.
  • Potential for Automation of the Mapping Phase: With enough labeled data, the expert‑mapping step could be replaced by a supervised classifier, moving the whole pipeline toward full automation.

Limitations & Future Work

  • Dependence on Expert Mapping: The current workflow still requires a human to interpret clusters, which may become a bottleneck for very large or rapidly evolving codebases.
  • Metric Set Fixed to 11 Features: While these metrics capture many change aspects, they may miss domain‑specific signals (e.g., UI layout changes, database schema migrations).
  • Static k‑means Parameters: The number of clusters is chosen heuristically; adaptive methods (e.g., DBSCAN, hierarchical clustering) could better handle heterogeneous projects.
  • Generalizability: Validation was performed on five systems; broader industrial studies are needed to confirm scalability across languages, architectures, and development cultures.

Future research directions suggested by the authors include: (1) training a supervised model on the expert‑labeled clusters to eliminate the manual mapping step, (2) exploring richer metric suites (including semantic diffs and runtime profiling data), and (3) integrating the approach with modern DevOps pipelines for real‑time change classification.

Authors

  • Evgenii Kniazev

Paper Information

  • arXiv ID: 2602.14591v1
  • Categories: cs.SE, cs.AI
  • Published: February 16, 2026