[Paper] Systematic Detection of Energy Regression and Corresponding Code Patterns in Java Projects
Source: arXiv - 2604.19373v1
Overview
The paper presents EnergyTrackr, an automated technique that spots energy‑regression bugs in Java projects by analysing commit‑level power measurements. By flagging statistically significant energy spikes and linking them to recurring code patterns, the authors aim to give developers a practical tool for continuous green‑software monitoring.
Key Contributions
- Commit‑level regression detection: A statistical pipeline that identifies energy regressions across thousands of commits without manual profiling.
- Pattern mining for anti‑patterns: Automatic extraction of code change patterns (e.g., missing early exits, heavyweight dependency upgrades) that are strongly correlated with energy spikes.
- Large‑scale empirical study: Evaluation on 3,232 commits from three real‑world Java repositories, with a reported precision of 0.78 and recall of 0.71.
- Open‑source prototype: A publicly released implementation (EnergyTrackr) that can be integrated into CI pipelines.
Methodology
- Data collection – The authors instrumented the target Java projects to run a representative benchmark suite for each commit, measuring total energy consumption with a high‑resolution power meter.
- Statistical detection – For every commit, EnergyTrackr computes the mean energy usage and applies a two‑sample t‑test (or non‑parametric alternative) against a sliding window of previous commits. A commit is flagged when the p‑value falls below a configurable threshold (default 0.01).
- Code‑change extraction – The flagged commits are parsed with a Java AST parser. The system extracts fine‑grained edit operations (add/delete/modify statements, method calls, dependency version changes).
- Pattern mining – Using frequent pattern mining (FP‑Growth) on the edit‑operation vectors, the authors surface recurring “energy‑anti‑patterns”. Each pattern is scored by its support (how often it appears) and confidence (how strongly it correlates with a regression).
- Validation – A manual inspection of a random sample of flagged commits confirms whether the identified pattern truly explains the energy increase.
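The statistical detection step can be illustrated with a minimal sketch. It assumes each commit yields several energy readings (in joules) and that the sliding window is represented by the pooled readings of the preceding commits. Instead of the t‑test, it uses a one‑sided Monte Carlo permutation test on the difference in means, which is one possible non‑parametric alternative and not necessarily the one the authors chose. The class name, method names, round count, and seed are all hypothetical; only the 0.01 default threshold comes from the description above.

```java
import java.util.Random;

/** Hypothetical sketch: flags a commit whose energy readings exceed a baseline window. */
final class EnergyRegressionSketch {

    /** Arithmetic mean of values[from, to). */
    static double mean(double[] values, int from, int to) {
        double sum = 0.0;
        for (int i = from; i < to; i++) {
            sum += values[i];
        }
        return sum / (to - from);
    }

    /**
     * One-sided Monte Carlo permutation test on the difference in means.
     * Estimates how often a random relabelling of the pooled readings produces
     * an increase at least as large as the observed one.
     */
    static double permutationPValue(double[] baseline, double[] candidate,
                                    int rounds, long seed) {
        int nB = baseline.length;
        int nC = candidate.length;
        double[] pooled = new double[nB + nC];
        System.arraycopy(baseline, 0, pooled, 0, nB);
        System.arraycopy(candidate, 0, pooled, nB, nC);
        double observed = mean(candidate, 0, nC) - mean(baseline, 0, nB);

        Random rng = new Random(seed); // fixed seed keeps the sketch reproducible
        int atLeastAsLarge = 0;
        for (int r = 0; r < rounds; r++) {
            // Fisher-Yates shuffle of the pooled readings
            for (int i = pooled.length - 1; i > 0; i--) {
                int j = rng.nextInt(i + 1);
                double tmp = pooled[i];
                pooled[i] = pooled[j];
                pooled[j] = tmp;
            }
            double diff = mean(pooled, nB, nB + nC) - mean(pooled, 0, nB);
            if (diff >= observed) {
                atLeastAsLarge++;
            }
        }
        // Add-one correction so the estimate is never exactly zero
        return (atLeastAsLarge + 1.0) / (rounds + 1.0);
    }

    /** Flags the commit when the estimated p-value falls below alpha. */
    static boolean isRegression(double[] baseline, double[] candidate, double alpha) {
        return permutationPValue(baseline, candidate, 10_000, 42L) < alpha;
    }
}
```

Calling `EnergyRegressionSketch.isRegression(windowReadings, commitReadings, 0.01)` would flag a commit under these assumptions. A one‑sided test is used here because only increases in energy count as regressions; a two‑sided variant would compare absolute differences instead.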
The pipeline is deliberately lightweight: it runs on commodity hardware, needs only a benchmark script, and can be scheduled as part of a nightly build.
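The support and confidence scores from the pattern‑mining step can be sketched as follows. This is not FP‑Growth itself: it only scores one candidate pattern, and it assumes that each analysed commit has already been reduced to a set of edit‑operation labels together with the detector's verdict. The class names and label strings are hypothetical.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: scores one candidate pattern against analysed commits. */
final class PatternScoringSketch {

    /** A commit reduced to its edit-operation labels and the detector's verdict. */
    static final class CommitEdits {
        final Set<String> operations;
        final boolean regression;

        CommitEdits(Set<String> operations, boolean regression) {
            this.operations = operations;
            this.regression = regression;
        }
    }

    /** Fraction of all commits whose edit operations include every item of the pattern. */
    static double support(List<CommitEdits> commits, Set<String> pattern) {
        if (commits.isEmpty()) {
            return 0.0;
        }
        long matching = commits.stream()
                .filter(c -> c.operations.containsAll(pattern))
                .count();
        return (double) matching / commits.size();
    }

    /** Among commits that contain the pattern, the fraction flagged as regressions. */
    static double confidence(List<CommitEdits> commits, Set<String> pattern) {
        long matching = 0;
        long matchingAndFlagged = 0;
        for (CommitEdits c : commits) {
            if (c.operations.containsAll(pattern)) {
                matching++;
                if (c.regression) {
                    matchingAndFlagged++;
                }
            }
        }
        return matching == 0 ? 0.0 : (double) matchingAndFlagged / matching;
    }
}
```

Under this reading, a pattern with high support but low confidence is simply a common edit, while one with high confidence is a stronger candidate for an energy anti‑pattern.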
Results & Findings
| Metric | Value |
|---|---|
| Precision (energy regressions correctly flagged) | 0.78 |
| Recall (regressions detected out of all true regressions) | 0.71 |
| Top anti‑patterns | (1) Missing early‑exit (return/break) in loops; (2) introduction of eager collection materialisation (e.g., stream().collect()); (3) upgrading a library to a newer, more CPU‑intensive version |
| Average detection latency | 1 commit (the offending commit is usually the one flagged) |
The authors also report that in 62% of the flagged commits, the identified pattern matched the developers’ own post‑mortem explanations, which supports the practical relevance of the mined patterns.
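To make the first two anti‑patterns concrete, here is an illustrative Java sketch. The examples are not taken from the paper or from the studied repositories, all names are hypothetical, and no energy figures are claimed for them. Each pair returns the same result; the second variant simply stops work earlier.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical examples of two anti-patterns and a leaner equivalent for each. */
final class AntiPatternExamples {

    /** Missing early exit: the loop keeps scanning after the answer is known. */
    static boolean containsNegativeSlow(int[] values) {
        boolean found = false;
        for (int v : values) {
            if (v < 0) {
                found = true; // no break, so the rest of the array is still visited
            }
        }
        return found;
    }

    /** Early exit: returns as soon as the first negative value is seen. */
    static boolean containsNegativeFast(int[] values) {
        for (int v : values) {
            if (v < 0) {
                return true;
            }
        }
        return false;
    }

    /** Eager materialisation: builds a full list only to read its first element. */
    static String firstLongNameEager(List<String> names) {
        List<String> longNames = names.stream()
                .filter(n -> n.length() > 5)
                .collect(Collectors.toList());
        return longNames.isEmpty() ? null : longNames.get(0);
    }

    /** Lazy alternative: findFirst() short-circuits once a match is found. */
    static String firstLongNameLazy(List<String> names) {
        return names.stream()
                .filter(n -> n.length() > 5)
                .findFirst()
                .orElse(null);
    }
}
```

Whether such a change produces a measurable energy difference depends on input size and workload, which is why the approach pairs pattern mining with actual measurements.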
Practical Implications
- CI/CD integration – EnergyTrackr can be added as a post‑test step that automatically fails a build when a statistically significant energy regression is detected, prompting an immediate review.
- Guided refactoring – By surfacing concrete anti‑patterns, developers receive actionable hints (e.g., “add early exit in this loop” or “avoid eager collection”) rather than a vague “energy regression”.
- Dependency management – The tool highlights costly library upgrades, encouraging teams to benchmark new versions before committing them.
- Technical debt visibility – Energy regressions become a first‑class metric alongside performance and security, helping product owners quantify “green debt”.
- Cross‑project learning – Since the pattern mining works across repositories, organizations can build a shared catalogue of energy anti‑patterns specific to their tech stack.
Limitations & Future Work
- Benchmark dependence – The detection quality hinges on the representativeness of the benchmark suite; poorly chosen workloads may miss regressions that appear only under real traffic.
- Language scope – The current prototype supports only Java; extending it to other JVM languages or native code would broaden its applicability.
- Measurement noise – Energy changes caused by external factors (e.g., OS scheduling, hardware variability) can produce false positives; the authors suggest tighter hardware control or statistical smoothing.
- Pattern expressiveness – The mined patterns are limited to syntactic edits; future work could incorporate semantic information (e.g., data‑flow analysis) to capture more subtle energy‑impacting changes.
Authors
- François Bechet
- Jérôme Maquoi
- Luís Cruz
- Benoît Vanderose
- Xavier Devroey
Paper Information
- arXiv ID: 2604.19373v1
- Categories: cs.SE
- Published: April 21, 2026