[Paper] SWEnergy: An Empirical Study on Energy Efficiency in Agentic Issue Resolution Frameworks with SLMs

Published: December 10, 2025 at 06:28 AM EST
3 min read
Source: arXiv - 2512.09543v1

Overview

The paper SWEnergy investigates how well current autonomous‑agent frameworks for software‑issue resolution work when they are forced to use small language models (SLMs) instead of the massive, proprietary LLMs they were built for. By measuring energy use, runtime, token consumption, and memory on a standard benchmark, the authors reveal that many of these frameworks waste a lot of compute without actually solving problems.

Key Contributions

  • Empirical comparison of four popular agentic frameworks (SWE‑Agent, OpenHands, Mini SWE Agent, AutoCodeRover) when run with two SLMs (Gemma‑3 4B and Qwen‑3 1.7B).
  • Energy‑efficiency profiling on fixed hardware (energy, duration, token count, memory) across 150 runs per configuration.
  • Identification of the primary bottleneck: framework architecture drives energy consumption far more than the underlying model size.
  • Evidence of “wasted reasoning” – most energy is spent in unproductive loops, leading to near‑zero task‑completion rates.
  • Guidelines for low‑energy designs, suggesting a shift from passive orchestration to active management of SLM weaknesses.

Methodology

  1. Benchmark selection – The authors used the SWE‑bench Verified Mini suite, a curated set of realistic software‑bug‑fix and code‑generation tasks.
  2. Framework & model matrix – Each of the four frameworks was paired with each of the two SLMs, yielding eight configurations.
  3. Controlled environment – All experiments ran on identical hardware (CPU‑only, fixed RAM) to isolate software‑level differences.
  4. Instrumentation – Energy draw was captured via a power meter, while runtime, token usage, and memory footprints were logged automatically (a minimal profiling sketch follows this list).
  5. Repetition – 150 independent runs per configuration provided statistical power and mitigated stochastic variance.
  6. Success metric – A task was considered solved if the generated patch passed all verification tests in the benchmark.
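
As a rough illustration of the instrumentation in step 4, the sketch below wraps a single agent run and logs wall-clock time, CPU package energy, and peak memory on Linux. It substitutes the kernel's RAPL counter for the external power meter used in the paper, and the `run_agent.py` driver command and its flags are hypothetical placeholders, not any framework's real CLI.

```python
# Minimal per-run profiler sketch (not the paper's harness). It reads the Linux
# RAPL counter before and after one agent run; the study itself used an external
# power meter. Assumes /sys/class/powercap/intel-rapl:0/energy_uj is readable.
import json
import resource
import subprocess
import time

RAPL_PATH = "/sys/class/powercap/intel-rapl:0/energy_uj"  # cumulative microjoules

def read_energy_uj() -> int:
    with open(RAPL_PATH) as f:
        return int(f.read().strip())

def profile_run(cmd: list[str], log_path: str) -> dict:
    e0, t0 = read_energy_uj(), time.perf_counter()
    proc = subprocess.run(cmd, capture_output=True, text=True)
    t1, e1 = time.perf_counter(), read_energy_uj()
    stats = {
        "cmd": cmd,
        "duration_s": round(t1 - t0, 2),
        "energy_j": (e1 - e0) / 1e6,  # ignores counter wrap-around for brevity
        "peak_rss_mb": resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss / 1024,
        "exit_code": proc.returncode,
        "solved": proc.returncode == 0,  # stand-in for "patch passed all tests"
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(stats) + "\n")
    return stats

# 150 repetitions of one framework/model pairing, mirroring the study's design;
# run_agent.py and its flags are hypothetical placeholders for a real agent driver.
for i in range(150):
    profile_run(["python", "run_agent.py", "--model", "qwen3-1.7b", "--task", str(i)],
                "runs_qwen3.jsonl")
```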

Results & Findings

| Framework (SLM) | Avg. Energy (× baseline) | Success Rate | Main Observation |
| --- | --- | --- | --- |
| AutoCodeRover (Gemma‑3) | 9.4× | ≈0 % | Highest energy waste; many idle reasoning cycles. |
| SWE‑Agent (Qwen‑3) | 6.2× | ≈0 % | Energy dominated by repeated prompting. |
| Mini SWE Agent (Gemma‑3) | 4.8× | ≈0 % | Slightly better, but still inefficient. |
| OpenHands (Gemma‑3) | 1.0× (baseline) | ≈0 % | Lowest energy; still fails to solve tasks. |
  • Energy vs. Architecture: The same SLM consumed up to 9.4× more energy depending solely on the surrounding framework.
  • Success near zero: Regardless of energy spent, every configuration solved almost none of the tasks, confirming that SLM reasoning capacity, not just orchestration, limits success.
  • Token & Memory: Higher‑energy frameworks also generated more tokens and used more memory, reinforcing the “busy‑work” pattern.

Practical Implications

  • Don’t assume plug‑and‑play: Swapping a powerful LLM for an SLM in existing agentic pipelines can dramatically increase power bills while delivering no functional gain.
  • Framework choice matters: For edge devices or on‑premise CI/CD bots where energy is at a premium, lightweight orchestrators like OpenHands (or custom minimal loops) are preferable.
  • Design for SLM limits: Architects should embed active error detection, early termination, and fallback strategies (e.g., hybrid LLM calls) to avoid endless reasoning loops; a minimal sketch follows this list.
  • Cost‑aware CI: Teams can use the paper’s profiling methodology to benchmark their own agents, ensuring that any energy savings from smaller models aren’t offset by bloated orchestration.
  • Potential for hybrid solutions: A small model could handle cheap, repetitive tasks (e.g., linting, template generation) while a larger model is invoked only when the SLM signals uncertainty.
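
To make the "design for SLM limits" and hybrid-model points concrete, here is a minimal sketch of an agent loop with a hard step budget, crude repetition detection, and a one-time escalation to a larger fallback model. The `generate` callable, model names, and the `"submit_patch"` action are hypothetical placeholders, not part of any of the evaluated frameworks.

```python
# Sketch of "active management" of an SLM inside an agent loop: cap the step
# budget, treat repeated actions as a sign of an unproductive loop, and escalate
# once to a larger model instead of burning energy indefinitely. All names here
# (generate, model ids, "submit_patch") are illustrative placeholders.
from collections import Counter
from typing import Callable

MAX_STEPS = 20      # hard budget: terminate early rather than loop forever
REPEAT_LIMIT = 3    # the same action this many times => assume the SLM is stuck

def resolve_issue(task: str,
                  generate: Callable[[str, str, list[str]], str],
                  slm: str = "gemma-3-4b",
                  fallback: str = "larger-remote-model") -> list[str] | None:
    history: list[str] = []
    action_counts: Counter[str] = Counter()
    model = slm
    for _ in range(MAX_STEPS):
        action = generate(model, task, history)     # one reasoning/act step
        if action == "submit_patch":                # agent believes it is done
            history.append(action)
            return history
        action_counts[action] += 1
        if action_counts[action] >= REPEAT_LIMIT and model == slm:
            model = fallback                        # escalate once, then continue
            action_counts.clear()
            continue
        history.append(action)
    return None  # early termination: give up instead of wasting more energy
```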

Limitations & Future Work

  • Hardware scope: Experiments were limited to CPU‑only machines; GPU‑accelerated SLMs might exhibit different energy profiles.
  • Benchmark diversity: Only the SWE‑bench Verified Mini suite was used; broader software‑engineering tasks (e.g., documentation, design) remain untested.
  • Model selection: The study focused on two SLMs; other open‑source models (e.g., Llama‑3, Mistral‑7B) could behave differently.
  • Framework evolution: All four frameworks were evaluated in their current releases; future versions may incorporate SLM‑aware optimizations.

The authors suggest exploring adaptive orchestration, i.e., frameworks that monitor SLM confidence and dynamically switch to more capable models or terminate early, as a path from the observed energy waste toward tractable, low‑power issue resolution.
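
One possible confidence signal for such adaptive orchestration is the mean token log-probability of the SLM's latest response, which many serving stacks can return. The threshold below is an arbitrary illustrative value, not something the paper specifies.

```python
# Sketch of a confidence gate for adaptive orchestration: escalate or stop when
# the SLM's geometric-mean token probability falls below a (hypothetical) cut-off.
import math

CONFIDENCE_THRESHOLD = 0.6  # illustrative threshold on mean token probability

def should_escalate(token_logprobs: list[float]) -> bool:
    """True if the SLM's last response looks too uncertain to keep iterating."""
    if not token_logprobs:
        return True
    mean_prob = math.exp(sum(token_logprobs) / len(token_logprobs))
    return mean_prob < CONFIDENCE_THRESHOLD

print(should_escalate([-0.1, -0.2, -0.05]))  # confident response -> False
print(should_escalate([-1.5, -2.0, -1.8]))   # uncertain response -> True
```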

Authors

  • Arihant Tripathy
  • Ch Pavan Harshit
  • Karthik Vaidhyanathan

Paper Information

  • arXiv ID: 2512.09543v1
  • Categories: cs.SE, cs.AI
  • Published: December 10, 2025