[Paper] SWE-Fuse: Empowering Software Agents via Issue-free Trajectory Learning and Entropy-aware RLVR Training
Source: arXiv - 2603.07927v1
Overview
The paper SWE‑Fuse tackles a surprisingly common roadblock for large‑language‑model (LLM)‑driven software‑engineering agents: many real‑world issue reports are noisy, vague, or outright mismatched with the code changes that actually fix the problem. By teaching agents to set aside misleading issue text and to lean on it only when it is reliable, SWE‑Fuse substantially boosts the success rate of automated bug‑fixing on the challenging SWE‑bench Verified benchmark.
Key Contributions
- Issue‑description‑aware training framework that blends issue‑guided and issue‑free examples, letting the model learn when to trust the bug report and when to rely on pure code‑level reasoning.
- Issue‑free‑driven trajectory learning module that constructs step‑by‑step debugging “trajectories” without relying on the issue description, reducing the impact of noisy inputs.
- Entropy‑aware RLVR (Reinforcement Learning with Verifiable Rewards) training that dynamically adjusts clipping thresholds based on the model’s prediction entropy, encouraging exploration on uncertain samples and stability on confident ones.
- State‑of‑the‑art empirical gains: solve rates of 43 % (8B) and 60 % (32B) on SWE‑bench Verified, well ahead of the strongest baselines, with further lifts when combined with test‑time scaling (TTS).
Methodology
1. Data Fusion – The authors start from two pools of training data:
- Issue‑guided samples that include the original bug report (often noisy).
- Issue‑free samples that strip the description, leaving only the code context and the correct fix.
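Per‑batch blending of the two pools might look like the following minimal sketch; the 50/50 `issue_free_ratio` and the function name are illustrative assumptions, not values from the paper:

```python
import random

def build_fused_batch(issue_guided, issue_free,
                      issue_free_ratio=0.5, batch_size=8, seed=0):
    """Mix issue-guided and issue-free samples into one training batch.

    `issue_free_ratio` is a hypothetical knob; the paper does not
    specify its exact mixing proportion.
    """
    rng = random.Random(seed)
    n_free = int(batch_size * issue_free_ratio)
    batch = (rng.sample(issue_free, n_free)
             + rng.sample(issue_guided, batch_size - n_free))
    rng.shuffle(batch)  # avoid the model seeing the pools in a fixed order
    return batch
```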
2. Trajectory Learning – For issue‑free samples, they generate a sequence of intermediate debugging steps (e.g., “run tests → locate failing test → inspect stack trace → apply patch”). The model is trained to reproduce this trajectory, learning a procedural debugging mindset that does not depend on textual issue cues.
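Such a trajectory can be represented as an ordered sequence of action/observation steps ending in a patch. The sketch below uses hypothetical field names (`DebugStep`, `Trajectory`, `observation`), not the paper's actual data format:

```python
from dataclasses import dataclass, field

@dataclass
class DebugStep:
    action: str       # e.g. "run_tests", "inspect_stack_trace"
    observation: str  # tool output the model conditions on next

@dataclass
class Trajectory:
    repo: str
    steps: list = field(default_factory=list)
    patch: str = ""   # the final fix the trajectory leads to

# An issue-free trajectory mirroring the example sequence above:
traj = Trajectory(repo="example/project")
for action in ["run_tests", "locate_failing_test",
               "inspect_stack_trace", "apply_patch"]:
    traj.steps.append(DebugStep(action=action, observation="<tool output>"))
```

Training then means reproducing each `action` given the preceding steps, with no issue text in the context at all.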
3. Entropy‑aware RLVR – During RL‑style fine‑tuning, the loss‑clipping threshold is modulated by the model’s output entropy:
- High entropy → looser clipping → the agent can explore diverse actions (useful when the issue description is ambiguous).
- Low entropy → tighter clipping → the agent’s confident predictions are preserved, preventing destabilizing updates.
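The entropy‑modulated clipping rule can be sketched as a PPO‑style clip range interpolated by normalized entropy; the bounds `eps_min`, `eps_max` and the reference entropy `h_ref` are illustrative placeholders, not the paper's tuned values:

```python
def entropy_aware_clip_eps(entropy, eps_min=0.1, eps_max=0.3, h_ref=1.0):
    """Interpolate the clipping range from token entropy.

    High entropy -> eps near eps_max (looser clipping, more exploration);
    low entropy  -> eps near eps_min (tighter clipping, stable updates).
    """
    w = min(entropy / h_ref, 1.0)  # normalize entropy into [0, 1]
    return eps_min + w * (eps_max - eps_min)

def clipped_ratio(ratio, entropy):
    """Clip an importance ratio to [1 - eps, 1 + eps] with entropy-aware eps."""
    eps = entropy_aware_clip_eps(entropy)
    return max(min(ratio, 1.0 + eps), 1.0 - eps)
```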
4. Training Loop – Both modules are interleaved: the model alternates between learning from issue‑free trajectories and from issue‑guided examples, with the entropy‑aware RLVR loss applied throughout.
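A minimal sketch of that alternation, assuming a simple 1:1 interleaving (the paper's actual ratio is not stated here):

```python
from itertools import cycle

def interleaved_schedule(issue_free, issue_guided, steps):
    """Alternate between the two data streams for `steps` updates.

    Returns (source, batch) pairs; in real training each pair would feed
    one entropy-aware RLVR update.
    """
    free_it, guided_it = cycle(issue_free), cycle(issue_guided)
    schedule = []
    for step in range(steps):
        if step % 2 == 0:
            schedule.append(("issue_free", next(free_it)))
        else:
            schedule.append(("issue_guided", next(guided_it)))
    return schedule
```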
5. Evaluation – Performance is measured on SWE‑bench Verified, a benchmark of real‑world GitHub issues where the ground‑truth fix is known. The authors also test a test‑time scaling (TTS) wrapper that runs multiple model instances and aggregates their outputs.
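One plausible TTS aggregation scheme, test‑filtering followed by a majority vote over candidate patches, can be sketched as follows (this is an assumed scheme; the paper's exact wrapper may differ):

```python
from collections import Counter

def aggregate_tts(candidate_patches, passes_tests):
    """Aggregate patches from multiple model runs.

    Keep candidates that pass the repository's tests, then pick the
    patch proposed most often among the survivors.
    """
    survivors = [p for p in candidate_patches if passes_tests(p)]
    if not survivors:
        return None  # no run produced a passing patch
    return Counter(survivors).most_common(1)[0][0]
```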
Results & Findings
| Model | Baseline solve rate | SWE‑Fuse solve rate (with TTS) | Δ (absolute) |
|---|---|---|---|
| 8B LLM | ~12 % | 49.8 % | +37.8 % |
| 32B LLM | ~15 % | 65.2 % | +50.2 % |
- Without TTS, SWE‑Fuse alone already reaches solve rates of 43 % (8B) and 60 % (32B), outperforming the best prior baselines at each scale.
- The entropy‑aware clipping is the primary driver of stability: training variance drops by ~30 % compared to a fixed‑clip RLVR baseline.
- Ablation studies show that removing either the issue‑free trajectory module or the entropy‑aware component reduces solve rates by 15–20 %, confirming that both are essential.
Practical Implications
- More reliable AI‑powered bug‑fixers – Developers can integrate SWE‑Fuse‑trained agents into CI pipelines, expecting fewer false positives caused by ambiguous tickets.
- Reduced data curation overhead – Since the framework learns from issue‑free trajectories, teams don’t need to painstakingly clean every bug report; the model can self‑correct noisy inputs.
- Scalable to larger models – The entropy‑aware RLVR technique works with both 8B and 32B models, suggesting it can be applied to even bigger LLMs used in enterprise settings.
- Test‑time scaling synergy – Combining SWE‑Fuse with lightweight ensemble tricks (TTS) yields near‑state‑of‑the‑art performance without retraining, a practical win for organizations that already run multiple model instances.
Limitations & Future Work
- Dependence on high‑quality trajectory generation – The issue‑free trajectories are handcrafted or derived from existing patches; scaling this to massive codebases may require automated trajectory synthesis.
- Benchmark scope – SWE‑bench Verified focuses on open‑source GitHub issues; performance on proprietary, domain‑specific bug reports (e.g., embedded systems) remains untested.
- Entropy hyper‑parameters – The clipping schedule is manually tuned; future work could explore meta‑learning or adaptive schedules that generalize across tasks.
- Integration with other modalities – Extending the framework to incorporate stack traces, logs, or execution traces could further improve robustness against noisy issue texts.
Authors
- Xin-Cheng Wen
- Binbin Chen
- Haoxuan Lan
- Hang Yu
- Peng Di
- Cuiyun Gao
Paper Information
- arXiv ID: 2603.07927v1
- Categories: cs.SE, cs.AI
- Published: March 9, 2026