[Paper] EvoRepair: Enhancing Vulnerability Repair Agents Through Experience-Based Self-Evolution

Published: May 28, 2026 at 11:46 AM EDT
Source: arXiv - 2605.30105v1

Overview

The paper introduces EvoRepair, a novel framework that turns large‑language‑model (LLM)‑based vulnerability repair agents into self‑evolving assistants. By giving the agent a memory of past fixes and a way to refine that memory over time, EvoRepair dramatically reduces repeated mistakes and boosts repair success across multiple benchmarks.

Key Contributions

  • Experience‑based self‑evolution: First automated vulnerability repair (AVR) system that continuously harvests, scores, and reuses repair “experiences” from earlier bugs.
  • Cyclic learn‑and‑repair loop: Integrates retrieval of relevant past fixes, on‑the‑fly extraction of new knowledge, and quality‑aware updating of an experience bank.
  • Strong empirical gains: Achieves 93.47 % success on PATCHEVAL and 87.00 % on SEC‑bench, which is 39.56 and 33.50 percentage points above the next‑best LLM‑based baseline (LoopRepair) on those benchmarks.
  • Model‑agnostic robustness: Demonstrates consistent improvements with GPT‑5‑mini, other LLMs, different programming languages, and the VUL4J transfer benchmark.
  • Open‑source‑ready design: The experience bank and scoring mechanisms are modular, making the approach easy to plug into existing LLM‑driven repair pipelines.

Methodology

  1. Experience Bank Construction

    • Each repair attempt generates a trajectory: the prompt, the LLM’s suggested patch, execution feedback, and a quality score (based on test pass/fail, static analysis, etc.).
    • High‑quality trajectories are stored as reusable “experience snippets” indexed by vulnerability type, code context, and fix pattern.
  2. Retrieval‑Guided Repair

    • When a new vulnerability is presented, EvoRepair queries the bank for the most similar past experiences (using embeddings of the vulnerable code and CVE metadata).
    • Retrieved snippets are injected into the LLM prompt as contextual hints, steering the model toward proven fix strategies.
  3. Self‑Evolution Cycle

    • After the LLM proposes a patch, EvoRepair runs the patch through test suites and static analyzers.
    • The outcome updates the quality score of the trajectory; successful patches enrich the bank, while failed attempts are down‑weighted or discarded.
    • This loop repeats until the vulnerability is fixed or a timeout is reached, allowing the agent to “learn” from its own successes and failures.
  4. Quality‑Aware Scoring

    • Scores combine functional correctness (test pass), security impact (absence of new warnings), and code quality metrics (lint, cyclomatic complexity).
    • The scoring function ensures that only robust, maintainable fixes are promoted for future reuse.
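
The four steps above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Experience` fields, the 0.6 / 0.3 / 0.1 score weights, the 0.8 acceptance threshold, and the `propose_patch` and `evaluate` callables (stand-ins for the LLM call and for the test and static-analysis run) are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Experience:
    """One stored repair trajectory (field names are illustrative)."""
    vuln_type: str          # e.g. a CWE label
    embedding: list[float]  # vector for the vulnerable code and its metadata
    fix_hint: str           # the patch or a short description of the fix pattern
    quality: float          # score in [0, 1]


def quality_score(tests_passed: bool, new_warnings: int, lint_issues: int) -> float:
    """Combine correctness, security impact, and code quality.

    The 0.6 / 0.3 / 0.1 weights are assumptions, not values from the paper.
    """
    if not tests_passed:
        return 0.0
    security = 1.0 if new_warnings == 0 else 0.0
    style = 1.0 / (1 + lint_issues)
    return 0.6 + 0.3 * security + 0.1 * style


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ExperienceBank:
    def __init__(self, min_quality: float = 0.8):
        self.min_quality = min_quality
        self.items: list[Experience] = []

    def add(self, exp: Experience) -> bool:
        """Keep only trajectories that clear the quality threshold."""
        if exp.quality < self.min_quality:
            return False
        self.items.append(exp)
        return True

    def retrieve(self, query: list[float], k: int = 3) -> list[Experience]:
        """Return the k stored experiences most similar to the query."""
        ranked = sorted(self.items, key=lambda e: cosine(query, e.embedding), reverse=True)
        return ranked[:k]


def repair(bank, vuln_type, embedding, propose_patch, evaluate, max_rounds=3):
    """Retrieve hints, propose a patch, score it, and update the bank."""
    failed: list[str] = []
    for _ in range(max_rounds):
        hints = [e.fix_hint for e in bank.retrieve(embedding)]
        patch = propose_patch(hints, failed)            # stand-in for the LLM call
        tests_passed, warnings, lint = evaluate(patch)  # stand-in for tests and static analysis
        score = quality_score(tests_passed, warnings, lint)
        if bank.add(Experience(vuln_type, embedding, patch, score)):
            return patch       # accepted: the fix is now reusable
        failed.append(patch)   # rejected: remember it for the next round
    return None


# Toy run with stub callables in place of a real model and test harness.
bank = ExperienceBank()
bank.add(Experience("CWE-120", [1.0, 0.0], "add a bounds check before memcpy", 0.95))

def stub_llm(hints, failed):
    return "patch with bounds check" if hints and failed else "naive patch"

def stub_eval(patch):
    return (patch == "patch with bounds check", 0, 0)

result = repair(bank, "CWE-120", [0.9, 0.1], stub_llm, stub_eval)
print(result, len(bank.items))
```

In this toy run the first attempt fails its tests and is discarded, the second passes and is added to the bank, so the bank ends with two entries. A real deployment would replace the hand-written vectors with embeddings from a model and the stubs with actual LLM and test-harness calls.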

Results & Findings

Benchmark | EvoRepair | Next‑best LLM baseline   | Gain (percentage points)
PATCHEVAL | 93.47 %   | 53.91 % (LoopRepair)     | +39.56
SEC‑bench | 87.00 %   | 53.50 % (LoopRepair)     | +33.50
Overall   | 90.46 %   | 73.48 % (Live‑SWE‑Agent) | +16.98

  • Cross‑benchmark consistency: EvoRepair’s advantage holds across C/C++ and Java datasets, suggesting that the experience bank captures language‑agnostic repair patterns.
  • Transferability: When applied to the VUL4J suite (Java‑only), EvoRepair still outperformed baselines, indicating that the accumulated experiences generalize beyond the benchmarks on which they were collected.
  • Error reduction: The same logical mistake (e.g., forgetting to free memory after a buffer overflow fix) appeared in 27 % of baseline runs but dropped to <3 % with EvoRepair, showcasing the benefit of intra‑vulnerability experience accumulation.
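
The gains in the results table are differences between two success rates, so they are best read as percentage points rather than relative improvements. The short calculation below, using only the two per-benchmark rows from the table, shows both readings side by side; the `gains` helper is an illustrative name, not something from the paper.

```python
# Success rates (in percent) taken from the results table: (EvoRepair, LoopRepair).
scores = {
    "PATCHEVAL": (93.47, 53.91),
    "SEC-bench": (87.00, 53.50),
}


def gains(evo: float, baseline: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = evo - baseline
    relative = 100.0 * absolute / baseline
    return round(absolute, 2), round(relative, 1)


for name, (evo, base) in scores.items():
    pp, rel = gains(evo, base)
    print(f"{name}: +{pp:.2f} pp absolute, +{rel:.1f} % relative")
# PATCHEVAL: +39.56 pp absolute, +73.4 % relative
# SEC-bench: +33.50 pp absolute, +62.6 % relative
```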

Practical Implications

  • Faster Patch Generation: Developers can integrate EvoRepair into CI pipelines to automatically suggest high‑confidence patches, cutting manual triage time by up to 70 % for known vulnerability classes.
  • Reduced Regression Risk: Because the experience bank only promotes patches that pass stringent quality checks, the likelihood of introducing new bugs or security regressions is markedly lower.
  • Continuous Learning in Production: As new vulnerabilities are discovered in the wild, EvoRepair can ingest the fixes directly from the development team, instantly making that knowledge available for future incidents.
  • Tool‑agnostic Plug‑in: The framework’s retrieval and scoring components can be wrapped around any LLM (OpenAI, Anthropic, LLaMA, etc.), enabling existing security‑automation tools to become self‑improving without retraining the underlying model.
  • Compliance & Auditing: The experience bank provides a traceable log of which past fixes influenced a current patch, aiding security audits and regulatory reporting.
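
To make the auditing point concrete, here is one possible shape for a provenance record, written as a single JSON line per patch. It is a sketch only: the summary does not describe the paper's log format, and the `PatchProvenance` class, its field names, and the identifiers in the example are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class PatchProvenance:
    """Audit record linking a patch to the experiences that shaped it."""
    vuln_id: str          # identifier of the vulnerability being fixed
    patch_id: str         # identifier of the generated patch
    quality_score: float  # score assigned by the evaluation step
    influenced_by: list[str] = field(default_factory=list)  # retrieved experience IDs


def to_audit_line(record: PatchProvenance) -> str:
    """Serialize one record as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


record = PatchProvenance(
    vuln_id="VULN-EXAMPLE-1",  # placeholder, not a real identifier
    patch_id="patch-0001",
    quality_score=0.95,
    influenced_by=["exp-17", "exp-42"],
)
print(to_audit_line(record))
```

Appending one such line per accepted patch gives an append-only log that an auditor can filter by vulnerability or by experience ID.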

Limitations & Future Work

  • Dependence on Quality of Initial Data: The system’s performance hinges on having a sufficiently diverse and correct set of initial repair trajectories; noisy or biased seeds can propagate errors.
  • Scalability of Retrieval: As the experience bank grows, efficient similarity search becomes critical; the paper uses approximate nearest‑neighbor indexing, but real‑world deployments may need more sophisticated caching or hierarchical clustering.
  • Language‑Specific Nuances: While cross‑language gains were demonstrated, certain idioms (e.g., Rust’s ownership model) may require language‑tailored experience representations.
  • Human‑in‑the‑Loop Validation: The current evaluation is fully automated; future work could explore interactive modes where developers approve or edit suggested patches, further enriching the experience bank.
  • Security of the Experience Bank: Storing patches and vulnerability details raises concerns about leakage; future research should investigate encrypted or federated storage mechanisms.
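
On the retrieval-scalability point, one simple way to keep search cheap as the bank grows is to partition experiences by vulnerability type and rank only within the matching partition. The sketch below illustrates that idea with an exact search inside each bucket; it is not the approximate nearest‑neighbor index the paper uses, and the `BucketedIndex` class and the sample IDs are assumptions made for the example.

```python
import heapq
import math
from collections import defaultdict


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class BucketedIndex:
    """Two-level lookup: pick a bucket by vulnerability type, then rank inside it."""

    def __init__(self):
        self.buckets = defaultdict(list)  # vuln_type -> [(exp_id, embedding)]

    def add(self, vuln_type, exp_id, embedding):
        self.buckets[vuln_type].append((exp_id, embedding))

    def search(self, vuln_type, query, k=3):
        """Exact top-k by cosine similarity, restricted to one bucket."""
        candidates = self.buckets.get(vuln_type, [])
        top = heapq.nlargest(k, candidates, key=lambda item: cosine(query, item[1]))
        return [exp_id for exp_id, _ in top]


index = BucketedIndex()
index.add("CWE-120", "exp-1", [1.0, 0.0])
index.add("CWE-120", "exp-2", [0.0, 1.0])
index.add("CWE-416", "exp-3", [1.0, 0.0])
print(index.search("CWE-120", [0.9, 0.1], k=1))  # only CWE-120 entries are scanned
```

The trade-off is that an experience filed under a different type is never retrieved, even when its fix pattern would apply, so a production system would likely combine coarse partitioning with a fallback search across buckets.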

EvoRepair shows that giving LLM‑driven security agents a memory, and a disciplined way to update it, can turn a one‑shot fixer into a continuously improving defender. For teams looking to automate vulnerability remediation at scale, the framework offers a practical pathway to smarter, safer code.

Authors

  • Haichuan Hu
  • Guoqing Xie
  • Quanjun Zhang
  • Jiawei Liu
  • Shengcheng Yu
  • Chunrong Fang
  • Zhenyu Chen
  • Liang Xiao

Paper Information

  • arXiv ID: 2605.30105v1
  • Categories: cs.SE
  • Published: May 28, 2026