[Paper] Well Begun is Half Done: Location-Aware and Trace-Guided Iterative Automated Vulnerability Repair

Published: December 23, 2025 at 04:54 AM EST
4 min read
Source: arXiv - 2512.20203v1

Overview

This paper presents \sysname, a novel automated vulnerability‑repair system that leverages large language models (LLMs) but goes a step further: it tells the model where to patch first and rates the quality of each generated patch during an iterative repair loop. By combining location awareness with a lightweight quality‑assessment metric, the authors achieve markedly higher success rates on real‑world C/C++ bugs than prior NMT‑based, program‑analysis, and LLM‑only approaches.

Key Contributions

  • Location‑aware patch guidance – a lightweight analysis that ranks code locations needing repair, steering the LLM to edit the most promising spots first.
  • Trace‑guided iterative repair – an automated loop that evaluates each test‑failing candidate patch on two dimensions (new‑vulnerability introduction and taint‑statement coverage) and selects the best candidate to seed the next iteration.
  • Two‑dimensional patch quality metric – combines security safety (no new bugs) with taint coverage to approximate how “complete” a fix is without full manual review (a minimal sketch follows this list).
  • Empirical validation on VulnLoc+ – a curated dataset of 40 real C/C++ vulnerabilities with their Proof‑of‑Vulnerability (PoV) exploits; \sysname produces 27 plausible patches and repairs 8–13 more bugs than the strongest baselines.
  • Open‑source prototype – the authors release the implementation and the extended dataset, enabling reproducibility and further research.
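
The paper describes what the two dimensions measure but not, at this level of summary, how they are folded together. The sketch below shows one plausible combination; the names (`PatchSignals`, `quality_score`) and the multiplicative weighting are assumptions for illustration, not the paper's formula.

```python
from dataclasses import dataclass

@dataclass
class PatchSignals:
    """Signals gathered for one candidate patch; field names are illustrative."""
    new_issues: int           # safety: new warnings or newly failing tests
    taint_paths_blocked: int  # coverage: exploit-relevant taint paths now blocked
    taint_paths_total: int    # coverage: taint paths extracted from the PoV trace

def quality_score(s: PatchSignals) -> float:
    """Fold safety and taint coverage into a single score in [0, 1].

    A patch that introduces any new issue scores 0; otherwise its score is
    the fraction of taint paths it blocks. The exact weighting used in the
    paper is not specified here and may differ.
    """
    if s.new_issues > 0:
        return 0.0
    return s.taint_paths_blocked / max(s.taint_paths_total, 1)

# Example: a safe patch that blocks 3 of the 4 PoV taint paths scores 0.75.
assert quality_score(PatchSignals(0, 3, 4)) == 0.75
```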

Methodology

  1. Pre‑processing & Location Ranking
    • Static analysis extracts taint‑propagation graphs from the vulnerable program and its PoV.
    • Each statement receives a repair priority score based on how directly it participates in the taint flow that leads to the exploit.
  2. LLM Prompt Construction
    • The top‑ranked location(s) are embedded into the prompt as explicit “edit‑here” markers, while the rest of the source code is provided for context.
    • A state‑of‑the‑art code‑oriented LLM (e.g., GPT‑4‑code) generates a candidate patch.
  3. Iterative Evaluation Loop
    • The candidate is compiled and run against the test suite (including the PoV).
    • Two quality signals are computed:
      • Safety – does the patch introduce any new compiler warnings, undefined‑behavior warnings, or new failing tests?
      • Taint Coverage – what fraction of the original taint paths are now blocked?
    • The patch with the highest combined score becomes the “seed” for the next iteration; the process repeats until the test suite passes or a max‑iteration budget is hit (see the loop sketch after this list).
  4. Final Selection
    • The best‑scoring patch that makes all tests pass is reported as the plausible fix; manual verification determines whether it is correct (i.e., truly removes the vulnerability without side effects).
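
The four steps above map naturally onto a small control loop. The following is a schematic reconstruction, assuming injected `generate` (prompt construction plus LLM call) and `evaluate` (compile, test suite, PoV replay) callables; none of these names or signatures come from the released prototype.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class EvalResult:
    """Outcome of building one candidate patch and running tests + PoV."""
    all_tests_pass: bool
    new_issues: int           # safety dimension: new warnings / newly failing tests
    taint_paths_blocked: int  # coverage dimension: exploit taint paths now cut off
    taint_paths_total: int

def repair_loop(
    generate: Callable[[Optional[str]], List[str]],  # seed patch -> candidate patches
    evaluate: Callable[[str], EvalResult],           # candidate -> test/PoV outcome
    max_iterations: int = 10,                        # illustrative budget
) -> Optional[str]:
    """Trace-guided iterative repair, reconstructed from the steps above.

    Returns the first candidate that passes all tests (a plausible fix),
    or None if the iteration budget is exhausted.
    """
    seed: Optional[str] = None
    for _ in range(max_iterations):
        scored: List[Tuple[float, str]] = []
        for patch in generate(seed):
            result = evaluate(patch)
            if result.all_tests_pass:
                return patch  # plausible fix: full test suite + PoV pass
            # Two-dimensional score: any new issue zeroes it; otherwise it is
            # the fraction of exploit taint paths the patch blocks.
            score = (0.0 if result.new_issues
                     else result.taint_paths_blocked / max(result.taint_paths_total, 1))
            scored.append((score, patch))
        if not scored:
            break
        _, seed = max(scored, key=lambda sp: sp[0])  # best failure seeds next round
    return None
```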

Results & Findings

| Metric | \sysname | NMT‑based | Program‑analysis | Prior LLM |
| --- | --- | --- | --- | --- |
| Plausible patches (out of 40) | 27 | 15–19 | 12–16 | 14–18 |
| Correct patches (fully fixed) | 13 | 5–7 | 4–6 | 6–8 |
| Avg. iterations per bug | 3.2 | 5.6 | 7.1 | 5.9 |
| New‑vulnerability introductions | < 2% | 8% | 12% | 9% |

Key takeaways

  • Location awareness cuts down the number of wasted LLM generations, leading to fewer iterations and a higher success rate.
  • The two‑dimensional quality score reliably filters out patches that would otherwise pass the test suite but re‑introduce security issues.
  • Even on a modest dataset (40 bugs), the gains are statistically significant, suggesting the approach should generalize to larger corpora.

Practical Implications

  • Developer tooling – IDE plugins could embed \sysname’s location‑ranking engine to suggest where a developer should focus when fixing a reported CVE, reducing guesswork.
  • CI/CD pipelines – an automated “repair‑as‑you‑test” stage could attempt a quick fix for newly discovered static‑analysis warnings and automatically open a pull request when a high‑quality patch is found (a hypothetical gate is sketched after this list).
  • Bug‑bounty platforms – the system can generate plausible patches for disclosed PoVs, accelerating the verification loop between researchers and vendors.
  • Security‑oriented code review – the taint‑coverage metric offers a lightweight sanity check that can be added to existing static‑analysis suites without heavy formal verification.
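
To make the CI/CD bullet concrete, here is an entirely hypothetical gate: a pipeline step that applies an auto-generated patch and opens a pull request only when the repair tool's combined score clears a threshold. The repair tool, the score, and the threshold are placeholders; standard `git` and the GitHub `gh` CLI are assumed to be available.

```python
import subprocess

SCORE_THRESHOLD = 0.9  # hypothetical gate; tune per project

def maybe_open_pr(repo_dir: str, patch_file: str, score: float) -> bool:
    """Apply a candidate patch on a branch and open a draft PR if its
    combined quality score clears the gate; low-scoring patches are
    left for a human to handle."""
    if score < SCORE_THRESHOLD:
        return False

    def run(*cmd: str) -> None:
        subprocess.run(cmd, cwd=repo_dir, check=True)

    run("git", "checkout", "-b", "auto-vuln-fix")
    run("git", "apply", patch_file)  # candidate produced by the repair tool
    run("git", "commit", "-am", "Automated vulnerability fix (needs review)")
    run("git", "push", "-u", "origin", "auto-vuln-fix")
    run("gh", "pr", "create", "--fill", "--draft")  # draft PR for human review
    return True
```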

Overall, \sysname demonstrates that guiding LLMs with domain‑specific signals (location + quality feedback) turns a “blind” generation process into a focused, semi‑automated debugging assistant, a pattern that can be replicated for other bug classes (e.g., memory leaks, concurrency bugs).

Limitations & Future Work

  • Dataset size – evaluation is limited to 40 C/C++ vulnerabilities; broader benchmarks (e.g., Juliet, Defects4J) are needed to confirm generality.
  • Language dependence – the current implementation relies on taint analysis specific to C/C++; adapting the location‑ranking to managed languages (Java, Python) will require different static analyses.
  • LLM cost – iterative prompting can be expensive; future work could explore caching, few‑shot fine‑tuning, or smaller specialist models to reduce inference overhead.
  • Patch correctness verification – the study still depends on manual validation for “correct” patches; integrating formal verification or symbolic execution could automate this step.

By addressing these points, the community can move toward fully autonomous, production‑grade vulnerability repair pipelines.

Authors

  • Zhenlei Ye
  • Xiaobing Sun
  • Sicong Cao
  • Lili Bo
  • Bin Li

Paper Information

  • arXiv ID: 2512.20203v1
  • Categories: cs.SE
  • Published: December 23, 2025