[Paper] Well Begun is Half Done: Location-Aware and Trace-Guided Iterative Automated Vulnerability Repair
Source: arXiv - 2512.20203v1
Overview
This paper presents \sysname, an automated vulnerability-repair system that builds on large language models (LLMs) but goes a step further: it tells the model where to patch first and rates the quality of each generated patch during an iterative repair loop. By combining location awareness with a lightweight quality-assessment metric, the authors achieve markedly higher success rates on real-world C/C++ vulnerabilities than prior neural-machine-translation (NMT), program-analysis, or LLM-only approaches.
Key Contributions
- Location‑aware patch guidance – a lightweight analysis that ranks code locations needing repair, steering the LLM to edit the most promising spots first.
- Trace‑guided iterative repair – an automated loop that scores every candidate patch that still fails the tests along two dimensions (new-vulnerability introduction and taint-statement coverage) and carries the best-scoring candidate into the next iteration.
- Two‑dimensional patch quality metric – combines security safety (no new bugs) with taint coverage to approximate how “complete” a fix is without full manual review.
- Empirical validation on VulnLoc+ – a curated dataset of 40 real C/C++ vulnerabilities with their Proof‑of‑Vulnerability (PoV) exploits; \sysname produces 27 plausible patches and repairs 8–13 more bugs than the strongest baselines.
- Open‑source prototype – the authors release the implementation and the extended dataset, enabling reproducibility and further research.
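The location-ranking contribution can be illustrated with a small sketch: statements that sit closer to the exploit sink on the taint-propagation graph get a higher repair priority. The graph representation, the BFS distance, and the `1/(1+d)` scoring are illustrative assumptions, not the paper's actual analysis.

```python
from collections import deque

def rank_repair_locations(taint_edges, sink):
    """Rank statements by how directly they feed the taint flow
    that reaches the exploit sink (hypothetical scoring scheme)."""
    # Reverse the taint-propagation edges so we can walk back from the sink.
    reverse = {}
    for src, dst in taint_edges:
        reverse.setdefault(dst, []).append(src)

    # BFS backwards from the sink: distance = number of taint hops.
    dist = {sink: 0}
    queue = deque([sink])
    while queue:
        node = queue.popleft()
        for pred in reverse.get(node, []):
            if pred not in dist:
                dist[pred] = dist[node] + 1
                queue.append(pred)

    # Closer to the sink -> higher repair priority.
    scores = {stmt: 1.0 / (1 + d) for stmt, d in dist.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Toy taint chain: untrusted read flows through a length parse into memcpy.
edges = [("read_input", "parse_len"), ("parse_len", "memcpy_call")]
ranking = rank_repair_locations(edges, "memcpy_call")
# The sink itself ranks first, then its immediate taint predecessors.
```

Only statements reachable backwards from the sink receive a score, which matches the paper's intuition that edits should target the taint flow that actually leads to the exploit.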
Methodology
- Pre‑processing & Location Ranking
- Static analysis extracts taint‑propagation graphs from the vulnerable program and its PoV.
- Each statement receives a repair priority score based on how directly it participates in the taint flow that leads to the exploit.
- LLM Prompt Construction
- The top‑ranked location(s) are embedded into the prompt as explicit “edit‑here” markers, while the rest of the source code is provided for context.
- A state‑of‑the‑art code‑oriented LLM (e.g., GPT‑4‑code) generates a candidate patch.
- Iterative Evaluation Loop
- The candidate is compiled and run against the test suite (including the PoV).
- Two quality signals are computed:
- Safety – does the patch introduce any new compiler warnings, undefined‑behavior warnings, or new failing tests?
- Taint Coverage – what fraction of the original taint paths are now blocked?
- The patch with the highest combined score becomes the “seed” for the next iteration; the process repeats until the test suite passes or a max‑iteration budget is hit.
- Final Selection
- The best‑scoring patch that makes all tests pass is reported as the plausible fix; manual verification determines whether it is correct (i.e., truly removes the vulnerability without side effects).
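The evaluation loop above can be sketched as follows. The candidate representation, the `generate`/`evaluate` interfaces, and the way the two quality signals are combined (safety as a hard gate, then the taint-coverage fraction) are assumptions for illustration; the paper's exact scoring may differ.

```python
def patch_score(introduces_new_issue, blocked_paths, total_paths):
    """Two-dimensional quality signal: safety gate plus taint coverage."""
    if introduces_new_issue:              # new warnings or newly failing tests
        return 0.0
    return blocked_paths / total_paths    # fraction of taint paths now blocked

def iterate_repair(seed, generate, evaluate, max_iters=5):
    """Keep the best-scoring candidate as the seed for the next round
    until a candidate passes all tests or the iteration budget is hit."""
    best = seed
    for _ in range(max_iters):
        scored = []
        for cand in generate(best):
            passes, new_issue, blocked, total = evaluate(cand)
            if passes:
                return cand               # plausible patch: report it
            scored.append((patch_score(new_issue, blocked, total), cand))
        best = max(scored, key=lambda sc: sc[0])[1]
    return None

# Toy demo: candidates are strings; "p12" is the one that passes the suite.
def generate(seed):
    return [seed + "1", seed + "2"]

def evaluate(cand):
    if cand == "p12":
        return True, False, 3, 3          # passes all tests, including the PoV
    return False, cand.endswith("11"), 1, 3

result = iterate_repair("p", generate, evaluate)
```

In the toy run the first round's survivors tie on taint coverage, the best one seeds round two, and the unsafe candidate (`"p11"`) is zeroed out by the safety gate before `"p12"` passes.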
Results & Findings
| Metric | \sysname | NMT‑based | Program‑analysis | Prior LLM |
|---|---|---|---|---|
| Plausible patches (out of 40) | 27 | 15–19 | 12–16 | 14–18 |
| Correct patches (fully fixed) | 13 | 5–7 | 4–6 | 6–8 |
| Avg. iterations per bug | 3.2 | 5.6 | 7.1 | 5.9 |
| New‑vulnerability introductions | < 2 % | 8 % | 12 % | 9 % |
Key takeaways
- Location awareness cuts down the number of wasted LLM generations, leading to fewer iterations and higher success.
- The two‑dimensional quality score reliably filters out patches that would otherwise pass the test suite but re‑introduce security issues.
- Even with a modest dataset (40 bugs), the gains are statistically significant, suggesting the approach may generalize to larger corpora.
Practical Implications
- Developer tooling – IDE plugins could embed \sysname’s location‑ranking engine to suggest where a developer should focus when fixing a reported CVE, reducing guesswork.
- CI/CD pipelines – an automated “repair‑as‑you‑test” stage could attempt a quick fix for newly discovered static‑analysis warnings, automatically submitting a pull request if a high‑quality patch is found.
- Bug‑bounty platforms – the system can generate plausible patches for disclosed PoVs, accelerating the verification loop between researchers and vendors.
- Security‑oriented code review – the taint‑coverage metric offers a lightweight sanity check that can be added to existing static‑analysis suites without heavy formal verification.
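A sanity check of the kind described above might compute taint coverage simply as the fraction of the originally exploitable taint paths a patch blocks. This helper and its set-of-path-identifiers interface are hypothetical, not the paper's code.

```python
def taint_coverage(original_paths, still_tainted):
    """Fraction of the original taint paths that the patch blocks.

    Both arguments are sets of taint-path identifiers: the paths found
    in the unpatched program, and those still live after patching.
    """
    if not original_paths:
        return 1.0                         # nothing to block
    blocked = original_paths - still_tainted
    return len(blocked) / len(original_paths)

# Example: the patch blocks two of the three original taint paths.
cov = taint_coverage({"p1", "p2", "p3"}, {"p3"})
```

Because it only needs the before/after path sets from an existing taint analysis, such a check can bolt onto a static-analysis suite without any formal-verification machinery.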
Overall, \sysname demonstrates that guiding LLMs with domain‑specific signals (location + quality feedback) turns a “blind” generation process into a focused, semi‑automated debugging assistant, a pattern that can be replicated for other bug classes (e.g., memory leaks, concurrency bugs).
Limitations & Future Work
- Dataset size – evaluation is limited to 40 C/C++ vulnerabilities; broader benchmarks (e.g., Juliet, Defects4J) are needed to confirm generality.
- Language dependence – the current implementation relies on taint analysis specific to C/C++; adapting the location‑ranking to managed languages (Java, Python) will require different static analyses.
- LLM cost – iterative prompting can be expensive; future work could explore caching, few‑shot fine‑tuning, or smaller specialist models to reduce inference overhead.
- Patch correctness verification – the study still depends on manual validation for “correct” patches; integrating formal verification or symbolic execution could automate this step.
By addressing these points, the community can move toward fully autonomous, production‑grade vulnerability repair pipelines.
Authors
- Zhenlei Ye
- Xiaobing Sun
- Sicong Cao
- Lili Bo
- Bin Li
Paper Information
- arXiv ID: 2512.20203v1
- Categories: cs.SE
- Published: December 23, 2025