[Paper] DynaFix: Iterative Automated Program Repair Driven by Execution-Level Dynamic Information

Published: December 31, 2025 at 12:13 AM EST
Source: arXiv - 2512.24635v1

Overview

The paper introduces DynaFix, a new automated program repair (APR) technique that feeds execution‑level runtime data back into large language models (LLMs) during the patch‑generation loop. By mimicking how developers debug—examining variable values, control‑flow paths, and call stacks after each failed attempt—DynaFix achieves a measurable boost in both repair success rate and efficiency on the widely‑used Defects4J benchmark.

Key Contributions

  • Iterative dynamic feedback loop: Captures fine‑grained runtime information after every patch attempt and injects it as structured prompts for the LLM.
  • Fine‑grained execution representation: Transforms variable states, control‑flow traces, and call‑stack snapshots into a prompt format that LLMs can reason over.
  • Empirical gains: Repairs 186 single‑function bugs (≈10 % improvement over the strongest baselines) and fixes 38 bugs that prior APR tools could not.
  • Search‑space reduction: Limits the number of repair attempts to ≤ 35 per bug and cuts the candidate‑patch space by ~70 % compared with existing iterative APR frameworks.

Methodology

  1. Initial Test Run – Execute the buggy program on its test suite; failing test cases trigger the first data collection.
  2. Dynamic Information Extraction – A lightweight instrumentation layer records:
    • Current values of all in‑scope variables
    • The exact control‑flow path taken (e.g., which branches were hit)
    • The call stack at the point of failure
  3. Prompt Construction – Serialize the collected data into a concise, human‑readable “debug report” appended to the LLM’s repair prompt (e.g., “The variable count was -1 at line 42; the program took the else branch of if (count > 0) …”).
  4. LLM Patch Generation – A code‑capable LLM (e.g., GPT‑4‑code) produces one or more candidate patches guided by the debug report.
  5. Validation & Iteration – Compile and re‑run the candidate patch against the test suite. If it still fails, repeat steps 2‑4 with new runtime data from the latest execution.
  6. Termination – Stop when a patch passes all tests or a pre‑defined attempt limit (35) is reached. (The loop in steps 2-5 is sketched below.)
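To make the loop concrete, here is a minimal Python sketch of steps 2-5. The paper targets Java and does not publish this code; `run_tests.sh`, `collect_trace`, `llm_generate_patch`, and `apply_patch` are hypothetical stand-ins for the test harness, instrumentation layer, LLM backend, and patch applier.

```python
import subprocess

MAX_ATTEMPTS = 35  # per-bug attempt budget reported in the paper

def run_tests(project_dir):
    """Run the project's test suite; return (passed, combined log)."""
    result = subprocess.run(["./run_tests.sh"], cwd=project_dir,
                            capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

def build_debug_report(trace):
    """Serialize runtime data (assumed dict of variable values, branch path,
    and call stack at the failure point) into a readable prompt section."""
    lines = [f"Runtime observations at line {trace['line']}:"]
    for name, value in trace["variables"].items():
        lines.append(f"- variable {name} = {value!r}")
    lines.append(f"- control-flow path: {' -> '.join(trace['branches'])}")
    lines.append(f"- call stack: {' <- '.join(trace['call_stack'])}")
    return "\n".join(lines)

def repair(project_dir, buggy_source, collect_trace, llm_generate_patch, apply_patch):
    """Iterative repair: test, trace the failure, prompt the LLM, apply, repeat."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        passed, _log = run_tests(project_dir)
        if passed:
            return True                                 # patch accepted by the test suite
        trace = collect_trace(project_dir)              # step 2: dynamic information extraction
        report = build_debug_report(trace)              # step 3: prompt construction
        prompt = (f"This function still fails its tests (attempt {attempt}):\n"
                  f"{buggy_source}\n\n{report}\n"
                  "Propose a corrected version of the function.")
        patch = llm_generate_patch(prompt)              # step 4: LLM patch generation
        buggy_source = apply_patch(project_dir, patch)  # step 5: re-validated on the next pass
    return False                                        # attempt budget exhausted
```

Everything DynaFix-specific lives in `collect_trace` and `build_debug_report`; the rest is a generic test-patch loop.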

The approach is model‑agnostic: any LLM that can understand the structured prompt can be swapped in, making DynaFix a plug‑and‑play layer on top of existing LLM‑based APR pipelines.
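In practice, "plug‑and‑play" likely amounts to a single callable contract for the backend. A minimal sketch, assuming nothing beyond the standard library (`PatchGenerator` and `echo_backend` are illustrative names, not from the paper):

```python
from typing import Protocol

class PatchGenerator(Protocol):
    """Anything that maps a repair prompt to candidate patch source text."""
    def __call__(self, prompt: str) -> str: ...

def echo_backend(prompt: str) -> str:
    """Stand-in backend for exercising the pipeline without any LLM calls."""
    return "// no-op candidate patch\n"

# A hosted API wrapper or a locally served code model satisfies the same
# contract, so either can be passed straight into the repair loop above:
#   repair(project_dir, source, collect_trace, echo_backend, apply_patch)
```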

Results & Findings

| Metric | DynaFix | Best Prior LLM‑APR |
|---|---|---|
| Bugs repaired (Defects4J v1.2 + v2.0) | 186 | 169 |
| New bugs repaired (not fixed by any baseline) | 38 | 0 |
| Avg. attempts per bug (successful cases) | ≤ 35 | 55-80 |
| Search‑space reduction | ~70 % | n/a |
| Runtime overhead (instrumentation + prompt generation) | < 2 s per iteration (negligible vs. LLM inference) | n/a |

The authors report that the dynamic prompts dramatically improve the LLM’s “understanding” of why a patch failed, leading to more targeted edits rather than blind trial‑and‑error. Even for complex bugs requiring multiple code changes, DynaFix converges within a handful of iterations.

Practical Implications

  • Faster CI/CD fixes – Integrating DynaFix into a continuous‑integration pipeline could automatically generate high‑quality patches after a failing build, reducing mean‑time‑to‑repair.
  • Better debugging assistants – IDE plugins can expose the same execution‑level prompts to developers, turning LLM suggestions into interactive, step‑wise debugging hints.
  • Lower cost of LLM inference – By pruning the search space early, fewer LLM calls are needed, translating into tangible cost savings for cloud‑based inference services.
  • Language‑agnostic extension – While evaluated on Java, the instrumentation concept works for any language with a runtime tracer (e.g., Python’s sys.settrace, sketched below, or .NET profilers), opening the door to cross‑language APR tools.
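For instance, a few lines of sys.settrace are enough to capture the kind of variable‑level snapshots DynaFix feeds back to the model. This is an illustrative sketch, not the paper’s (Java‑based) instrumentation; buggy_average is a made-up example.

```python
import sys

def make_tracer(snapshots):
    """Record the local variables of every executed line of traced user code."""
    def tracer(frame, event, arg):
        if event == "line":
            snapshots.append({
                "function": frame.f_code.co_name,
                "line": frame.f_lineno,
                "locals": dict(frame.f_locals),
            })
        return tracer
    return tracer

def buggy_average(values):
    total = 0
    for v in values:
        total += v
    return total / (len(values) - 1)   # bug: should divide by len(values)

snapshots = []
sys.settrace(make_tracer(snapshots))
try:
    buggy_average([2, 4, 6])
finally:
    sys.settrace(None)

# The final snapshot exposes total == 12 just before the faulty division,
# which is exactly the evidence a debug-report prompt would surface.
print(snapshots[-1])
```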

Limitations & Future Work

  • Instrumentation overhead may be non‑trivial for large, performance‑critical applications; the authors suggest selective tracing as a mitigation (one way to approximate this is sketched after this list).
  • The current evaluation focuses on single‑function bugs; scaling to multi‑module or system‑wide defects remains an open challenge.
  • DynaFix relies on a test suite that encodes the expected behavior as its correctness oracle; future work could explore weaker oracles (e.g., metamorphic relations) to broaden applicability.
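On the first limitation, selective tracing could be as simple as refusing to descend into frames the failing test does not implicate. A sketch building on the tracer above, where SUSPECT_FUNCTIONS is an illustrative name rather than anything from the paper:

```python
SUSPECT_FUNCTIONS = {"buggy_average"}   # e.g., functions on the failing stack trace

def make_selective_tracer(snapshots):
    """Like make_tracer, but skips line-level tracing outside the suspect set,
    keeping instrumentation overhead low on large codebases."""
    def tracer(frame, event, arg):
        if frame.f_code.co_name not in SUSPECT_FUNCTIONS:
            return None                 # do not trace this frame at all
        if event == "line":
            snapshots.append({
                "function": frame.f_code.co_name,
                "line": frame.f_lineno,
                "locals": dict(frame.f_locals),
            })
        return tracer
    return tracer
```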

Authors

  • Zhili Huang
  • Ling Xu
  • Chao Liu
  • Weifeng Sun
  • Xu Zhang
  • Yan Lei
  • Meng Yan
  • Hongyu Zhang

Paper Information

  • arXiv ID: 2512.24635v1
  • Categories: cs.SE, cs.AI
  • Published: December 31, 2025