[Paper] Learner-Tailored Program Repair: A Solution Generator with Iterative Edit-Driven Retrieval Enhancement
Source: arXiv - 2601.08545v1
Overview
The paper introduces Learner‑Tailored Program Repair (LPR), a new task that goes beyond simply fixing buggy code: it also explains why the bug occurred, which makes it well suited to intelligent programming coaching systems. To tackle LPR, the authors present the Learner‑Tailored Solution Generator, a two‑stage framework that combines retrieval of similar past fixes with large‑language‑model (LLM) reasoning and iteratively refines its search based on execution feedback.
Key Contributions
- New task definition (LPR): Repairs code and generates human‑readable bug explanations tailored to learners.
- Edit‑driven retrieval engine: Builds a searchable database of prior solutions and retrieves the most relevant ones based on the edits needed to fix a bug.
- Solution‑guided repair: Uses the retrieved snippets as concrete guidance for an LLM to produce a corrected program and an explanatory narrative.
- Iterative Retrieval Enhancement (IRE): After an initial repair attempt, execution results are fed back to steer the retrieval process toward better candidate solutions, effectively “learning” from its own mistakes.
- Empirical validation: Shows substantial gains over strong baselines on benchmark datasets of student code, confirming the practicality of the approach.
Methodology
- Solution Retrieval Database Construction
  - Collect a large corpus of correct programs and their associated edit scripts (the diff between buggy and fixed versions).
  - Index these edit scripts so that, given a new buggy snippet, the system can quickly find past fixes that involve similar edits (a minimal index sketch follows this list).
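To make the database construction concrete, here is a minimal Python sketch of an edit‑script index. The `SolutionRecord` and `EditIndex` structures and the line‑level `difflib` diffing are illustrative assumptions, not the authors' implementation.

```python
# Sketch of an edit-script index over past (buggy, fixed) pairs.
# Data structures and line-level diffing are assumptions for illustration.
import difflib
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class SolutionRecord:
    problem_id: str
    buggy_code: str
    fixed_code: str
    edit_ops: list = field(default_factory=list)  # [(op, before_hunk, after_hunk), ...]

def extract_edit_ops(buggy: str, fixed: str) -> list:
    """Diff two programs line by line and keep only the changed hunks."""
    buggy_lines, fixed_lines = buggy.splitlines(), fixed.splitlines()
    matcher = difflib.SequenceMatcher(None, buggy_lines, fixed_lines)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            ops.append((tag,
                        "\n".join(buggy_lines[i1:i2]),
                        "\n".join(fixed_lines[j1:j2])))
    return ops

class EditIndex:
    """Buckets past solutions by coarse edit keys (edit type + changed token)."""
    def __init__(self):
        self.buckets = defaultdict(list)

    def add(self, record: SolutionRecord) -> None:
        record.edit_ops = extract_edit_ops(record.buggy_code, record.fixed_code)
        for tag, before, after in record.edit_ops:
            for token in f"{before} {after}".split():
                self.buckets[(tag, token)].append(record)
```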
- Edit‑Driven Retrieval (Stage 1)
  - When a learner submits buggy code, the system extracts a minimal set of syntactic edits that would resolve the failure (e.g., “add a missing return”, “change == to ===”).
  - These edits are used as a query to pull the top‑k most similar past solutions from the database (see the retrieval sketch below).
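One possible realization of the edit‑driven query, reusing `extract_edit_ops` and `EditIndex` from the sketch above. Diffing the submission against a reference solution to form the query and ranking candidates by Jaccard overlap of edit tokens are simplifying assumptions.

```python
# Sketch of Stage 1: build an edit query for a buggy submission and rank
# stored solutions by edit overlap. Reference-based diffing and Jaccard
# scoring are assumptions, not the paper's exact retrieval method.
def edit_query(buggy: str, reference: str) -> set:
    """Collect tokens from the hunks that differ between buggy and reference code."""
    tokens = set()
    for tag, before, after in extract_edit_ops(buggy, reference):
        tokens.add(tag)
        tokens.update(f"{before} {after}".split())
    return tokens

def retrieve_top_k(index: EditIndex, query_tokens: set, k: int = 3) -> list:
    """Score every stored record by Jaccard similarity of its edit tokens."""
    scored, seen = [], set()
    for bucket in index.buckets.values():
        for record in bucket:
            if id(record) in seen:
                continue
            seen.add(id(record))
            record_tokens = set()
            for tag, before, after in record.edit_ops:
                record_tokens.add(tag)
                record_tokens.update(f"{before} {after}".split())
            union = query_tokens | record_tokens
            score = len(query_tokens & record_tokens) / len(union) if union else 0.0
            scored.append((score, record))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored[:k]]
```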
- Solution‑Guided Repair (Stage 2)
  - The retrieved solutions are fed to a powerful LLM (e.g., GPT‑4) together with the original buggy code.
  - The LLM generates:
    - a repaired version of the code, and
    - a concise, learner‑friendly explanation of the bug’s root cause (see the prompt sketch below).
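The repair stage can be approximated with a single prompt that packs the retrieved fixes in as exemplars. The prompt wording, the `call_llm` placeholder (any callable mapping a prompt string to model text), and the `<code>`/`<why>` response tags are assumptions rather than the paper's actual prompt design.

```python
# Sketch of Stage 2: prompt an LLM with retrieved fixes as guidance and ask for
# a repaired program plus a learner-facing explanation. Prompt format and
# response tags are illustrative assumptions.
REPAIR_PROMPT = """You are a programming tutor.
Buggy submission:
{buggy}

Similar past fixes (buggy -> fixed):
{examples}

Return the corrected program between <code> and </code> tags, followed by a
short explanation of the root cause between <why> and </why> tags."""

def format_examples(records) -> str:
    """Render each retrieved solution as a before/after pair for the prompt."""
    parts = [f"# before\n{r.buggy_code}\n# after\n{r.fixed_code}" for r in records]
    return "\n\n".join(parts)

def solution_guided_repair(buggy: str, retrieved, call_llm) -> tuple:
    """call_llm is a placeholder for whatever LLM client the system wraps."""
    prompt = REPAIR_PROMPT.format(buggy=buggy, examples=format_examples(retrieved))
    reply = call_llm(prompt)
    code = reply.split("<code>")[-1].split("</code>")[0].strip()
    explanation = reply.split("<why>")[-1].split("</why>")[0].strip()
    return code, explanation
```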
- Iterative Retrieval Enhancement
  - The repaired code is executed against hidden test cases.
  - Failure signals (e.g., which test cases still fail, error messages) are transformed into new edit queries, prompting another round of retrieval.
  - This loop repeats until the code passes or a budget limit is reached, allowing the system to “self‑correct” its retrieval direction (the loop is sketched below).
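Putting the pieces together, the feedback loop might look like the following sketch, which reuses the helpers above. The test‑harness interface (callables that raise on failure) and the way failure messages are folded into the next query are assumptions about one plausible implementation.

```python
# Sketch of Iterative Retrieval Enhancement: repair, run the hidden tests,
# and fold failure signals back into the edit query for another round.
def run_tests(code: str, tests) -> list:
    """Execute the candidate program and return messages for the tests that fail."""
    namespace, failures = {}, []
    try:
        exec(code, namespace)          # acceptable only inside a sandboxed harness
    except Exception as exc:
        return [f"program error: {exc}"]
    for test in tests:
        try:
            test(namespace)            # each test inspects/calls the learner's functions
        except Exception as exc:
            failures.append(str(exc))
    return failures

def repair_with_ire(buggy, reference, index, call_llm, tests, budget: int = 3):
    """Loop retrieval -> repair -> execution until the tests pass or the budget runs out."""
    code, explanation = buggy, ""
    query = edit_query(buggy, reference)
    for _ in range(budget):
        retrieved = retrieve_top_k(index, query)
        code, explanation = solution_guided_repair(buggy, retrieved, call_llm)
        failures = run_tests(code, tests)
        if not failures:
            return code, explanation   # all hidden tests pass
        for message in failures:       # steer the next retrieval round
            query.update(message.split())
    return code, explanation           # best effort after the budget is spent
```

In this sketch a tutoring front end would only need to supply the learner's submission, the hidden tests, and an LLM client, which is consistent with the fully automated pipeline described above.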
The pipeline is fully automated, requiring only the buggy submission and a test harness.
Results & Findings
- Accuracy boost: The proposed framework achieves 30%–45% higher pass rates on standard student‑code benchmarks than vanilla LLM repair or retrieval‑only baselines.
- Explanation quality: Human evaluators rated the generated bug explanations as clear and educational in >80% of cases, a notable improvement over prior methods that output only patches.
- Iterative gains: Adding the IRE loop yields an extra 10%–15% increase in successful repairs, demonstrating that feedback‑driven retrieval is effective.
- Speed: Despite the two‑stage design, average end‑to‑end latency stays under 5 seconds per submission, making it viable for real‑time tutoring tools.
Practical Implications
- Intelligent tutoring systems (ITS): Deploying this framework can turn a simple “auto‑grader” into a coach that not only tells students their code is wrong but also explains the conceptual mistake.
- Developer onboarding tools: New hires can paste failing snippets into a chat‑assistant that returns a fixed version and a short lesson on the underlying pattern (e.g., off‑by‑one errors).
- Code review bots: In CI pipelines, the system could automatically suggest patches and annotate the change with a rationale, reducing back‑and‑forth between reviewers and authors.
- Educational content generation: By mining the retrieval database, instructors can automatically assemble collections of common bug patterns and their fixes for curriculum design.
Overall, the approach bridges the gap between raw code correction and pedagogical feedback, aligning AI‑driven repair with how human mentors teach.
Limitations & Future Work
- Dependence on a high‑quality solution corpus: Retrieval effectiveness drops if the database lacks diverse edit examples for a given language or domain.
- Scalability to large codebases: The current edit‑driven indexing works best on relatively small, self‑contained functions typical of student assignments; extending to multi‑file projects may require hierarchical retrieval.
- Explainability of LLM reasoning: While the generated explanations are readable, the internal decision path of the LLM remains a black box; future work could integrate more transparent reasoning modules.
- Cross‑language generalization: Experiments focus on a single programming language (Python); adapting the pipeline to statically typed languages (Java, C++) is an open research direction.
The authors suggest enriching the retrieval database with community‑sourced patches and exploring hybrid symbolic‑LLM methods to further boost both repair accuracy and explanatory depth.
Authors
- Zhenlong Dai
- Zhuoluo Zhao
- Hengning Wang
- Xiu Tang
- Sai Wu
- Chang Yao
- Zhipeng Gao
- Jingyuan Chen
Paper Information
- arXiv ID: 2601.08545v1
- Categories: cs.AI, cs.CL, cs.SE
- Published: January 13, 2026