[Paper] SiblingRepair: Sibling-Based Multi-Hunk Repair with Large Language Models
Source: arXiv - 2605.06209v1
Overview
The paper introduces SiblingRepair, an automated program repair (APR) technique that uses large language models (LLMs) to fix multi‑hunk bugs, that is, bugs whose fix requires coordinated edits at several locations in a codebase. It detects “sibling” code fragments (pieces that implement similar functionality and therefore tend to share the same defect) and generates consistent patches for them in a single pass, achieving higher repair rates than prior state‑of‑the‑art tools such as Hercules.
Key Contributions
- LLM‑driven sibling discovery – Uses token‑level and embedding‑based similarity instead of rigid AST matching or commit‑history heuristics.
- Two complementary repair strategies – Simultaneous repair (jointly patches all siblings) and Iterative repair (progressively refines patches as more siblings are examined).
- Patch generalization across locations – Retains promising patches from earlier suspicious spots and merges them into a unified multi‑hunk fix.
- Empirical superiority – Outperforms existing multi‑hunk APR tools on the Defects4J and GHRB benchmarks, with higher repair rates and comparable runtime.
- Robustness to data leakage – Shows that the LLM’s training data does not materially inflate the reported repair success.
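As a rough illustration of the token‑level half of sibling discovery, the sketch below ranks candidate snippets by the Jaccard overlap of their lexical tokens with a seed snippet. The tokenizer regex, the `jaccard` and `rank_siblings` helpers, and the 0.5 threshold are all assumptions made for this example; the summary does not say how SiblingRepair tokenizes or scores candidates.

```python
import re

# Hypothetical tokenizer: identifiers, integer literals, and single punctuation marks.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")


def tokens(code):
    """Return the set of distinct lexical tokens in a code snippet."""
    return set(TOKEN_RE.findall(code))


def jaccard(a, b):
    """Jaccard similarity of two snippets' token sets (0.0 if both are empty)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def rank_siblings(seed, candidates, threshold=0.5):
    """Keep candidates whose similarity to the seed meets the (assumed) threshold,
    sorted from most to least similar."""
    scored = [(jaccard(seed, c), c) for c in candidates]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


seed = "if (start > end) return null;"
candidates = [
    "if (begin > end) return null;",             # near-duplicate guard
    "for (int i = 0; i < n; i++) sum += a[i];",  # unrelated loop
]
ranked = rank_siblings(seed, candidates)
for score, snippet in ranked:
    print(f"{score:.2f}  {snippet}")
```

Only the near‑duplicate guard survives the threshold here. A purely lexical score like this misses siblings that were renamed or restructured, which is the gap the embedding‑based matching is meant to cover.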
Methodology
- Fault Localization – Starts with a conventional spectrum‑based technique to flag a suspicious line (the seed).
- Sibling Candidate Retrieval
  - Token‑based matching: Finds code snippets that share a high proportion of lexical tokens with the seed.
  - Embedding‑based matching: Uses a code‑embedding model (e.g., CodeBERT) to capture semantic similarity, surfacing siblings that may look different syntactically but behave alike.
- LLM‑guided Filtering – A large language model (e.g., GPT‑4) evaluates each candidate, discarding those unlikely to be related to the failure.
- Patch Generation
  - Simultaneous Repair: The LLM receives the seed and all filtered siblings together and is asked to produce a single consistent edit that applies to every location.
  - Iterative Repair: The LLM processes siblings one by one, updating a shared “patch context” so later edits stay compatible with earlier ones.
- Patch Consolidation – Successful edits from different seeds are merged, yielding a multi‑hunk patch that can be applied in one commit.
- Validation – The generated patch is compiled and run against the test suite; only patches that make all tests pass are kept.
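The difference between the two patch‑generation strategies can be sketched as a control‑flow skeleton. Everything in it is hypothetical: `Edit`, `Fragment`, and the function names are illustrative, and `stub_generate` stands in for the LLM call with a plain string replacement so the example runs without any external service. It is a sketch of the idea, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Fragment = Tuple[str, str]  # (location, code); both are illustrative placeholders


@dataclass
class Edit:
    location: str
    before: str
    after: str


# A generator receives the fragments to fix plus the edits produced so far.
Generator = Callable[[List[Fragment], List[Edit]], List[Edit]]


def simultaneous_repair(fragments: List[Fragment], generate: Generator) -> List[Edit]:
    """One request that contains the seed and all siblings together."""
    return generate(fragments, [])


def iterative_repair(fragments: List[Fragment], generate: Generator) -> List[Edit]:
    """One request per fragment; earlier edits are passed along as shared context."""
    context: List[Edit] = []
    for fragment in fragments:
        context.extend(generate([fragment], list(context)))
    return context


calls = []  # records (number of fragments, size of context) for each request


def stub_generate(fragments, context):
    # Stand-in for an LLM: applies a fixed rewrite and ignores the context.
    # A real model would condition on `context` to keep later edits consistent.
    calls.append((len(fragments), len(context)))
    return [Edit(loc, code, code.replace(" > ", " >= ")) for loc, code in fragments]


fragments = [
    ("A.java:10", "if (i > n) return;"),
    ("B.java:7", "while (j > m) j--;"),
]
joint = simultaneous_repair(fragments, stub_generate)
stepwise = iterative_repair(fragments, stub_generate)
print(calls)  # [(2, 0), (1, 0), (1, 1)]
```

The recorded calls show the trade‑off: the simultaneous strategy issues one request covering both fragments, while the iterative strategy issues one request per sibling with a context that grows after each edit.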
Results & Findings
| Benchmark | SiblingRepair (% of bugs fixed) | Hercules (SOTA) | Other Multi‑hunk APR |
|---|---|---|---|
| Defects4J | 41 % | 28 % | ≤22 % |
| GHRB | 38 % | 24 % | ≤19 % |
- Repair Efficiency: Average wall‑clock time per bug ≈ 3.2 min, comparable to Hercules despite the extra sibling search step.
- Sibling Detection Accuracy: Over 85 % of the retrieved siblings were truly related to the failure, confirming the effectiveness of token‑ and embedding‑based retrieval followed by LLM filtering.
- Leakage Check: Removing any test‑case‑specific code from the LLM’s context reduced success by < 2 %, indicating minimal reliance on memorized training data.
Overall, SiblingRepair lifts the ceiling for automated multi‑hunk fixing, especially in projects where similar logic is duplicated across modules.
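To put the table in perspective, the gap over Hercules can be expressed both in percentage points and as a relative gain. The short calculation below uses only the repair rates reported in the table; the variable and function names are illustrative.

```python
# Repair rates (% of bugs fixed) as reported in the results table.
rates = {
    "Defects4J": {"SiblingRepair": 41.0, "Hercules": 28.0},
    "GHRB": {"SiblingRepair": 38.0, "Hercules": 24.0},
}


def gains(ours, baseline):
    """Return (percentage-point gain, relative gain in %) over the baseline."""
    points = ours - baseline
    relative = 100.0 * points / baseline
    return points, relative


summary = {
    bench: gains(r["SiblingRepair"], r["Hercules"]) for bench, r in rates.items()
}
for bench, (points, relative) in summary.items():
    print(f"{bench}: +{points:.0f} points, {relative:.1f}% relative gain")
# Defects4J: +13 points, 46.4% relative gain
# GHRB: +14 points, 58.3% relative gain
```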
Practical Implications
- Faster Bug Triage – Developers can run SiblingRepair as part of CI pipelines; when a test fails, the tool can propose a single patch that resolves all occurrences, cutting down manual copy‑paste errors.
- Consistent Refactoring – Because siblings are identified semantically, the approach works even after code has been refactored, making it suitable for large, evolving codebases.
- Reduced Technical Debt – Automated multi‑hunk fixes help eliminate hidden duplicated bugs that often linger after a single‑location repair.
- LLM Integration Blueprint – The two‑stage (filter + generate) workflow offers a reusable pattern for other developer‑assist tools (e.g., automated code review, security‑rule enforcement).
Limitations & Future Work
- Dependence on LLM Quality – The current implementation relies on a proprietary LLM; performance may vary with open‑source alternatives.
- Scalability of Embedding Search – For very large repositories, the sibling retrieval step can become a bottleneck; indexing optimizations are needed.
- Limited to Test‑Driven Bugs – Spectrum‑based localization still requires a failing test, so bugs without test coverage remain out of scope.
- Future Directions: (1) Incorporate static analysis to broaden fault localization, (2) explore few‑shot prompting to reduce LLM inference cost, and (3) extend the approach to cross‑language sibling detection (e.g., Java ↔ Kotlin).
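The failing‑test requirement follows from how spectrum‑based scores are computed. The sketch below uses the Ochiai formula, a widely used spectrum‑based metric; the summary does not say which formula SiblingRepair relies on, so Ochiai, the helper names, and the toy coverage data are all assumptions made for illustration.

```python
import math


def ochiai(failed_cov, passed_cov, total_failed):
    """Ochiai suspiciousness: ef / sqrt(total_failed * (ef + ep)); 0.0 if undefined."""
    denom = math.sqrt(total_failed * (failed_cov + passed_cov))
    return failed_cov / denom if denom else 0.0


def rank_lines(coverage, outcomes):
    """coverage: test name -> set of covered line numbers.
    outcomes: test name -> True if the test passed.
    Returns (line, score) pairs, most suspicious first."""
    total_failed = sum(1 for passed in outcomes.values() if not passed)
    lines = set().union(*coverage.values())
    scores = {}
    for line in lines:
        ef = sum(1 for t, cov in coverage.items() if line in cov and not outcomes[t])
        ep = sum(1 for t, cov in coverage.items() if line in cov and outcomes[t])
        scores[line] = ochiai(ef, ep, total_failed)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data (hypothetical): line 3 is executed only by the failing test t1.
coverage = {"t1": {1, 2, 3}, "t2": {1, 2}, "t3": {1}}
outcomes = {"t1": False, "t2": True, "t3": True}
ranking = rank_lines(coverage, outcomes)
seed_line = ranking[0][0]  # the most suspicious line becomes the seed

# With no failing test, every score collapses to zero and no seed stands out.
all_pass = rank_lines(coverage, {t: True for t in coverage})
```

In the toy data, line 3 ranks first because only the failing test executes it. When every test passes, all scores are zero, which is why bugs without a failing test cannot be localized this way.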
Authors
- Xinyu Liu
- Jiayu Ren
- Yusen Wang
- Qi Xin
- Xiaoyuan Xie
- Jifeng Xuan
Paper Information
- arXiv ID: 2605.06209v1
- Categories: cs.SE
- Published: May 7, 2026