[Paper] SiblingRepair: Sibling-Based Multi-Hunk Repair with Large Language Models

Published: May 7, 2026 at 09:14 AM EDT
Source: arXiv - 2605.06209v1

Overview

The paper introduces SiblingRepair, an automated program repair (APR) technique that uses large language models (LLMs) to fix multi‑hunk bugs, that is, bugs whose fix requires edits at several locations in a codebase. By detecting “sibling” code fragments (pieces that implement similar functionality and therefore tend to share the same defect), the system can generate consistent patches for all affected locations, raising the repair rate over prior state‑of‑the‑art tools such as Hercules (41 % vs. 28 % of bugs fixed on Defects4J).

Key Contributions

  • LLM‑driven sibling discovery – Uses token‑level and embedding‑based similarity instead of rigid AST matching or commit‑history heuristics.
  • Two complementary repair strategies – Simultaneous repair (jointly patches all siblings) and Iterative repair (progressively refines patches as more siblings are examined).
  • Patch generalization across locations – Retains promising patches from earlier suspicious spots and merges them into a unified multi‑hunk fix.
  • Empirical superiority – Outperforms existing multi‑hunk APR tools on the Defects4J and GHRB benchmarks, with higher repair rates and comparable runtime.
  • Robustness to data leakage – Shows that the LLM’s training data does not materially inflate the reported repair success.
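
The difference between the two repair strategies is mostly one of control flow, which a minimal Python sketch can illustrate. Everything here is an assumption made for illustration: the `LLM` callable, the prompt wording, and the function names are hypothetical and are not taken from the paper.

```python
from typing import Callable, List

# Hypothetical interface: takes a prompt string, returns the model's reply.
LLM = Callable[[str], str]


def simultaneous_repair(llm: LLM, seed: str, siblings: List[str]) -> str:
    """Ask for one joint patch covering the seed and all siblings at once."""
    locations = [seed] + siblings
    prompt = "Fix the same defect consistently in every snippet:\n\n"
    prompt += "\n\n".join(
        f"[location {i}]\n{code}" for i, code in enumerate(locations)
    )
    return llm(prompt)  # a single model call for all locations


def iterative_repair(llm: LLM, seed: str, siblings: List[str]) -> List[str]:
    """Patch one location at a time, carrying earlier patches as context."""
    patch_context: List[str] = []
    for code in [seed] + siblings:
        prompt = "Earlier patches (stay consistent with them):\n"
        prompt += "\n".join(patch_context) or "(none yet)"
        prompt += f"\n\nNow fix this snippet:\n{code}"
        patch_context.append(llm(prompt))  # one model call per location
    return patch_context
```

Under these assumptions, the simultaneous variant spends one model call on a longer prompt, while the iterative variant spends one call per location but lets each later edit see the patches produced before it.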

Methodology

  1. Fault Localization – Starts with a conventional spectrum‑based technique to flag a suspicious line (the seed).
  2. Sibling Candidate Retrieval
    • Token‑based matching: Finds code snippets that share a high proportion of lexical tokens with the seed.
    • Embedding‑based matching: Uses a code‑embedding model (e.g., CodeBERT) to capture semantic similarity, surfacing siblings that may look different syntactically but behave alike.
  3. LLM‑guided Filtering – A large language model (e.g., GPT‑4) evaluates each candidate, discarding those unlikely to be related to the failure.
  4. Patch Generation
    • Simultaneous Repair: The LLM receives the seed and all filtered siblings together and is asked to produce a single consistent edit that applies to every location.
    • Iterative Repair: The LLM processes siblings one by one, updating a shared “patch context” so later edits stay compatible with earlier ones.
  5. Patch Consolidation – Successful edits from different seeds are merged, yielding a multi‑hunk patch that can be applied in one commit.
  6. Validation – The generated patch is compiled and run against the test suite; only patches that make all tests pass are kept.
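
Steps 1 and 2 can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the paper's implementation: Ochiai is used here as one common spectrum-based formula (the summary does not say which formula the authors use), Jaccard overlap stands in for "a high proportion of shared lexical tokens", and the tokenizer, the 0.6 threshold, and all function names are hypothetical.

```python
import math
import re
from typing import List, Set, Tuple


def ochiai(failed_cov: int, passed_cov: int, total_failed: int) -> float:
    """Ochiai suspiciousness of a line.

    failed_cov / passed_cov: failing / passing tests that execute the line.
    total_failed: number of failing tests overall.
    """
    denom = math.sqrt(total_failed * (failed_cov + passed_cov))
    return failed_cov / denom if denom else 0.0


def tokens(code: str) -> Set[str]:
    """Crude lexical tokenizer: identifiers, numbers, single symbols."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of tokens the two sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_siblings(
    seed: str, candidates: List[str], threshold: float = 0.6
) -> List[Tuple[float, str]]:
    """Return candidates whose token overlap with the seed meets the threshold."""
    seed_tokens = tokens(seed)
    scored = [(jaccard(seed_tokens, tokens(c)), c) for c in candidates]
    return sorted((s for s in scored if s[0] >= threshold), reverse=True)


if __name__ == "__main__":
    # A line hit by both failing tests and no passing test is maximally suspicious.
    print(ochiai(failed_cov=2, passed_cov=0, total_failed=2))
    seed = "if (x > max) return max;"
    print(retrieve_siblings(seed, ["if (y > max) return max;", "log.info(msg);"]))
```

In this toy run only the second `if` statement survives the threshold, since it differs from the seed in a single identifier; the logging call shares only punctuation with the seed and is dropped.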

Results & Findings

Percentage of bugs fixed, by benchmark:

Benchmark    SiblingRepair    Hercules (SOTA)    Other Multi‑hunk APR
Defects4J    41 %             28 %               ≤ 22 %
GHRB         38 %             24 %               ≤ 19 %

  • Repair Efficiency: Average wall‑clock time per bug ≈ 3.2 min, comparable to Hercules despite the extra sibling search step.
  • Sibling Detection Accuracy: Over 85 % of the retrieved siblings were truly related to the failure, confirming the effectiveness of token + embedding filtering.
  • Leakage Check: Removing any test‑case‑specific code from the LLM’s context reduced success by < 2 %, indicating minimal reliance on memorized training data.
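
To put the benchmark figures above in perspective, the short Python calculation below derives the absolute gap (in percentage points) and the relative gain over Hercules. The input numbers are the repair rates quoted in this summary; the function name `improvement` is only illustrative.

```python
# Repair rates (% of bugs fixed) quoted above: (SiblingRepair, Hercules).
rates = {"Defects4J": (41, 28), "GHRB": (38, 24)}


def improvement(ours: float, baseline: float):
    """Return (absolute gap in percentage points, relative gain in %)."""
    gap = ours - baseline
    return gap, round(100 * gap / baseline, 1)


for bench, (sibling_repair, hercules) in rates.items():
    gap, relative = improvement(sibling_repair, hercules)
    print(f"{bench}: +{gap} points, {relative} % relative gain")
# Defects4J: +13 points, 46.4 % relative gain
# GHRB: +14 points, 58.3 % relative gain
```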

Overall, SiblingRepair lifts the ceiling for automated multi‑hunk fixing, especially in projects where similar logic is duplicated across modules.

Practical Implications

  • Faster Bug Triage – Developers can run SiblingRepair as part of CI pipelines; when a test fails, the tool can propose a single patch that resolves all occurrences, cutting down manual copy‑paste errors.
  • Consistent Refactoring – Because siblings are identified semantically, the approach works even after code has been refactored, making it suitable for large, evolving codebases.
  • Reduced Technical Debt – Automated multi‑hunk fixes help eliminate hidden duplicated bugs that often linger after a single‑location repair.
  • LLM Integration Blueprint – The two‑stage (filter + generate) workflow offers a reusable pattern for other developer‑assist tools (e.g., automated code review, security‑rule enforcement).

Limitations & Future Work

  • Dependence on LLM Quality – The current implementation relies on a proprietary LLM; performance may vary with open‑source alternatives.
  • Scalability of Embedding Search – For very large repositories, the sibling retrieval step can become a bottleneck; indexing optimizations are needed.
  • Limited to Test‑Driven Bugs – Spectrum‑based localization still requires a failing test, so bugs without test coverage remain out of scope.
  • Future Directions: (1) Incorporate static analysis to broaden fault localization, (2) explore few‑shot prompting to reduce LLM inference cost, and (3) extend the approach to cross‑language sibling detection (e.g., Java ↔ Kotlin).
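
The scalability concern is easiest to see in a brute-force version of embedding search, sketched below in Python. The vectors are hand-made toy values and the names (`top_k_siblings`, `Foo.clamp`, and so on) are hypothetical; a real system would obtain vectors from a code-embedding model such as CodeBERT, and the paper's actual retrieval code may look quite different.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_siblings(
    seed_vec: Sequence[float], index: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[float, str]]:
    """Linear scan: every query compares the seed against every stored vector."""
    scored = ((cosine(seed_vec, vec), name) for name, vec in index.items())
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings", invented for illustration only.
    index = {
        "Foo.clamp": [1.0, 0.0],
        "Bar.clamp": [0.9, 0.1],
        "Log.write": [0.0, 1.0],
    }
    print(top_k_siblings([1.0, 0.0], index, k=2))
```

Each query costs time proportional to the number of indexed snippets times the vector dimension, which is why large repositories would likely need a precomputed or approximate nearest-neighbor index instead of a scan like this.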

Authors

  • Xinyu Liu
  • Jiayu Ren
  • Yusen Wang
  • Qi Xin
  • Xiaoyuan Xie
  • Jifeng Xuan

Paper Information

  • arXiv ID: 2605.06209v1
  • Categories: cs.SE
  • Published: May 7, 2026