[Paper] SiblingRepair: Sibling-Based Multi-Hunk Repair with Large Language Models

Published: May 7, 2026 at 09:14 AM EDT

Source: arXiv - 2605.06209v1

Overview

The paper introduces SiblingRepair, an automated program repair (APR) technique that uses large language models (LLMs) to fix multi‑hunk bugs, that is, bugs whose fix requires coordinated edits at several locations in a codebase. It detects “sibling” code fragments (pieces that implement similar functionality and therefore tend to share the same defect) and generates consistent patches for them together. On the reported benchmarks it repairs more bugs than prior state‑of‑the‑art tools such as Hercules.

Key Contributions

LLM‑driven sibling discovery – Uses token‑level and embedding‑based similarity instead of rigid AST matching or commit‑history heuristics.
Two complementary repair strategies – Simultaneous repair (jointly patches all siblings) and Iterative repair (progressively refines patches as more siblings are examined).
Patch generalization across locations – Retains promising patches from earlier suspicious spots and merges them into a unified multi‑hunk fix.
Empirical superiority – Outperforms existing multi‑hunk APR tools on the Defects4J and GHRB benchmarks, with higher repair rates and comparable runtime.
Robustness to data leakage – Reports that possible exposure to benchmark code in the LLM’s training data does not materially inflate the repair success.
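
To make the two similarity signals behind sibling discovery concrete, here is a minimal sketch. It assumes Jaccard overlap for the token‑level signal and cosine similarity for the embedding signal; the summary does not say which measures the paper uses, and the names `tokenize`, `token_similarity`, `cosine_similarity`, `rank_siblings`, and the `threshold` value are hypothetical.

```python
import math
import re
from typing import List, Sequence, Set


def tokenize(code: str) -> Set[str]:
    """Split code into identifiers, numbers, and single punctuation characters."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code))


def token_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two snippets' token sets (an assumed measure)."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def rank_siblings(seed: str, candidates: List[str], threshold: float = 0.5) -> List[str]:
    """Keep candidates whose token similarity to the seed meets the threshold,
    most similar first. The threshold is illustrative, not taken from the paper."""
    scored = [(token_similarity(seed, c), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored if score >= threshold]
```

In a fuller pipeline the vectors passed to `cosine_similarity` would come from a code‑embedding model; plain lists of floats stand in for them here.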

Methodology

Fault Localization – Starts with a conventional spectrum‑based technique to flag a suspicious line (the seed).
Sibling Candidate Retrieval
- Token‑based matching: Finds code snippets that share a high proportion of lexical tokens with the seed.
- Embedding‑based matching: Uses a code‑embedding model (e.g., CodeBERT) to capture semantic similarity, surfacing siblings that may look different syntactically but behave alike.
LLM‑guided Filtering – A large language model (e.g., GPT‑4) evaluates each candidate, discarding those unlikely to be related to the failure.
Patch Generation
- Simultaneous Repair: The LLM receives the seed and all filtered siblings together and is asked to produce a single consistent edit that applies to every location.
- Iterative Repair: The LLM processes siblings one by one, updating a shared “patch context” so later edits stay compatible with earlier ones.
Patch Consolidation – Successful edits from different seeds are merged, yielding a multi‑hunk patch that can be applied in one commit.
Validation – The generated patch is compiled and run against the test suite; only patches that make all tests pass are kept.
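
The difference between the two patch‑generation strategies and the final validation gate can be sketched as follows. This is an illustration under stated assumptions, not the authors’ implementation: the `LLM` callable type, `toy_llm`, `validate`, and the `run_tests` parameter are hypothetical stand‑ins, and the real system’s prompts and patch format are not described in this summary.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical LLM interface (an assumption for this sketch): given the fragments
# to fix and previously accepted fixes as context, return one patched fragment
# per input fragment, in the same order.
LLM = Callable[[List[str], List[str]], List[str]]


def simultaneous_repair(llm: LLM, seed: str, siblings: List[str]) -> Dict[str, str]:
    """Send the seed and all siblings in one request and map each to its fix."""
    locations = [seed] + siblings
    patched = llm(locations, [])
    return dict(zip(locations, patched))


def iterative_repair(llm: LLM, seed: str, siblings: List[str]) -> Dict[str, str]:
    """Fix one location at a time, passing earlier fixes along as shared context."""
    patch_context: List[str] = []
    patches: Dict[str, str] = {}
    for location in [seed] + siblings:
        fixed = llm([location], list(patch_context))[0]
        patches[location] = fixed
        patch_context.append(fixed)
    return patches


def validate(patch: Dict[str, str],
             run_tests: Callable[[Dict[str, str]], bool]) -> Optional[Dict[str, str]]:
    """Keep the patch only if the (caller-supplied) test run passes."""
    return patch if run_tests(patch) else None


def toy_llm(fragments: List[str], context: List[str]) -> List[str]:
    """Stand-in for a real model: turns an off-by-one `<=` into `<`."""
    return [fragment.replace("<=", "<") for fragment in fragments]
```

With `toy_llm` both strategies yield the same fixes; with a real model they can differ, because the iterative variant sees earlier fixes before producing later ones.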

Results & Findings

Share of benchmark bugs fixed:

Benchmark	SiblingRepair	Hercules (prior SOTA)	Other multi‑hunk APR
Defects4J	41 %	28 %	≤ 22 %
GHRB	38 %	24 %	≤ 19 %

Repair Efficiency: Average wall‑clock time per bug ≈ 3.2 min, comparable to Hercules despite the extra sibling search step.
Sibling Detection Accuracy: Over 85 % of the retrieved siblings were truly related to the failure, supporting the combination of token‑ and embedding‑based retrieval with LLM filtering.
Leakage Check: Removing any test‑case‑specific code from the LLM’s context reduced success by < 2 %, indicating minimal reliance on memorized training data.

Overall, SiblingRepair raises repair rates for multi‑hunk bugs, especially in projects where similar logic is duplicated across modules.

Practical Implications

Faster Bug Triage – Developers can run SiblingRepair as part of CI pipelines; when a test fails, the tool can propose a single patch that resolves all occurrences, cutting down manual copy‑paste errors.
Consistent Refactoring – Because siblings are identified by semantic as well as lexical similarity, the approach can still find related fragments after code has been refactored, which suits large, evolving codebases.
Reduced Technical Debt – Automated multi‑hunk fixes help eliminate hidden duplicated bugs that often linger after a single‑location repair.
LLM Integration Blueprint – The two‑stage (filter + generate) workflow offers a reusable pattern for other developer‑assist tools (e.g., automated code review, security‑rule enforcement).

Limitations & Future Work

Dependence on LLM Quality – The current implementation relies on a proprietary LLM; performance may vary with open‑source alternatives.
Scalability of Embedding Search – For very large repositories, the sibling retrieval step can become a bottleneck; indexing optimizations are needed.
Limited to Test‑Driven Bugs – Spectrum‑based localization still requires a failing test, so bugs without test coverage remain out of scope.
Future Directions: (1) Incorporate static analysis to broaden fault localization, (2) explore few‑shot prompting to reduce LLM inference cost, and (3) extend the approach to cross‑language sibling detection (e.g., Java ↔ Kotlin).
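
The failing‑test requirement noted above follows from how spectrum‑based scores are computed. The sketch below uses the Ochiai formula as one common example; the summary only says “a conventional spectrum‑based technique”, so Ochiai is an assumption here, and the function names and coverage layout are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def ochiai(failed_cover: int, passed_cover: int, total_failed: int) -> float:
    """Ochiai suspiciousness: failed_cover / sqrt(total_failed * (failed_cover + passed_cover)).

    failed_cover / passed_cover count the failing / passing tests that execute the line.
    """
    denom = math.sqrt(total_failed * (failed_cover + passed_cover))
    return failed_cover / denom if denom else 0.0


def rank_lines(coverage: Dict[str, Tuple[int, int]],
               total_failed: int) -> List[Tuple[str, float]]:
    """Score every line and return (line, score) pairs, most suspicious first."""
    scored = [(line, ochiai(ef, ep, total_failed)) for line, (ef, ep) in coverage.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

When no test fails, `total_failed` is 0 and every line scores 0.0, so no seed can be chosen. That is why bugs without a failing test stay out of scope.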

Authors

Xinyu Liu
Jiayu Ren
Yusen Wang
Qi Xin
Xiaoyuan Xie
Jifeng Xuan

Paper Information

arXiv ID: 2605.06209v1
Categories: cs.SE
Published: May 7, 2026
