[Paper] Using a Sledgehammer to Crack a Nut? Revisiting Automated Compiler Fault Isolation

Published: December 18, 2025 at 04:22 AM EST
Source: arXiv - 2512.16335v1

Overview

Compilers are the invisible workhorses that turn our source code into running programs, so a bug in a compiler can break entire toolchains. This paper pits the everyday “look‑at‑the‑commit‑history” approach that developers already use against a suite of sophisticated spectrum‑based fault‑localization (SBFL) techniques, asking which one more reliably leads developers to the faulty compiler source files.

Key Contributions

  • Empirical head‑to‑head comparison of a simple strategy built on the bug‑inducing commit (BIC), named Basic, with several state‑of‑the‑art SBFL methods on a benchmark of 120 real bugs (60 GCC + 60 LLVM).
  • Demonstration that Basic is competitive and often superior on Top‑1 and Top‑5 ranking metrics, challenging the assumption that complex SBFL always wins.
  • Recommendation of Basic as a baseline for future compiler‑fault‑isolation research, providing a low‑overhead, reproducible reference point.
  • Open‑source benchmark and tooling (scripts for binary‑search commit isolation and SBFL pipelines) released to the community for repeatable experiments.

Methodology

  1. Bug selection – The authors curated 120 real compiler bugs (60 from GCC, 60 from LLVM) that have publicly available test cases and version histories.
  2. Basic strategy
    • Identify the latest good release (tests pass) and the earliest bad release (tests fail).
    • Perform a binary search on the commit timeline to locate the bug‑inducing commit (BIC).
    • Flag every file touched in that commit as a potential fault location.
  3. SBFL techniques – Implemented several popular SBFL algorithms (e.g., Ochiai, Tarantula, DStar) that rank compiler source files by combining the coverage of each test execution with its pass/fail outcome (the program spectrum).
  4. Evaluation metrics – Measured how often the true faulty file appears in the Top‑1, Top‑5, and Top‑10 positions of each technique’s ranked list.
  5. Statistical analysis – Used Wilcoxon signed‑rank tests and effect‑size calculations to assess whether the differences are significant and how large they are.
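Step 3 relies on standard SBFL suspiciousness formulas. As a reference point, here is a minimal Python sketch of the textbook Ochiai, Tarantula, and DStar (with the common exponent 2) scores, computed from per‑file coverage counts. The Spectrum class, rank_files, and the file names in the usage note are hypothetical illustrations; the paper's own implementation, spectrum granularity, and tie‑breaking may differ.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Spectrum:
    """Hypothetical per-file coverage counts over a set of test runs."""
    ef: int  # failing runs that execute the file
    ep: int  # passing runs that execute the file
    nf: int  # failing runs that do not execute the file
    np: int  # passing runs that do not execute the file


def ochiai(s: Spectrum) -> float:
    # ef / sqrt(total_failed * (ef + ep))
    denom = math.sqrt((s.ef + s.nf) * (s.ef + s.ep))
    return s.ef / denom if denom else 0.0


def tarantula(s: Spectrum) -> float:
    # Share of failing runs that hit the file, normalised against the share of passing runs.
    fail_ratio = s.ef / (s.ef + s.nf) if (s.ef + s.nf) else 0.0
    pass_ratio = s.ep / (s.ep + s.np) if (s.ep + s.np) else 0.0
    denom = fail_ratio + pass_ratio
    return fail_ratio / denom if denom else 0.0


def dstar(s: Spectrum, star: int = 2) -> float:
    # ef^star / (ep + nf); infinite when only failing runs touch the file.
    denom = s.ep + s.nf
    if denom == 0:
        return math.inf if s.ef else 0.0
    return s.ef ** star / denom


def rank_files(spectra: dict[str, Spectrum],
               formula: Callable[[Spectrum], float]) -> list[str]:
    # Most suspicious first; ties broken alphabetically so the output is deterministic.
    return sorted(spectra, key=lambda f: (-formula(spectra[f]), f))
```

With one failing test and three passing ones, a file executed only by the failing test, such as Spectrum(ef=1, ep=0, nf=0, np=3), receives the maximum score under all three formulas (1.0, 1.0, and infinity respectively).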

Results & Findings

Metric             Basic (BIC)   Best SBFL (e.g., Ochiai)
Top‑1 accuracy     38 %          34 %
Top‑5 accuracy     71 %          66 %
Top‑10 accuracy    78 %          77 %
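The percentages in the table above are Top‑N accuracies. A minimal sketch of how such a figure can be computed, assuming a bug counts as localized when any of its true faulty files appears among the first N ranked entries (a common convention, not confirmed by this summary); the function name and data layout are hypothetical, and tie handling is ignored:

```python
def top_n_accuracy(rankings: dict[str, list[str]],
                   faulty_files: dict[str, set[str]],
                   n: int) -> float:
    """Share of bugs whose ranked list has a true faulty file among its first n entries."""
    if not rankings:
        return 0.0
    hits = sum(
        1 for bug, ranked in rankings.items()
        if faulty_files[bug] & set(ranked[:n])  # any overlap counts as a hit
    )
    return hits / len(rankings)
```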
  • Basic outperforms SBFL on the most critical Top‑1 and Top‑5 rankings, meaning developers are more likely to see the right file immediately.
  • The advantage is especially pronounced for LLVM bugs, where commit granularity tends to be tighter.
  • Wilcoxon signed‑rank tests indicate that the differences are statistically significant (p < 0.05).
  • Runtime: Basic needs only about ⌈log₂ N⌉ compiler builds and test runs to bisect N candidate commits, with no instrumentation, whereas SBFL needs an instrumented compiler build and coverage collection for every test execution.
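The log₂ N cost comes from halving the range of candidate commits with every build. Below is a minimal Python sketch of the Basic strategy under stated assumptions: commits are in chronological order, the first is known good, the last is known bad, and the failure persists once introduced. find_bic, basic_strategy, the is_bad predicate, and the files_touched mapping are hypothetical stand‑ins for building the compiler and running the bug‑triggering test at a commit.

```python
from typing import Callable, Mapping, Sequence


def find_bic(commits: Sequence[str], is_bad: Callable[[str], bool]) -> tuple[str, int]:
    """Binary search for the first failing commit.

    Assumes chronological order, commits[0] passes the test, commits[-1] fails it,
    and failures persist once introduced (monotone history).
    Returns the bug-inducing commit and the number of builds it took.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] is good, commits[hi] is bad
    builds = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        builds += 1  # one compiler build plus one test run at commits[mid]
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi], builds


def basic_strategy(commits: Sequence[str],
                   is_bad: Callable[[str], bool],
                   files_touched: Mapping[str, Sequence[str]]) -> tuple[str, list[str], int]:
    """Flag every file modified by the bug-inducing commit as a fault candidate."""
    bic, builds = find_bic(commits, is_bad)
    return bic, sorted(files_touched.get(bic, [])), builds
```

For a history of 1,025 commits whose first commit is good and last is bad, the loop performs exactly 10 builds (log₂ 1024). The alphabetical order of the flagged files is an arbitrary choice made here for determinism.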

Practical Implications

  • Faster debugging cycles – Teams can adopt the binary‑search BIC approach as a first‑line tool, cutting the number of builds dramatically compared with exhaustive SBFL runs.
  • Lower infrastructure cost – No need for heavy instrumentation or test‑coverage collection; a simple CI pipeline that tags good/bad releases suffices.
  • Integration with existing workflows – Git and Mercurial already provide built‑in bisection (git bisect, hg bisect); the paper essentially formalizes that practice for compiler bugs.
  • Prioritization of effort – When Basic flags a small set of files (often just one or two), developers can focus code review and static‑analysis resources there, reserving SBFL for the “hard cases” where commit history is noisy.
  • Tooling – The released scripts can be dropped into CI pipelines for GCC/LLVM projects, or adapted to any language compiler that has a reproducible test suite.
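One way to wire the Basic strategy into an existing Git workflow is a small check script driven by git bisect run. The sketch below is not the paper's released tooling: bisect_step, its repeats parameter, and the build/test commands are hypothetical. Only the exit‑code convention comes from Git (0 = good, 125 = skip, any other code from 1 to 127 = bad). Re‑running the test and skipping commits with inconsistent results is one assumed way to soften the flaky‑test problem.

```python
import subprocess
from typing import Sequence

# Exit-code convention of `git bisect run`: 0 = good, 125 = skip, 1-127 (except 125) = bad.
GOOD, BAD, SKIP = 0, 1, 125


def bisect_step(build_cmd: Sequence[str], test_cmd: Sequence[str], repeats: int = 3) -> int:
    """Classify the currently checked-out commit for `git bisect run`.

    build_cmd and test_cmd are placeholders for project-specific commands (hypothetical);
    test_cmd is expected to exit 0 when the bug-triggering program behaves correctly.
    """
    if subprocess.run(list(build_cmd)).returncode != 0:
        return SKIP  # the compiler does not build here, so this commit cannot be judged
    # Re-run the test a few times; disagreeing results hint at a flaky test.
    outcomes = {subprocess.run(list(test_cmd)).returncode == 0 for _ in range(repeats)}
    if len(outcomes) != 1:
        return SKIP
    return GOOD if outcomes.pop() else BAD
```

A hypothetical check_commit.py would end with sys.exit(bisect_step(["make", "-j8"], ["./run_repro.sh"])) and be driven by git bisect start <bad> <good> followed by git bisect run python3 check_commit.py. If too many commits end up skipped, Git reports a range of candidate commits rather than a single BIC.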

Limitations & Future Work

  • Dependence on reliable test suites – Both Basic and SBFL assume a clear pass/fail signal; flaky tests could mislead the binary search.
  • Granularity of commits – In projects with large, monolithic commits, Basic may flag many files, diluting its advantage.
  • Scope limited to GCC/LLVM – While these are major compilers, results may differ for domain‑specific or less‑maintained compilers.
  • Future directions suggested include hybrid approaches (using commit history to narrow the search space, then applying SBFL within that subset) and extending the benchmark to other compiler families and to just‑in‑time (JIT) compilation environments.
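The hybrid direction could take many forms. One simple possibility, sketched here as an assumption rather than the authors' proposal, is to let the files changed by the BIC form a priority set and use SBFL scores only to order files within and after it. hybrid_rank and its inputs are hypothetical.

```python
def hybrid_rank(sbfl_scores: dict[str, float], bic_files: set[str]) -> list[str]:
    """Put files changed by the bug-inducing commit first, ordered by SBFL score;
    all remaining files follow, also ordered by score."""
    candidates = set(sbfl_scores) | bic_files
    return sorted(
        candidates,
        # False sorts before True, so BIC files come first; then higher scores; then name.
        key=lambda f: (f not in bic_files, -sbfl_scores.get(f, 0.0), f),
    )
```

Falling back to the remaining files, instead of discarding them, keeps a path forward when the BIC is a large commit or turns out to be unrelated to the fault; a stricter variant would return only the BIC files.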

Authors

  • Yibiao Yang
  • Qingyang Li
  • Maolin Sun
  • Jiangchang Wu
  • Yuming Zhou

Paper Information

  • arXiv ID: 2512.16335v1
  • Categories: cs.SE
  • Published: December 18, 2025