[Paper] Compiling Away the Overhead of Race Detection

Published: December 5, 2025 at 04:26 AM EST
3 min read
Source: arXiv - 2512.05555v1

Overview

Dynamic data‑race detectors such as ThreadSanitizer are essential for catching hard‑to‑reproduce concurrency bugs, but their usefulness is often hampered by the ~2–5× runtime slowdown they introduce. The paper Compiling Away the Overhead of Race Detection shows how a compiler‑level static analysis can automatically prune the vast majority of unnecessary instrumentation, cutting the overhead by up to 2.5× while keeping the detector’s soundness intact.

Key Contributions

  • Interprocedural static analyses that identify memory accesses which are provably race‑free and therefore need no runtime checks.
  • Equivalence‑class‑based redundancy elimination: if two checks would report the same race (possibly at different program points), only one representative is kept.
  • Dominance‑based elimination algorithm that identifies the redundant checks efficiently.
  • LLVM implementation integrated with ThreadSanitizer’s instrumentation pass, requiring no code changes from developers.
  • Empirical evaluation on a broad set of real‑world applications showing a geometric‑mean speedup of 1.34× (peak 2.5×) with negligible compile‑time impact.
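The race‑freedom idea in the first bullet can be made concrete with a deliberately simplified, lockset‑style check. This is a minimal sketch, assuming the per‑access facts (which abstract threads may run an access, which locks are definitely held, whether it runs before any thread is spawned) have already been computed by some interprocedural analysis. The Access model and every name in it are hypothetical, and the rules are a textbook simplification, not the paper's actual analysis. An access keeps its instrumentation unless it provably cannot race with any access to the same location, including another execution of itself.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """A static memory access in a hypothetical toy model (not the paper's IR)."""
    site: str                   # instruction label, e.g. "worker:inc_counter"
    location: str               # abstract memory location the access touches
    threads: frozenset          # abstract threads that may execute this access
    is_write: bool
    locks_held: frozenset       # locks definitely held when the access executes
    before_spawn: bool = False  # runs only before any other thread is created


def may_race(a: Access, b: Access) -> bool:
    """Conservative check: returns False only when a race between a and b is impossible."""
    if a.location != b.location:
        return False            # different memory, no conflict
    if a.before_spawn or b.before_spawn:
        return False            # thread creation orders it before every other thread
    if not (a.is_write or b.is_write):
        return False            # two reads never race
    if a.locks_held & b.locks_held:
        return False            # a common lock serializes the two accesses
    # Otherwise a race needs the two accesses to run on different threads.
    return any(t1 != t2 for t1 in a.threads for t2 in b.threads)


def race_free_sites(accesses):
    """Sites that cannot race with any access (including themselves): no check needed."""
    by_location = defaultdict(list)
    for acc in accesses:
        by_location[acc.location].append(acc)
    return {
        acc.site
        for acc in accesses
        if not any(may_race(acc, other) for other in by_location[acc.location])
    }


# Hypothetical program: main initializes config, then two workers run.
WORKERS = frozenset({"w1", "w2"})
NO_LOCKS = frozenset()

accesses = [
    Access("main:init_config", "config", frozenset({"main"}), True, NO_LOCKS, before_spawn=True),
    Access("worker:read_config", "config", WORKERS, False, NO_LOCKS),
    Access("worker:inc_counter", "counter", WORKERS, True, frozenset({"m"})),
    Access("main:print_counter", "counter", frozenset({"main"}), False, NO_LOCKS),
    Access("worker:scratch", "scratch@w1", frozenset({"w1"}), True, NO_LOCKS),
]

print(sorted(race_free_sites(accesses)))
# ['main:init_config', 'worker:read_config', 'worker:scratch']
```

In this toy program the locked increment and the unlocked read of counter both keep their checks, because main reads counter without holding m; the other three accesses could, under this model, be left uninstrumented.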

Methodology

  1. Static Race‑Freedom Analysis – The compiler walks the whole program (including across function boundaries) and reasons about:

    • which memory locations are accessed,
    • the synchronization primitives protecting those accesses,
    • thread‑creation patterns.
      If it can prove that an access can never race, the corresponding instrumentation is removed.
  2. Equivalence‑Class Detection – The authors observe that many inserted checks are semantically redundant: they would fire for the same underlying data race, just from a different instruction. By defining an equivalence relation over accesses (same memory location, same lock set, etc.), they can keep a single “representative” check per class.

  3. Dominance‑Based Elimination – Using classic control‑flow dominance information, the analysis discards any check that is dominated by another check already representing its equivalence class. This step is cheap and fits naturally into the LLVM pass pipeline.

  4. Preserving Completeness – The analyses are designed so that if a race is possible, at least one instrumentation point will still be present, guaranteeing that ThreadSanitizer’s detection capabilities are unchanged.
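Steps 2 to 4 can be sketched together on a toy, instruction‑level control‑flow graph. This is a minimal Python sketch under stated assumptions, not the paper's LLVM pass: the equivalence key (location, access kind, lock set) is a hypothetical stand‑in for the paper's relation, dominators are computed with the classic iterative dataflow formulation rather than LLVM's dominator tree, and the example assumes no synchronization operations execute between two checks of the same class (a real analysis must account for them, since an intervening acquire/release can change which races each check observes). A check is dropped only when another check of the same class strictly dominates it, so the topmost check of every class survives.

```python
def dominators(succ, entry):
    """Iterative dataflow: Dom(n) = {n} | intersection of Dom(p) over predecessors p."""
    nodes = set(succ) | {t for targets in succ.values() for t in targets}
    preds = {n: set() for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            preds[t].add(n)
    dom = {n: set(nodes) for n in nodes}   # start from "everything dominates"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def eliminate_redundant(checks, succ, entry):
    """Keep a check unless another check of the same class strictly dominates it."""
    dom = dominators(succ, entry)
    return {
        node
        for node, key in checks.items()
        if not any(other != node and checks[other] == key and other in dom[node]
                   for other in checks)
    }


NO_LOCKS = frozenset()

# Hypothetical instruction-level CFG: each node holds at most one access.
succ = {
    "entry": ["read_x1"],
    "read_x1": ["branch"],
    "branch": ["then_write_y", "else_read_x2"],
    "then_write_y": ["join"],
    "else_read_x2": ["join"],
    "join": ["read_x3"],
    "read_x3": ["write_y2"],
    "write_y2": ["locked_read_x"],
    "locked_read_x": ["exit"],
}

# Hypothetical equivalence key per check: (location, access kind, lock set).
# Assumes no synchronization operations run between same-class checks.
checks = {
    "read_x1": ("x", "read", NO_LOCKS),
    "then_write_y": ("y", "write", NO_LOCKS),
    "else_read_x2": ("x", "read", NO_LOCKS),
    "read_x3": ("x", "read", NO_LOCKS),
    "write_y2": ("y", "write", NO_LOCKS),
    "locked_read_x": ("x", "read", frozenset({"m"})),
}

kept = eliminate_redundant(checks, succ, "entry")
print(sorted(kept))
# ['locked_read_x', 'read_x1', 'then_write_y', 'write_y2']

# Step 4 sanity check: every equivalence class keeps at least one representative.
assert {checks[n] for n in kept} == set(checks.values())
```

Both later reads of x disappear because read_x1 dominates them, while write_y2 survives: the only other write to y sits on one branch and therefore does not dominate it. The check under a different lock set stays because it belongs to its own class.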

Results & Findings

Benchmark                  Original TSan slowdown   After optimization   Speedup
libpng (high contention)   3.8×                     2.5×                 1.52×
SQLite                     2.1×                     1.6×                 1.31×
LLVM (self‑compile)        2.9×                     2.2×                 1.32×
Geometric mean                                                           1.34×
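Each Speedup entry is the original TSan slowdown divided by the optimized slowdown (for libpng, 3.8 / 2.5 = 1.52). A short Python check of that arithmetic, and of the geometric mean over just the three rows listed, looks like this; the dictionary and helper names are illustrative only.

```python
import math

# Slowdowns relative to an uninstrumented build, copied from the table above.
rows = {
    "libpng (high contention)": (3.8, 2.5),
    "SQLite": (2.1, 1.6),
    "LLVM (self-compile)": (2.9, 2.2),
}


def speedup(original, optimized):
    """Speedup of the optimized TSan build over the original TSan build."""
    return original / optimized


def geometric_mean(values):
    """Computed in log space: exp(mean(log v))."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


speedups = {name: speedup(*pair) for name, pair in rows.items()}
for name, s in speedups.items():
    print(f"{name}: {s:.2f}x")
print(f"geometric mean of these rows: {geometric_mean(speedups.values()):.2f}x")
```

The per‑row values reproduce the table's Speedup column. The geometric mean over only these three rows comes out near 1.38×, so the reported 1.34× presumably covers the broader set of real‑world applications mentioned under Key Contributions rather than just these three benchmarks.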
  • Compile time grew by less than 2% on average, well within typical developer tolerances.
  • The approach works automatically: no annotations, configuration flags, or source‑code modifications are required.
  • The optimizations have been accepted by the ThreadSanitizer maintainers and are slated for upstream inclusion, indicating production‑readiness.

Practical Implications

  • Faster CI pipelines – Teams that already run ThreadSanitizer in continuous integration can expect noticeably shorter test runs, especially for highly parallel workloads.
  • Lower barrier to adoption – The reduced slowdown makes it feasible to enable race detection in performance‑critical builds (e.g., release candidates) where developers previously disabled it.
  • Better resource utilization – Shorter runtimes translate to lower CPU costs on cloud‑based testing farms, a tangible cost saving for large organizations.
  • Potential for other detectors – The same static‑analysis ideas could be ported to other dynamic checks (e.g., memory‑error detectors, undefined‑behavior sanitizers), opening a broader avenue for compiler‑driven instrumentation reduction.

Limitations & Future Work

  • The analyses are conservative: they only remove instrumentation when they can prove race‑freedom, so some redundant checks inevitably remain.
  • Complex synchronization patterns (e.g., custom lock implementations, lock‑free data structures) may evade the current static reasoning and thus keep instrumentation.
  • The current work focuses on ThreadSanitizer; extending the technique to other languages or runtimes (e.g., Java, Go) will require additional language‑specific modeling.
  • Future research directions include hybrid static‑dynamic approaches that refine the analysis at runtime, and machine‑learning‑guided heuristics to predict which checks are most likely redundant without full formal proof.

Authors

  • Alexey Paznikov
  • Andrey Kogutenko
  • Yaroslav Osipov
  • Michael Schwarz
  • Umang Mathur

Paper Information

  • arXiv ID: 2512.05555v1
  • Categories: cs.PL, cs.OS, cs.SE
  • Published: December 5, 2025