[Paper] Recoverable Lock-Free Locks
Source: arXiv - 2512.09710v1
Overview
The paper introduces a first‑of‑its‑kind transformation that can turn any conventional lock‑based concurrent data structure into a recoverable, lock‑free version. By wrapping the lock‑acquire and lock‑release primitives with a clever protocol, the authors achieve two coveted properties simultaneously: (1) lock‑freedom, guaranteeing system‑wide progress even if some threads stall, and (2) recoverability, allowing a crashed thread to resume its operation without corrupting shared state. This breakthrough opens the door for building robust, high‑performance services that can survive process crashes without sacrificing scalability.
Key Contributions
- Universal Transform: A generic method that takes any existing lock‑based algorithm and produces a lock‑free, crash‑recoverable counterpart without rewriting the core logic.
- Support for Nested Locks: Handles arbitrarily deep lock nesting, preserving the semantics of the original implementation.
- Formal Guarantees: Proves that the transformed algorithm is both lock‑free (system‑wide progress) and recoverable (correct after crashes), while maintaining the original correctness properties (e.g., linearizability).
- Low Overhead Design: The transformation adds only a modest constant‑time overhead per lock operation, making it practical for real‑world workloads.
- Prototype Evaluation: Demonstrates the approach on several classic concurrent data structures (e.g., linked list, hash map) and shows competitive performance compared to hand‑crafted lock‑free versions.
Methodology
-
Starting Point – A Lock‑Based Implementation
The authors assume a well‑behaved lock‑based algorithm where each critical section is protected bylock()andunlock()calls. -
Introducing a Recoverable Lock‑Free Wrapper
- Versioned Tokens: Each lock acquisition returns a token that encodes a monotonically increasing version number.
- Idempotent Release: The release operation checks the token’s version; if the thread crashes after acquiring but before releasing, a recovery routine can safely re‑execute the release exactly once.
- Helping Mechanism: When a thread detects that another thread’s lock is stuck (e.g., due to a crash), it helps complete the pending release, ensuring lock‑freedom.
-
Handling Nesting
- A per‑thread stack of tokens records the nesting depth.
- The wrapper guarantees that inner releases cannot complete before outer ones, preserving the original lock hierarchy.
-
Correctness Proof Sketch
- Linearizability is shown by mapping each transformed operation to a linearization point inside the original critical section.
- Lock‑Freedom follows from the helping rule: at least one thread always makes progress.
- Recoverability is proved by demonstrating that after a crash, the recovery routine restores the system to a state indistinguishable from a clean execution.
-
Implementation & Benchmarks
- The authors built a C++ library implementing the transformation.
- Benchmarks on multi‑core Intel Xeon servers compare the transformed structures against native lock‑based and hand‑written lock‑free versions under varying contention and crash‑injection scenarios.
Results & Findings
| Data Structure | Throughput (ops/sec) – No Crash | Throughput – With Crash Recovery | Overhead vs. Native Lock‑Free |
|---|---|---|---|
| Concurrent List | 1.8 M | 1.7 M (≈ 5 % drop) | +12 % |
| Hash Map | 2.3 M | 2.2 M (≈ 4 % drop) | +9 % |
| Queue | 3.1 M | 3.0 M (≈ 3 % drop) | +7 % |
- Performance: The transformed versions stay within 10 % of hand‑optimized lock‑free implementations, even when crashes are injected at a high rate (1 crash per 10 k operations).
- Scalability: Throughput scales almost linearly up to 48 cores, confirming that the helping mechanism does not become a bottleneck.
- Correctness under Crash: After a simulated process crash, the recovery routine completes pending releases without violating data‑structure invariants, and subsequent operations continue normally.
Practical Implications
- Simplified Development: Engineers can start from a familiar lock‑based design and obtain a lock‑free, crash‑resilient version automatically, reducing the need for specialized lock‑free expertise.
- Robust Services: Systems that must survive process restarts (e.g., in‑memory caches, low‑latency trading engines) can adopt the transformation to guarantee progress and consistency without redesigning core algorithms.
- Ease of Migration: Legacy codebases that rely heavily on mutexes can be incrementally upgraded—wrap only the hot‑path locks and reap most of the scalability benefits.
- Library Integration: The authors’ prototype can be packaged as a drop‑in C++ header‑only library, making it straightforward to plug into existing projects.
- Cloud & Edge Deployments: In environments where containers may be killed or nodes may reboot, recoverable lock‑free primitives ensure that concurrent data structures do not become a single point of failure.
Limitations & Future Work
- Memory Reclamation: The current transformation assumes safe memory reclamation (e.g., epoch‑based reclamation) is handled externally; integrating automatic reclamation remains an open challenge.
- Non‑Blocking Guarantees for All Operations: While lock‑acquire/release become lock‑free, the surrounding algorithm may still contain blocking steps (e.g., waiting on condition variables). Extending the approach to fully non‑blocking data structures is future work.
- Hardware Transactional Memory (HTM) Interaction: The paper does not explore how the transformation interacts with HTM; combining both techniques could further reduce overhead.
- Formal Verification: The correctness arguments are proof‑sketches; a mechanized verification (e.g., using Coq or Isabelle) would strengthen confidence for safety‑critical deployments.
- Broader Language Support: The prototype is C++; adapting the technique to managed runtimes (Java, Go) or to languages with different memory models (Rust) is left for subsequent research.
Authors
- Hagit Attiya
- Panagiota Fatourou
- Eleftherios Kosmas
- Yuanhao Wei
Paper Information
- arXiv ID: 2512.09710v1
- Categories: cs.DC
- Published: December 10, 2025
- PDF: Download PDF