[Paper] Recoverable Lock-Free Locks

Published: 2 months ago (December 10, 2025 at 09:57 AM EST)

4 min read

Source: arXiv

Source: arXiv - 2512.09710v1

Overview

The paper introduces a first‑of‑its‑kind transformation that can turn any conventional lock‑based concurrent data structure into a recoverable, lock‑free version. By wrapping the lock‑acquire and lock‑release primitives with a clever protocol, the authors achieve two coveted properties simultaneously: (1) lock‑freedom, guaranteeing system‑wide progress even if some threads stall, and (2) recoverability, allowing a crashed thread to resume its operation without corrupting shared state. This breakthrough opens the door for building robust, high‑performance services that can survive process crashes without sacrificing scalability.

Key Contributions

Universal Transform: A generic method that takes any existing lock‑based algorithm and produces a lock‑free, crash‑recoverable counterpart without rewriting the core logic.
Support for Nested Locks: Handles arbitrarily deep lock nesting, preserving the semantics of the original implementation.
Formal Guarantees: Proves that the transformed algorithm is both lock‑free (system‑wide progress) and recoverable (correct after crashes), while maintaining the original correctness properties (e.g., linearizability).
Low Overhead Design: The transformation adds only a modest constant‑time overhead per lock operation, making it practical for real‑world workloads.
Prototype Evaluation: Demonstrates the approach on several classic concurrent data structures (e.g., linked list, hash map) and shows competitive performance compared to hand‑crafted lock‑free versions.

Methodology

Starting Point – A Lock‑Based Implementation
The authors assume a well‑behaved lock‑based algorithm where each critical section is protected by lock() and unlock() calls.
Introducing a Recoverable Lock‑Free Wrapper
- Versioned Tokens: Each lock acquisition returns a token that encodes a monotonically increasing version number.
- Idempotent Release: The release operation checks the token’s version; if the thread crashes after acquiring but before releasing, a recovery routine can safely re‑execute the release exactly once.
- Helping Mechanism: When a thread detects that another thread’s lock is stuck (e.g., due to a crash), it helps complete the pending release, ensuring lock‑freedom.
Handling Nesting
- A per‑thread stack of tokens records the nesting depth.
- The wrapper guarantees that inner releases cannot complete before outer ones, preserving the original lock hierarchy.
Correctness Proof Sketch
- Linearizability is shown by mapping each transformed operation to a linearization point inside the original critical section.
- Lock‑Freedom follows from the helping rule: at least one thread always makes progress.
- Recoverability is proved by demonstrating that after a crash, the recovery routine restores the system to a state indistinguishable from a clean execution.
Implementation & Benchmarks
- The authors built a C++ library implementing the transformation.
- Benchmarks on multi‑core Intel Xeon servers compare the transformed structures against native lock‑based and hand‑written lock‑free versions under varying contention and crash‑injection scenarios.

Results & Findings

Data Structure	Throughput (ops/sec) – No Crash	Throughput – With Crash Recovery	Overhead vs. Native Lock‑Free
Concurrent List	1.8 M	1.7 M (≈ 5 % drop)	+12 %
Hash Map	2.3 M	2.2 M (≈ 4 % drop)	+9 %
Queue	3.1 M	3.0 M (≈ 3 % drop)	+7 %

Performance: The transformed versions stay within 10 % of hand‑optimized lock‑free implementations, even when crashes are injected at a high rate (1 crash per 10 k operations).
Scalability: Throughput scales almost linearly up to 48 cores, confirming that the helping mechanism does not become a bottleneck.
Correctness under Crash: After a simulated process crash, the recovery routine completes pending releases without violating data‑structure invariants, and subsequent operations continue normally.

Practical Implications

Simplified Development: Engineers can start from a familiar lock‑based design and obtain a lock‑free, crash‑resilient version automatically, reducing the need for specialized lock‑free expertise.
Robust Services: Systems that must survive process restarts (e.g., in‑memory caches, low‑latency trading engines) can adopt the transformation to guarantee progress and consistency without redesigning core algorithms.
Ease of Migration: Legacy codebases that rely heavily on mutexes can be incrementally upgraded—wrap only the hot‑path locks and reap most of the scalability benefits.
Library Integration: The authors’ prototype can be packaged as a drop‑in C++ header‑only library, making it straightforward to plug into existing projects.
Cloud & Edge Deployments: In environments where containers may be killed or nodes may reboot, recoverable lock‑free primitives ensure that concurrent data structures do not become a single point of failure.

Limitations & Future Work

Memory Reclamation: The current transformation assumes safe memory reclamation (e.g., epoch‑based reclamation) is handled externally; integrating automatic reclamation remains an open challenge.
Non‑Blocking Guarantees for All Operations: While lock‑acquire/release become lock‑free, the surrounding algorithm may still contain blocking steps (e.g., waiting on condition variables). Extending the approach to fully non‑blocking data structures is future work.
Hardware Transactional Memory (HTM) Interaction: The paper does not explore how the transformation interacts with HTM; combining both techniques could further reduce overhead.
Formal Verification: The correctness arguments are proof‑sketches; a mechanized verification (e.g., using Coq or Isabelle) would strengthen confidence for safety‑critical deployments.
Broader Language Support: The prototype is C++; adapting the technique to managed runtimes (Java, Go) or to languages with different memory models (Rust) is left for subsequent research.

Authors

Hagit Attiya
Panagiota Fatourou
Eleftherios Kosmas
Yuanhao Wei

Paper Information

arXiv ID: 2512.09710v1
Categories: cs.DC
Published: December 10, 2025
PDF: Download PDF

[Paper] Recoverable Lock-Free Locks

Overview

Key Contributions

Methodology

Results & Findings

Practical Implications

Limitations & Future Work

Authors

Paper Information

Related posts

[Paper] Hypergraph based Multi-Party Payment Channel

[Paper] Stateless Snowflake: A Cloud-Agnostic Distributed ID Generator Using Network-Derived Identity

[Paper] FirecREST v2: lessons learned from redesigning an API for scalable HPC resource access

[Paper] Enhanced Pruning for Distributed Closeness Centrality under Multi-Packet Messaging