[Paper] LEFT-RS: A Lock-Free Fault-Tolerant Resource Sharing Protocol for Multicore Real-Time Systems

Published: December 25, 2025 at 09:52 AM EST

Source: arXiv - 2512.21701v1

Overview

The paper introduces LEFT‑RS, a lock‑free, fault‑tolerant protocol that lets real‑time tasks on multicore embedded systems share resources without lock‑induced blocking. Tasks read shared data in parallel and recover quickly from transient faults, which the authors report improves both timing predictability and overall schedulability.

Key Contributions

Lock‑free resource sharing: Eliminates conventional mutexes, enabling concurrent reads of global resources while still guaranteeing exclusive writes.
Integrated fault tolerance: Detects transient faults inside critical sections and lets fault‑free tasks finish early, reducing the cascade of errors across tasks.
Bounded timing analysis: Provides a worst‑case response‑time (WCRT) model that preserves hard real‑time guarantees despite the lock‑free design.
Scalable parallel recovery: Uses lightweight parallel replica execution to recover from faults without the heavy coordination overhead of prior approaches.
Empirical validation: Shows up to 84.5 % improvement in schedulability on average compared with state‑of‑the‑art locking and fault‑tolerant schemes.

Methodology

Parallel Critical Sections – Instead of a single task holding a lock, LEFT‑RS lets every task enter its critical section simultaneously. Reads are performed on a shared snapshot of the resource, while writes are staged locally.
Fault Detection & Early Exit – Each task runs a lightweight checksum on its local copy. If a fault is detected, the task aborts its critical section, discarding its changes. Fault‑free tasks that have already validated their work can commit early, freeing the resource for others.
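The detect-and-abort step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the summary does not say which checksum is used, so CRC‑32 from Python's standard `zlib` module stands in for it, and `run_critical_section`, `update`, and `inject_fault` are hypothetical names introduced here.

```python
import zlib


def checksum(data: bytes) -> int:
    """CRC-32 of the staged bytes (assumed stand-in for the paper's check)."""
    return zlib.crc32(data)


def run_critical_section(snapshot: bytes, update, inject_fault: bool = False):
    """Stage an update on a local copy; return the new bytes, or None on abort.

    `update` must return non-empty bytes. `inject_fault` is a test hook that
    simulates a transient fault; it is not part of any real protocol.
    """
    local = bytearray(update(snapshot))   # writes are staged locally
    expected = checksum(bytes(local))     # checksum recorded after computing
    if inject_fault:
        local[0] ^= 0x01                  # simulate a single-bit flip
    if checksum(bytes(local)) != expected:
        return None                       # fault detected: discard the changes
    return bytes(local)                   # validated: eligible to commit


clean = run_critical_section(b"abc", lambda s: s.upper())
faulty = run_critical_section(b"abc", lambda s: s.upper(), inject_fault=True)
```

Here `clean` holds the validated result and `faulty` is `None`, because the flipped bit changes the CRC and the task discards its staged copy without touching the shared resource.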
Commit Protocol – A lightweight, lock‑free commit phase uses atomic compare‑and‑swap (CAS) operations to merge validated writes into the global state. Because only one of several competing CAS operations on the same state can succeed, writes are serialized without a traditional lock.
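A rough sketch of snapshot reads, locally staged writes, and a CAS-style commit is shown below. Several things here are assumptions rather than details from the paper: the version counter, the `VersionedCell` and `commit` names, and the `max_retries` bound are all hypothetical. Python also has no user-level CAS instruction, so a private lock emulates the atomicity that a hardware CAS would provide.

```python
import threading


class VersionedCell:
    """Shared resource held as an immutable (version, value) pair."""

    def __init__(self, value):
        self._state = (0, value)
        # Emulates hardware CAS atomicity only; a modelling assumption.
        self._atomic = threading.Lock()

    def snapshot(self):
        """Readers take the current pair without waiting on writers."""
        return self._state

    def compare_and_swap(self, expected_version, new_value):
        """Install new_value only if nobody has committed since the snapshot."""
        with self._atomic:
            version, _ = self._state
            if version != expected_version:
                return False
            self._state = (version + 1, new_value)
            return True


def commit(cell, update, max_retries=8):
    """Read a snapshot, stage the write locally, then try to CAS it in.

    Returns the number of attempts used. The retry bound is hypothetical.
    """
    for attempt in range(1, max_retries + 1):
        version, value = cell.snapshot()
        staged = update(value)            # staged locally, not yet visible
        if cell.compare_and_swap(version, staged):
            return attempt
    raise RuntimeError("commit retries exhausted")


# Two tasks read the same snapshot; only the first commit succeeds.
cell = VersionedCell(10)
version_a, value_a = cell.snapshot()
version_b, value_b = cell.snapshot()
first_ok = cell.compare_and_swap(version_a, value_a + 1)
second_ok = cell.compare_and_swap(version_b, value_b + 5)   # stale snapshot
attempts = commit(cell, lambda v: v + 5)                    # fresh snapshot
```

The second task's stale commit is rejected because the version moved on, and it succeeds once it restages its write against a fresh snapshot.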
Timing Analysis – The authors extend classic response‑time analysis (RTA) to account for:
- Parallel execution of critical sections,
- Potential aborts due to faults,
- The bounded overhead of the CAS‑based commit.
This yields a closed‑form WCRT bound that can be plugged into existing schedulability tests.
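For orientation, the classic fixed-priority RTA that the authors extend can be sketched as below. This is the textbook recurrence solved by fixed-point iteration, not the paper's bound. The `extra` parameter is a hypothetical constant that lumps together abort re-execution and commit overhead, only to show how such terms feed into the result. Tasks are assumed to be indexed by decreasing priority, with integer time units.

```python
def response_time(C, T, D, i, extra=0):
    """Worst-case response time of task i, or None if it misses D[i].

    C, T, D: execution times, periods, deadlines (index 0 = highest priority).
    extra:   hypothetical per-job overhead (aborts, commit), in time units.
    """
    R = C[i] + extra
    while True:
        # Interference from every higher-priority task released within R.
        interference = sum(((R + T[j] - 1) // T[j]) * C[j] for j in range(i))
        R_next = C[i] + extra + interference
        if R_next > D[i]:
            return None                   # deadline exceeded: unschedulable
        if R_next == R:
            return R                      # fixed point reached
        R = R_next


C = [1, 2, 3]
T = [4, 6, 12]
D = [4, 6, 12]
baseline = response_time(C, T, D, 2)              # 10
with_overhead = response_time(C, T, D, 2, extra=2)  # 12
too_much = response_time(C, T, D, 2, extra=3)     # None
```

In this toy task set the lowest-priority task tolerates two extra time units of overhead and misses its deadline with three, which is why a bounded abort and commit cost matters for schedulability.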
Evaluation Platform – Experiments were run on a set of synthetic task sets and a realistic automotive ECU benchmark, comparing LEFT‑RS against:
- Traditional lock‑based protocols (e.g., MPCP, FMLP),
- Existing fault‑tolerant schemes that rely on sequential replicas.

Results & Findings

Metric	LEFT‑RS	Best Prior Lock‑Based	Prior Fault‑Tolerant (Replica)
Schedulability gain	↑ 84.5 % (avg.)	baseline	↑ 38 %
Average CPU utilization	↓ 12 % (less blocking)	higher due to lock wait	similar to LEFT‑RS but with higher overhead
Fault recovery latency	≤ 1.2 × single‑task exec time	N/A (no recovery)	↑ 2.5 × single‑task exec time
Commit overhead	1–2 CAS ops per critical section	lock acquire/release	multiple synchronization points

Key takeaways

Lock‑free access cuts the worst‑case blocking time dramatically, which directly translates into higher task‑set acceptance.
Early‑exit on fault prevents a single corrupted task from stalling all others, a common problem in traditional lock‑based designs.
The CAS‑based commit adds little overhead (1–2 CAS operations per critical section), making the approach practical on low‑power microcontrollers.

Practical Implications

Automotive & Aerospace – Safety‑critical ECUs can now run tighter control loops on multicore silicon without sacrificing determinism, even when transient electromagnetic interference is expected.
Industrial IoT – Edge devices that share sensor buffers or actuators can maintain high throughput while still meeting hard deadlines, reducing the need for over‑provisioned cores.
OS & Runtime Designers – LEFT‑RS can be integrated as a library or kernel extension, offering a drop‑in replacement for mutexes in real‑time POSIX‑like APIs (e.g., pthread_mutex).
Developer Tooling – The WCRT analysis is compatible with existing schedulability analysis tools (e.g., Cheddar), allowing engineers to evaluate the impact of switching to LEFT‑RS without rewriting models.

In short, LEFT‑RS gives developers a way to spend fewer cycles waiting on locks while still guaranteeing that critical sections complete on time, even in the presence of transient faults.

Limitations & Future Work

Fault Model – The protocol assumes transient faults that can be detected via checksums; permanent hardware failures still require higher‑level redundancy.
Resource Types – LEFT‑RS focuses on read‑mostly shared data with occasional writes; heavily write‑contended resources may still suffer from commit contention.
Hardware Support – The analysis presumes atomic CAS is available and fast; on some ultra‑low‑power cores without native CAS, a software fallback could increase overhead.
Scalability Beyond 8 Cores – Experiments were capped at 8 cores; the authors plan to explore hierarchical commit schemes for many‑core systems.

Future research directions include extending the protocol to mixed‑criticality systems, integrating hardware error‑detecting codes for more robust fault detection, and evaluating LEFT‑RS on heterogeneous platforms (e.g., CPU‑GPU combos) where resource sharing spans different execution units.

Authors

Nan Chen
Xiaotian Dai
Tong Cheng
Alan Burns
Iain Bate
Shuai Zhao

Paper Information

arXiv ID: 2512.21701v1
Categories: cs.OS, cs.DC
Published: December 25, 2025
