Distributed Transactions (2PC, Saga) in System Design
Source: Dev.to
Introduction
In the complex landscape of modern distributed systems, maintaining data consistency across multiple independent services and databases presents one of the most challenging problems in system design. Distributed transactions provide the foundation for ensuring that operations spanning several resources either succeed completely or fail entirely, preserving the ACID properties of atomicity, consistency, isolation, and durability. This article explores two primary approaches to handling distributed transactions: the Two‑Phase Commit protocol, commonly known as 2PC, and the Saga pattern. Each method addresses the coordination of long‑running transactions in environments where traditional single‑database transactions fall short.
Distributed Transactions Overview
A distributed transaction involves multiple participating resources, such as separate databases, microservices, or external systems, that must coordinate to achieve a unified outcome. Unlike local transactions confined to a single resource, distributed transactions must manage cross‑service consistency while dealing with network latency, partial failures, and independent scaling of components.
The core requirement remains the same as in monolithic systems: the entire operation must appear atomic to the end user. If any part fails, all changes must be undone. However, achieving this in a distributed environment introduces significant complexity because each participant operates autonomously, and communication occurs over unreliable networks. System designers must therefore select protocols that balance strong consistency with availability and performance.
Two‑Phase Commit (2PC)
The Two‑Phase Commit protocol, or 2PC, stands as the classic solution for achieving strong consistency in distributed transactions. Introduced in the 1970s, 2PC relies on a central coordinator and multiple participants to ensure all‑or‑nothing semantics across heterogeneous resources.
Key Components
- Coordinator – The central authority responsible for driving the transaction. It receives the initial transaction request and manages the voting and decision process.
- Participants – The individual resources (databases or services) that perform local work and respond to the coordinator’s instructions.
- Transaction Manager – Often implemented using standards such as XA (eXtended Architecture) for database interactions.
Protocol Phases
Prepare Phase (Voting Phase)
- The coordinator sends a prepare message to all participants.
- Each participant performs the necessary local operations, acquires locks, writes changes to a durable log, and responds with either ready (vote yes) or abort (vote no).
- If any participant votes no or fails to respond, the coordinator decides to abort.
Commit Phase (Decision Phase)
- If all participants vote ready, the coordinator logs the global commit decision and sends commit messages to every participant.
- Each participant then applies the changes permanently and releases locks.
- If the decision is to abort, the coordinator sends rollback messages, and participants undo their local changes using the prepared log entries.
Example Implementation
class TwoPhaseCommitCoordinator {
List participants;
TransactionLog log;
void beginTransaction(Transaction tx) {
log.write("BEGIN_TX", tx.id);
boolean allReady = true;
// Prepare Phase
for (Participant participant : participants) {
Response response = participant.prepare(tx);
if (!response.isReady()) {
allReady = false;
break;
}
}
// Decision
if (allReady) {
log.write("GLOBAL_COMMIT", tx.id);
for (Participant participant : participants) {
participant.commit(tx);
}
} else {
log.write("GLOBAL_ABORT", tx.id);
for (Participant participant : participants) {
participant.rollback(tx);
}
}
}
}class DatabaseParticipant implements Participant {
LocalDatabase db;
UndoLog undoLog;
Response prepare(Transaction tx) {
try {
db.acquireLocks(tx.operations);
db.executeOperations(tx.operations); // tentative changes
undoLog.recordUndoInfo(tx);
return new Response(true, "READY");
} catch (Exception e) {
return new Response(false, "ABORT");
}
}
void commit(Transaction tx) {
db.makeChangesPermanent(tx);
db.releaseLocks(tx);
undoLog.clear(tx);
}
void rollback(Transaction tx) {
db.applyUndo(undoLog.getUndoInfo(tx));
db.releaseLocks(tx);
undoLog.clear(tx);
}
}These code structures illustrate the blocking nature of 2PC: participants hold locks from the prepare phase until the final decision arrives. The coordinator must persist its decision durably before proceeding, ensuring recoverability after crashes.
Drawbacks of 2PC
- Single point of failure – The coordinator is a bottleneck and a potential outage source.
- Blocking protocol – If the coordinator fails after the prepare phase, participants remain locked indefinitely until recovery.
- Network partitions – Can cause prolonged unavailability.
- Performance impact – The need for synchronous coordination and lock holding makes 2PC impractical for high‑throughput, long‑lived microservice operations.
Saga Pattern
The Saga pattern offers a fundamentally different approach to distributed transactions by embracing eventual consistency instead of immediate strong consistency. Originally described in the 1980s for handling long‑lived transactions, a Saga decomposes a large distributed transaction into a sequence of smaller, local transactions. Each local transaction has an associated compensating transaction that undoes its effects if later steps fail.
- Local Transactions – Each service executes its part independently and commits immediately.
- Compensating Transactions – Reversible operations that restore the
Saga vs. Two‑Phase Commit (2PC)
Key Characteristics
| Aspect | Two‑Phase Commit (2PC) | Saga |
|---|---|---|
| Locking | Global locks held until the transaction finishes | No global lock; resources stay available |
| Consistency | Immediate, strong consistency (all‑or‑nothing) | Eventual consistency – the system converges over time |
| Failure handling | Global abort if any participant cannot commit | Compensating (undo) actions for each failed step |
| Scalability | Limited by coordination overhead and lock contention | Highly scalable; each service manages its own ACID transaction |
| Typical use‑case | Short‑lived, critical financial operations | Long‑running business processes across many services |
Saga Styles
1. Choreography‑Based Saga
- Services communicate directly via events.
- Each service listens for events from the previous step and publishes its own events when it finishes (or fails).
- No central controller → loose coupling, but tracing can become complex as the number of services grows.
2. Orchestration‑Based Saga
- A central Saga Orchestrator drives the flow by sending commands to services and reacting to their responses/events.
- The orchestrator maintains the overall state, decides the next step, and triggers compensations when needed.
- Provides clearer visibility and simpler error handling.
Example: Online Store Order Workflow
Scenario: Placing an order involves three services:
- Order Service
- Payment Service
- Inventory Service
The saga guarantees that:
- If payment fails → inventory is not deducted.
- If inventory is unavailable → payment is refunded.
Orchestrator Implementation (Java‑like pseudocode)
class OrderSagaOrchestrator {
OrderService orderService;
PaymentService paymentService;
InventoryService inventoryService;
SagaStateRepository stateRepo;
void startOrderSaga(OrderRequest request) {
SagaInstance saga = new SagaInstance(request.orderId);
stateRepo.save(saga);
// Step 1: Create Order (local transaction)
Order order = orderService.createOrder(request);
saga.updateStep("ORDER_CREATED", order);
try {
// Step 2: Process Payment
Payment payment = paymentService.processPayment(order);
saga.updateStep("PAYMENT_SUCCESS", payment);
// Step 3: Reserve Inventory
InventoryReservation reservation = inventoryService.reserveInventory(order);
saga.updateStep("INVENTORY_RESERVED", reservation);
saga.complete(); // saga finished successfully
return;
} catch (PaymentFailedException e) {
// Compensation: Cancel Order
orderService.cancelOrder(order);
saga.fail("PAYMENT_FAILED");
} catch (InventoryUnavailableException e) {
// Compensation chain
paymentService.refundPayment(payment);
orderService.cancelOrder(order);
saga.fail("INVENTORY_FAILED");
}
}
// ---- Compensating transaction examples ----
void compensatePayment(Payment payment) {
paymentService.refundPayment(payment); // idempotent refund
}
void compensateOrder(Order order) {
orderService.cancelOrder(order); // releases any reservations
}
}Inventory Service (Local Transaction + Compensation)
class InventoryService {
InventoryRepository repo;
// Local transaction – fully committed immediately
InventoryReservation reserveInventory(Order order) {
return repo.withinTransaction(() -> {
Stock stock = repo.findStock(order.productId);
if (stock.quantity {
Stock stock = repo.findStock(reservation.productId);
stock.quantity += reservation.quantity;
repo.save(stock);
});
}
}Note: Every command and compensation should carry an idempotency key to safely retry after network failures.
When to Use Which Pattern?
| Situation | Preferred Approach |
|---|---|
| Strong, immediate consistency required (e.g., banking, ledger updates) | 2PC |
| Long‑running, multi‑service business processes where temporary inconsistency is acceptable | Saga |
| Hybrid: critical synchronous steps inside a bounded context + cross‑context orchestration | Combine 2PC (for the critical part) + Saga (for the rest) |
Why Saga Is Popular in Microservices
- No long‑held locks → higher availability.
- Horizontal scalability – each service only handles its own ACID transaction.
- Targeted compensations instead of a global abort.
- Fits naturally with event‑driven architectures and message brokers (Kafka, RabbitMQ) for reliable delivery.
Proper Saga implementations demand:
- Thoughtful design of compensating transactions.
- Robust idempotency handling.
- Comprehensive monitoring of saga instances to detect and resolve stuck workflows.
Further Reading
System Design Handbook – a deep dive into distributed systems, transaction patterns, and more.
- 📚 Purchase the handbook:
- ☕ Support the author: