Distributed Transactions (2PC, Saga) in System Design

Published: (April 3, 2026 at 02:52 AM EDT)
7 min read
Source: Dev.to

Source: Dev.to

Introduction

In the complex landscape of modern distributed systems, maintaining data consistency across multiple independent services and databases presents one of the most challenging problems in system design. Distributed transactions provide the foundation for ensuring that operations spanning several resources either succeed completely or fail entirely, preserving the ACID properties of atomicity, consistency, isolation, and durability. This article explores two primary approaches to handling distributed transactions: the Two‑Phase Commit protocol, commonly known as 2PC, and the Saga pattern. Each method addresses the coordination of long‑running transactions in environments where traditional single‑database transactions fall short.

Distributed Transactions Overview

A distributed transaction involves multiple participating resources, such as separate databases, microservices, or external systems, that must coordinate to achieve a unified outcome. Unlike local transactions confined to a single resource, distributed transactions must manage cross‑service consistency while dealing with network latency, partial failures, and independent scaling of components.

The core requirement remains the same as in monolithic systems: the entire operation must appear atomic to the end user. If any part fails, all changes must be undone. However, achieving this in a distributed environment introduces significant complexity because each participant operates autonomously, and communication occurs over unreliable networks. System designers must therefore select protocols that balance strong consistency with availability and performance.

Two‑Phase Commit (2PC)

The Two‑Phase Commit protocol, or 2PC, stands as the classic solution for achieving strong consistency in distributed transactions. Introduced in the 1970s, 2PC relies on a central coordinator and multiple participants to ensure all‑or‑nothing semantics across heterogeneous resources.

Key Components

  • Coordinator – The central authority responsible for driving the transaction. It receives the initial transaction request and manages the voting and decision process.
  • Participants – The individual resources (databases or services) that perform local work and respond to the coordinator’s instructions.
  • Transaction Manager – Often implemented using standards such as XA (eXtended Architecture) for database interactions.

Protocol Phases

  1. Prepare Phase (Voting Phase)

    • The coordinator sends a prepare message to all participants.
    • Each participant performs the necessary local operations, acquires locks, writes changes to a durable log, and responds with either ready (vote yes) or abort (vote no).
    • If any participant votes no or fails to respond, the coordinator decides to abort.
  2. Commit Phase (Decision Phase)

    • If all participants vote ready, the coordinator logs the global commit decision and sends commit messages to every participant.
    • Each participant then applies the changes permanently and releases locks.
    • If the decision is to abort, the coordinator sends rollback messages, and participants undo their local changes using the prepared log entries.

Example Implementation

class TwoPhaseCommitCoordinator {
    List participants;
    TransactionLog log;

    void beginTransaction(Transaction tx) {
        log.write("BEGIN_TX", tx.id);
        boolean allReady = true;

        // Prepare Phase
        for (Participant participant : participants) {
            Response response = participant.prepare(tx);
            if (!response.isReady()) {
                allReady = false;
                break;
            }
        }

        // Decision
        if (allReady) {
            log.write("GLOBAL_COMMIT", tx.id);
            for (Participant participant : participants) {
                participant.commit(tx);
            }
        } else {
            log.write("GLOBAL_ABORT", tx.id);
            for (Participant participant : participants) {
                participant.rollback(tx);
            }
        }
    }
}
class DatabaseParticipant implements Participant {
    LocalDatabase db;
    UndoLog undoLog;

    Response prepare(Transaction tx) {
        try {
            db.acquireLocks(tx.operations);
            db.executeOperations(tx.operations);  // tentative changes
            undoLog.recordUndoInfo(tx);
            return new Response(true, "READY");
        } catch (Exception e) {
            return new Response(false, "ABORT");
        }
    }

    void commit(Transaction tx) {
        db.makeChangesPermanent(tx);
        db.releaseLocks(tx);
        undoLog.clear(tx);
    }

    void rollback(Transaction tx) {
        db.applyUndo(undoLog.getUndoInfo(tx));
        db.releaseLocks(tx);
        undoLog.clear(tx);
    }
}

These code structures illustrate the blocking nature of 2PC: participants hold locks from the prepare phase until the final decision arrives. The coordinator must persist its decision durably before proceeding, ensuring recoverability after crashes.

Drawbacks of 2PC

  • Single point of failure – The coordinator is a bottleneck and a potential outage source.
  • Blocking protocol – If the coordinator fails after the prepare phase, participants remain locked indefinitely until recovery.
  • Network partitions – Can cause prolonged unavailability.
  • Performance impact – The need for synchronous coordination and lock holding makes 2PC impractical for high‑throughput, long‑lived microservice operations.

Saga Pattern

The Saga pattern offers a fundamentally different approach to distributed transactions by embracing eventual consistency instead of immediate strong consistency. Originally described in the 1980s for handling long‑lived transactions, a Saga decomposes a large distributed transaction into a sequence of smaller, local transactions. Each local transaction has an associated compensating transaction that undoes its effects if later steps fail.

  • Local Transactions – Each service executes its part independently and commits immediately.
  • Compensating Transactions – Reversible operations that restore the

Saga vs. Two‑Phase Commit (2PC)

Key Characteristics

AspectTwo‑Phase Commit (2PC)Saga
LockingGlobal locks held until the transaction finishesNo global lock; resources stay available
ConsistencyImmediate, strong consistency (all‑or‑nothing)Eventual consistency – the system converges over time
Failure handlingGlobal abort if any participant cannot commitCompensating (undo) actions for each failed step
ScalabilityLimited by coordination overhead and lock contentionHighly scalable; each service manages its own ACID transaction
Typical use‑caseShort‑lived, critical financial operationsLong‑running business processes across many services

Saga Styles

1. Choreography‑Based Saga

  • Services communicate directly via events.
  • Each service listens for events from the previous step and publishes its own events when it finishes (or fails).
  • No central controller → loose coupling, but tracing can become complex as the number of services grows.

2. Orchestration‑Based Saga

  • A central Saga Orchestrator drives the flow by sending commands to services and reacting to their responses/events.
  • The orchestrator maintains the overall state, decides the next step, and triggers compensations when needed.
  • Provides clearer visibility and simpler error handling.

Example: Online Store Order Workflow

Scenario: Placing an order involves three services:

  1. Order Service
  2. Payment Service
  3. Inventory Service

The saga guarantees that:

  • If payment fails → inventory is not deducted.
  • If inventory is unavailable → payment is refunded.

Orchestrator Implementation (Java‑like pseudocode)

class OrderSagaOrchestrator {
    OrderService orderService;
    PaymentService paymentService;
    InventoryService inventoryService;
    SagaStateRepository stateRepo;

    void startOrderSaga(OrderRequest request) {
        SagaInstance saga = new SagaInstance(request.orderId);
        stateRepo.save(saga);

        // Step 1: Create Order (local transaction)
        Order order = orderService.createOrder(request);
        saga.updateStep("ORDER_CREATED", order);

        try {
            // Step 2: Process Payment
            Payment payment = paymentService.processPayment(order);
            saga.updateStep("PAYMENT_SUCCESS", payment);

            // Step 3: Reserve Inventory
            InventoryReservation reservation = inventoryService.reserveInventory(order);
            saga.updateStep("INVENTORY_RESERVED", reservation);

            saga.complete();          // saga finished successfully
            return;

        } catch (PaymentFailedException e) {
            // Compensation: Cancel Order
            orderService.cancelOrder(order);
            saga.fail("PAYMENT_FAILED");
        } catch (InventoryUnavailableException e) {
            // Compensation chain
            paymentService.refundPayment(payment);
            orderService.cancelOrder(order);
            saga.fail("INVENTORY_FAILED");
        }
    }

    // ---- Compensating transaction examples ----
    void compensatePayment(Payment payment) {
        paymentService.refundPayment(payment);   // idempotent refund
    }

    void compensateOrder(Order order) {
        orderService.cancelOrder(order);         // releases any reservations
    }
}

Inventory Service (Local Transaction + Compensation)

class InventoryService {
    InventoryRepository repo;

    // Local transaction – fully committed immediately
    InventoryReservation reserveInventory(Order order) {
        return repo.withinTransaction(() -> {
            Stock stock = repo.findStock(order.productId);
            if (stock.quantity  {
            Stock stock = repo.findStock(reservation.productId);
            stock.quantity += reservation.quantity;
            repo.save(stock);
        });
    }
}

Note: Every command and compensation should carry an idempotency key to safely retry after network failures.

When to Use Which Pattern?

SituationPreferred Approach
Strong, immediate consistency required (e.g., banking, ledger updates)2PC
Long‑running, multi‑service business processes where temporary inconsistency is acceptableSaga
Hybrid: critical synchronous steps inside a bounded context + cross‑context orchestrationCombine 2PC (for the critical part) + Saga (for the rest)
  • No long‑held locks → higher availability.
  • Horizontal scalability – each service only handles its own ACID transaction.
  • Targeted compensations instead of a global abort.
  • Fits naturally with event‑driven architectures and message brokers (Kafka, RabbitMQ) for reliable delivery.

Proper Saga implementations demand:

  • Thoughtful design of compensating transactions.
  • Robust idempotency handling.
  • Comprehensive monitoring of saga instances to detect and resolve stuck workflows.

Further Reading

System Design Handbook – a deep dive into distributed systems, transaction patterns, and more.

  • 📚 Purchase the handbook:
  • ☕ Support the author:
0 views
Back to Blog

Related posts

Read more »