Read‑your‑writes on replicas: PostgreSQL WAIT FOR LSN and MongoDB Causal Consistency
Source: Dev.to
Overview
In databases designed for high availability and scalability, secondary nodes can fall behind the primary. Typically a quorum of nodes is updated synchronously to guarantee durability while maintaining availability, and the remaining standby instances are eventually consistent to handle partial failures.
To balance availability with performance, synchronous replicas acknowledge a write only when it is durable and recoverable, even if it is not yet readable.
Result: If your application writes data and then immediately queries another node, it may still see stale data.
Example anomaly
You commit an order on the primary and then try to retrieve it from a reporting system. The order is missing because the read replica has not yet applied the write.
Both PostgreSQL and MongoDB provide mechanisms to avoid this problem, but they take different approaches.
| Feature | PostgreSQL WAIT FOR LSN | MongoDB Causal Consistency |
|---|---|---|
| Clock type | Log Sequence Number (LSN) – a 64‑bit WAL position | Hybrid Logical Clock (HLC) timestamp |
| How it works | Block until a standby reaches a target LSN (write, flush, or replay) | Attach afterClusterTime to reads; driver tracks operationTime |
| Typical use‑case | Off‑load expensive reads to replicas while preserving read‑your‑writes | Sessions that may read from any replica (primary or secondary) |
| Extra round‑trips | Yes – fetch LSN from primary, then wait on replica | No – driver handles timestamps automatically |
| Granularity | LSN (per‑transaction) | Cluster time (per‑operation) |
| Availability impact | May increase latency on replicas while they catch up | Reads may block briefly until replica catches up |
| Implementation status | Planned for PostgreSQL 19 (in development) | Available since MongoDB 3.6 (sessions) |
PostgreSQL: WAIT FOR LSN (PG 19 – currently in development)
PostgreSQL records every change in the Write‑Ahead Log (WAL). Each WAL record has a Log Sequence Number (LSN) – a 64‑bit position usually shown as two hexadecimal halves, e.g. 0/40002A0 (high/low 32 bits).
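Because an LSN is just a 64‑bit position, two of them can be compared numerically by folding the hexadecimal halves together. A small sketch (the helper names are illustrative, not a PostgreSQL API):

```javascript
// Fold a textual LSN like "0/40002A0" into a single 64-bit value so that
// two WAL positions can be compared numerically.
function parseLsn(text) {
  const [hi, lo] = text.split("/").map(h => BigInt("0x" + h));
  return (hi << 32n) | lo; // high 32 bits, then low 32 bits
}

// A replica has "caught up" to a target once its position is >= the target.
function caughtUp(replicaLsn, targetLsn) {
  return parseLsn(replicaLsn) >= parseLsn(targetLsn);
}
```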
Streaming replication ships WAL records from the primary to standbys, which then:
- Write WAL records to disk
- Flush them to durable storage
- Replay them, applying changes to data files
| Position | Meaning |
|---|---|
| standby_write | WAL written to disk on the standby (not yet flushed) |
| standby_flush | WAL flushed to durable storage on the standby |
| standby_replay (default) | WAL replayed into data files and visible to readers |
| primary_flush | WAL flushed on the primary (useful when synchronous_commit = off) |
A recent commit for PostgreSQL 19 adds a WAIT FOR LSN command that lets a session block until one of these points reaches a target LSN.
Typical workflow
- Write on the primary and commit.
- Fetch the current WAL insert LSN.
- Wait on a replica until it has caught up to that LSN.
```sql
-- 1. Start a transaction on the primary
BEGIN;

-- 2. Insert a row
INSERT INTO orders VALUES (123, 'widget');

-- 3. Commit the transaction
COMMIT;

-- 4. Get the current WAL insert LSN
SELECT pg_current_wal_insert_lsn();
```
Result (example):
```
 pg_current_wal_insert_lsn
---------------------------
 0/18724C0
(1 row)
```
Now block reads on a replica until it has replayed that LSN:
```sql
-- On the replica
WAIT FOR LSN '0/18724C0'
WITH (MODE 'standby_replay', TIMEOUT '2s');
```
If the replica reaches the requested LSN within the timeout, the command returns; otherwise it aborts with a timeout error.
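On releases without `WAIT FOR LSN`, applications often emulate the same behavior by polling `pg_last_wal_replay_lsn()` on the standby until it reaches the target. A minimal sketch with the replica lookup injected as a function (the function names here are illustrative; in a real application `getReplayLsn` would run a query against the standby):

```javascript
// Emulate WAIT FOR LSN by polling the standby's replay position.
// getReplayLsn: async function returning the standby's current replay LSN
// as a 64-bit BigInt (in practice: "SELECT pg_last_wal_replay_lsn()").
async function waitForLsn(getReplayLsn, targetLsn, timeoutMs = 2000, pollMs = 10) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await getReplayLsn() >= targetLsn) return true; // caught up
    await new Promise(r => setTimeout(r, pollMs));      // back off briefly
  }
  throw new Error("timeout waiting for LSN " + targetLsn.toString(16));
}
```

The trade-off versus the server-side command is the same extra round-trips per poll, which is why a built-in blocking wait is attractive.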
When to use it
- Expensive reads that you want to off‑load to replicas while still guaranteeing read‑your‑writes semantics.
- Event‑driven / CQRS architectures where the LSN itself serves as a change marker for downstream consumers.
For many workloads, simply reading from the primary is simpler and faster.
MongoDB: Causal Consistency
MongoDB tracks causality using oplog timestamps and a Hybrid Logical Clock (HLC).
- In a replica set, each write on the primary creates an entry in `local.oplog.rs`.
- Each entry carries an HLC timestamp that combines physical time with a logical counter, yielding a monotonically increasing cluster time.
- Replica set members apply oplog entries in timestamp order.
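The clock itself can be sketched in a few lines (a toy model, not MongoDB's implementation):

```javascript
// Toy hybrid logical clock: combines wall-clock milliseconds with a logical
// counter so that successive ticks are strictly increasing even when the
// wall clock stalls or steps backwards.
function makeHlc(now = Date.now) {
  let lastPhysical = 0;
  let counter = 0;
  return function tick() {
    const physical = now();
    if (physical > lastPhysical) {
      lastPhysical = physical; // wall clock moved forward: reset the counter
      counter = 0;
    } else {
      counter += 1; // stalled or backwards clock: bump the logical part
    }
    return { t: lastPhysical, i: counter };
  };
}
```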
Because MongoDB allows concurrent writes, “oplog holes” can appear: a later‑timestamped write may commit before an earlier‑timestamped one. A naïve reader could skip the earlier operation.
MongoDB solves this by tracking an oplogReadTimestamp, the highest hole‑free point in the oplog. Secondaries are prevented from reading past this point until all prior operations are visible, ensuring causal consistency even with concurrent commits.
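The idea of a highest hole‑free point can be illustrated with a toy model (not MongoDB internals): oplog slots are reserved in timestamp order, and the read timestamp only advances past slots whose writes have actually committed.

```javascript
// Toy model of oplogReadTimestamp: "reserved" lists slot timestamps in
// ascending order; "committed" holds the ones whose writes have finished.
// Readers may only advance up to the last slot before the first hole.
function oplogReadTimestamp(reserved, committed) {
  let readTs = null;
  for (const ts of reserved) {
    if (!committed.has(ts)) break; // first hole stops the scan
    readTs = ts;
  }
  return readTs;
}
```

In this model, if slots 1, 2, and 4 have committed but 3 has not, readers are held at 2 until the hole at 3 fills.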
Enforcing causal consistency
- Drivers track the `operationTime` of the last operation in a session.
- When a session is created with `causalConsistency: true`, the driver automatically includes an `afterClusterTime` equal to the highest known cluster time on subsequent reads.
- The server blocks the read until its cluster time has advanced to at least `afterClusterTime`.
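The driver-side bookkeeping amounts to remembering the newest timestamp seen and echoing it back on reads; a simplified sketch (not the actual driver code):

```javascript
// Simplified model of a causally consistent session: every server reply
// advances operationTime, and every subsequent read carries it back as
// afterClusterTime so the server can wait until that point is visible.
class CausalSession {
  constructor() {
    this.operationTime = 0;
  }
  advance(replyTime) {
    // Called on every server reply; cluster time only moves forward.
    if (replyTime > this.operationTime) this.operationTime = replyTime;
  }
  readConcern() {
    // Attached to subsequent reads in the session.
    return this.operationTime > 0
      ? { level: "local", afterClusterTime: this.operationTime }
      : { level: "local" };
  }
}
```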
Example (Node.js driver)
```javascript
// Start a causally consistent session
const session = client.startSession({ causalConsistency: true });
const coll = db.collection("orders");

// Write in this session
await coll.insertOne({ id: 123, product: "widget" }, { session });

// The driver automatically injects afterClusterTime into the read concern
const order = await coll.findOne({ id: 123 }, { session });
```
This guarantees read‑your‑writes behavior even when the read preference allows the query to be served by a secondary rather than the primary.
Note: Causal consistency is not limited to snapshot reads; it applies across all read‑concern levels. The session ensures later reads observe at least the effects of earlier writes, regardless of which replica serves the read.
Conclusion
Both PostgreSQL and MongoDB provide mechanisms to achieve read‑your‑writes semantics in distributed environments, but they differ in implementation:
| Aspect | PostgreSQL WAIT FOR LSN | MongoDB Causal Consistency |
|---|---|---|
| Underlying coordinate | WAL Log Sequence Number (LSN) | Hybrid Logical Clock (cluster time) |
| How the client participates | Explicitly fetch LSN, then issue WAIT FOR LSN on replica | Open a causally‑consistent session; driver handles timestamps |
| Typical latency impact | May add an extra round‑trip and wait on replica | Reads may block briefly until replica catches up |
| Primary vs. replica reads | Often still read from primary for simplicity | Reads can safely go to any replica |
| Use‑case focus | Off‑loading heavy reads while preserving ordering | General session‑level consistency across all reads |
Choose the approach that best matches your application’s consistency requirements, latency tolerance, and architectural style.
Physical Byte Offset in the WAL (LSN) vs. Hybrid Logical Clock (HLC)
| Aspect | PostgreSQL (WAL) | MongoDB (HLC) |
|---|---|---|
| Mechanism | Block until replay/write/flush LSN is reached | Block until afterClusterTime is visible |
| Tracking | Application captures LSN | Driver tracks operationTime |
| Granularity | WAL record position | Oplog timestamp |
| Replication Model | Physical streaming | Logical oplog application |
| Hole Handling | N/A (serialized WAL) | oplogReadTimestamp |
| Failover Handling | Error unless NO_THROW | Session continues, bounded by replication state |
Key Takeaways
- Both PostgreSQL’s `WAIT FOR LSN` and MongoDB’s causal consistency ensure that reads can observe prior writes, but they operate at different layers:
  - PostgreSQL offers manual, WAL‑level precision.
  - MongoDB provides automatic, session‑level guarantees.
- If you want “read‑your‑writes” semantics to just work without extra coordination calls, MongoDB’s session‑based model is a strong fit.
- Despite persistent myths about consistency, MongoDB delivers strong consistency in a horizontally scalable system while offering a simple developer experience.