Just Use Postgres for Durable Workflows

Published: May 28, 2026 at 02:41 PM EDT

Source: Hacker News

# Durable Workflows: A Simpler Approach Using the Database as Orchestrator

Durable workflows are a simple but powerful tool for building reliable programs.  
The idea is that, as your program runs, you regularly checkpoint its progress to a database.  
That way, if your program ever crashes or fails, you can recover it by resuming from its last completed step.  
You can think of this like saving in a video game: you regularly **“save”** your program’s progress so that if it crashes, you can **“reload”** it from its last checkpoint.
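
The save-and-reload idea can be sketched in a few lines of Python. This is only an illustration, not the implementation of any particular library: it assumes an in-memory SQLite database as a runnable stand-in for Postgres, and the `checkpoints` table, the `run_step` helper, and the step names are all hypothetical.

```python
import json
import sqlite3

# Assumption: in-memory SQLite stands in for Postgres so the sketch runs anywhere.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE checkpoints ("
    " workflow_id TEXT, step_name TEXT, output TEXT,"
    " PRIMARY KEY (workflow_id, step_name))"
)

calls = []  # records which step functions actually executed


def run_step(workflow_id, step_name, fn):
    """Return the checkpointed output if present; otherwise run fn and save it."""
    row = db.execute(
        "SELECT output FROM checkpoints WHERE workflow_id = ? AND step_name = ?",
        (workflow_id, step_name),
    ).fetchone()
    if row is not None:
        return json.loads(row[0])  # "reload": skip work that already completed
    result = fn()
    calls.append(step_name)
    db.execute(
        "INSERT INTO checkpoints VALUES (?, ?, ?)",
        (workflow_id, step_name, json.dumps(result)),
    )
    db.commit()  # "save": the step is durable once this commit returns
    return result


def workflow(workflow_id, crash_after_first_step=False):
    reserved = run_step(workflow_id, "reserve", lambda: {"reserved": 3})
    if crash_after_first_step:
        raise RuntimeError("simulated crash")
    return run_step(
        workflow_id, "charge", lambda: {"charged": reserved["reserved"] * 10}
    )


try:
    workflow("wf-1", crash_after_first_step=True)
except RuntimeError:
    pass  # the process "crashed" after checkpointing the first step

# Recovery: "reserve" is reloaded from its checkpoint, only "charge" executes.
result = workflow("wf-1")
```

Re-running the workflow with the same ID does not repeat the `reserve` step, because its output is already in the table.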

## External Orchestration – The Conventional Model

Most commonly, durable workflows are implemented via **external orchestration**.  
This is the pattern used by systems like Temporal, Airflow, and AWS Step Functions.  
In this model, durable programs are written as workflows of steps whose execution is coordinated by a central orchestrator.

1. A client submits a workflow.  
2. The orchestrator creates a record for it in a data store, then dispatches it to a worker for execution.  
3. Each time a worker completes a step, it sends the step’s outcome back to the orchestrator.  
4. The orchestrator checkpoints the output in its data store, then dispatches the next step.  
5. If a worker crashes or fails, the orchestrator dispatches its workflows to another worker, starting them from their last checkpointed step.

![External orchestration diagram](https://cdn.prod.website-files.com/672411cbf038560468c9e68f/6a0dbd9279898da496135035_f62a8f73.png)

## Why External Orchestration Is Fundamentally Over‑Complicated

In this blog post, we’ll argue that external orchestration is **fundamentally over‑complicated**.  
The core idea of durable workflows is to checkpoint program state in a database.  
If durable workflows are about databases, there’s no reason to have a separate orchestrator server.  
Instead, it’s simpler and more efficient to use the database itself as an orchestrator.

To make this concrete, we’ll focus specifically on building durable workflows on **PostgreSQL**, because of its:

- Popularity  
- Scalability ([benchmark](https://www.dbos.dev/blog/benchmarking-workflow-execution-scalability-on-postgres))  
- Rich ecosystem  

### The Database‑Backed Design

In a Postgres‑backed durable‑workflow system, application servers communicate directly with Postgres to execute workflows, bypassing a central orchestrator.

1. A client submits a workflow by creating an entry in a `workflows` table.  
2. Application servers poll the table, dequeue workflows, and execute them.  
3. As a server executes a workflow, it checkpoints the output of each step to Postgres.  
4. If a server crashes, another server can recover the workflow from its checkpoints.
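
Steps 1 and 2 above might look roughly like the following. Treat it as a sketch under stated assumptions: SQLite stands in for Postgres, the column names (`status`, `owner`) and the `submit` and `dequeue` helpers are hypothetical, and the claim is made atomic with SQLite's `BEGIN IMMEDIATE` write lock. A Postgres implementation would rely on row-level locking instead, so that workers do not block one another.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN / COMMIT statements below control the transaction.
db = sqlite3.connect(":memory:", isolation_level=None)
db.execute(
    "CREATE TABLE workflows ("
    " id INTEGER PRIMARY KEY, name TEXT NOT NULL,"
    " status TEXT NOT NULL DEFAULT 'enqueued', owner TEXT)"
)


def submit(name):
    """Client side: submitting a workflow is just inserting a row."""
    return db.execute("INSERT INTO workflows (name) VALUES (?)", (name,)).lastrowid


def dequeue(worker_id):
    """Claim the oldest enqueued workflow, or return None if the queue is empty."""
    db.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = db.execute(
            "SELECT id, name FROM workflows"
            " WHERE status = 'enqueued' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            db.execute(
                "UPDATE workflows SET status = 'running', owner = ? WHERE id = ?",
                (worker_id, row[0]),
            )
        db.execute("COMMIT")
        return row
    except Exception:
        db.execute("ROLLBACK")
        raise


submit("send-invoice")
submit("refresh-report")
first = dequeue("worker-a")   # claims workflow 1
second = dequeue("worker-b")  # claims workflow 2, never workflow 1 again
third = dequeue("worker-a")   # queue is empty
```

A real worker would call `dequeue` in a polling loop and run each claimed workflow's steps, checkpointing as it goes.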

![Database‑backed workflow diagram](https://cdn.prod.website-files.com/672411cbf038560468c9e68f/6a0dbd9279898da496135032_7bffa8dc.png)

This design eliminates the need for a central orchestrator because:

- **Dispatch**: Workers dequeue workflows from a Postgres table using locking clauses (e.g., `SELECT … FOR UPDATE SKIP LOCKED`), so concurrent workers skip rows that are already claimed and each workflow is dequeued by exactly one worker.  
- **Checkpointing**: Workers write step results directly to Postgres.  
- **Duplicate‑work detection**: Integrity constraints (e.g., unique indexes) let workers detect and back off if another worker has already checkpointed the same step.
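
As a rough illustration of duplicate-work detection, the sketch below lets two workers try to checkpoint the same step. It assumes SQLite as a stand-in for Postgres, and the `workflow_steps` columns, the index name, and the `checkpoint` helper are hypothetical. The same pattern applies in Postgres, where the second insert fails with a unique-violation error.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE workflow_steps ("
    " workflow_id TEXT NOT NULL, step_name TEXT NOT NULL,"
    " output TEXT, worker TEXT NOT NULL)"
)
# The unique index is what turns a duplicate checkpoint into a detectable error.
db.execute(
    "CREATE UNIQUE INDEX one_checkpoint_per_step"
    " ON workflow_steps (workflow_id, step_name)"
)


def checkpoint(workflow_id, step_name, output, worker):
    """Return True if this worker recorded the step, False if another already had."""
    try:
        with db:  # commits on success, rolls back if the insert fails
            db.execute(
                "INSERT INTO workflow_steps VALUES (?, ?, ?, ?)",
                (workflow_id, step_name, output, worker),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # back off: the step is already checkpointed


won = checkpoint("wf-1", "charge", "ok", "worker-a")
lost = checkpoint("wf-1", "charge", "ok", "worker-b")
owner = db.execute(
    "SELECT worker FROM workflow_steps WHERE workflow_id = 'wf-1'"
).fetchone()[0]
```

Only the first insert succeeds, so `worker-b` learns from the constraint violation that it should stop working on that step.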

Replacing a central orchestrator with Postgres (or another database) makes durable workflows fundamentally simpler. Hard problems such as scalability, availability, observability, and security can be addressed with well‑understood Postgres‑native solutions.

## Scalability and Availability

The scalability and availability of a database‑backed durable‑workflow system are determined by the underlying database.

- **Scalability**: The system can scale horizontally by adding more worker servers; the bottleneck becomes how quickly the database can process workflow records.  
- **Availability**: Workers are fungible and can freely recover each other’s state, so the system remains available as long as the database is up.

### Postgres‑Specific Benefits

- **Vertical scaling**: A single Postgres instance can handle **tens of thousands of workflows per second** ([benchmark](https://www.dbos.dev/blog/benchmarking-workflow-execution-scalability-on-postgres)).  
- **Horizontal scaling**: Postgres‑compatible distributed databases such as **CockroachDB**, or sharded Postgres clusters, can further increase capacity.  
- **High availability**: Postgres supports streaming replication with automatic failover, and managed offerings provide multi‑AZ deployments with high‑availability SLAs out of the box.  

Decades of engineering work on Postgres scalability and availability can be leveraged directly for durable workflows.

## Observability

When using Postgres‑backed durable execution, workflows and their steps are checkpointed to tables.  
Observability is therefore **built‑in**: you can query those tables to monitor workflows in real time and visualize execution.

Because virtually any observability query can be expressed in SQL, you get powerful, declarative analytics for free.  
For example, the following query finds every workflow step that errored in the last month:

```sql
SELECT
    workflow_id,
    step_name,
    error_message,
    completed_at
FROM
    workflow_steps
WHERE
    status = 'error'
    AND completed_at >= now() - interval '1 month'
ORDER BY
    completed_at DESC;
```

A query like this might seem obvious, but it’s hard to overstate how powerful it is.
Postgres’s relational model lets you express complex filtering and analytical operations declaratively, leveraging decades of query‑optimization research.
Many external orchestrators rely on key‑value stores that lack such expressive power. By storing workflow and step data in Postgres tables and adding secondary indexes for fast analytical queries, you obtain efficient observability “for free.”
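
To make the secondary-index point concrete, here is a small sketch that aggregates recent errors per step. It assumes SQLite in place of Postgres, so `datetime('now', ...)` replaces `now() - interval`. The sample rows, the index name, and the choice of indexed columns are invented for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE workflow_steps ("
    " workflow_id TEXT, step_name TEXT, status TEXT,"
    " error_message TEXT, completed_at TEXT)"
)
# Secondary index on the columns the monitoring query filters by.
db.execute(
    "CREATE INDEX steps_by_status_time ON workflow_steps (status, completed_at)"
)

# Hypothetical sample data; the last field is the step's age relative to now.
sample = [
    ("wf-1", "charge", "error", "card declined", "-2 days"),
    ("wf-2", "charge", "error", "timeout", "-5 days"),
    ("wf-3", "email", "error", "bounce", "-3 months"),
    ("wf-4", "charge", "success", None, "-1 days"),
]
for wf, step, status, err, age in sample:
    db.execute(
        "INSERT INTO workflow_steps VALUES (?, ?, ?, ?, datetime('now', ?))",
        (wf, step, status, err, age),
    )

# Count errors per step over the last month.
recent_errors = db.execute(
    "SELECT step_name, COUNT(*) FROM workflow_steps"
    " WHERE status = 'error' AND completed_at >= datetime('now', '-1 month')"
    " GROUP BY step_name ORDER BY step_name"
).fetchall()
```

The three-month-old `email` failure and the successful step are filtered out, leaving only the two recent `charge` errors.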

The remainder of this post covers reliability and security for Postgres‑backed durable workflows.

## Reliability and Security

When using an external orchestrator for durable execution, both the orchestrator and its data store are single points of failure. Because they directly coordinate workflow execution, if either has downtime, the entire application becomes unavailable. Moreover, because they process and store workflow and step checkpoints, they likely have access to sensitive application data, meaning they must be hardened, access‑controlled, and audited like any other piece of sensitive infrastructure.

By contrast, the only point of failure in Postgres‑backed durable execution is Postgres itself, and all workflow data is stored directly in Postgres and never transits any other system. If an application already depends on Postgres, adopting durable execution does not add any new points of failure to the system nor introduce new surface area to secure. Databases are already critical infrastructure, so it makes more sense to reuse them for orchestration than to add new critical infrastructure for it.

## Learn More

If you like building scalable, reliable systems, we’d love to hear from you. At DBOS, our goal is to make Postgres‑backed durable execution as simple and performant as possible. Check it out:

Quickstart:
GitHub:
Discord community:
