Your Diesel Migrations Might Be Ticking Time Bombs
Source: Dev.to
The Postgres Lock Problem
Handshake deployed what looked like a routine migration during their regular multi‑daily release cycle:
ALTER TABLE table_name
ADD CONSTRAINT fk_user_id
FOREIGN KEY (user_id) REFERENCES users(id);
60 seconds later, their entire site was down.
The problem? Adding a foreign‑key constraint requires an ACCESS EXCLUSIVE lock on the referenced table. PostgreSQL grants locks first‑come, first‑served. A long‑running query on the users table was holding an ACCESS SHARE lock, so the migration queued up waiting for it. Since ACCESS EXCLUSIVE conflicts with everything, all the regular SELECT queries that arrived after the migration also queued behind it. The lock queue grew, the site stopped responding, and they had to abort the migration to bring the site back up.
GoCardless hit a similar issue. They were recreating foreign‑key constraints on renamed tables. The tables were empty, so it seemed safe, but adding the constraints required locks on the parent tables, which were heavily used. API timeouts spanned the board for ~15 seconds.
Both incidents came from database migrations that looked completely normal, ran fine in staging, and only showed their true colors in production.
Common Operations That Take ACCESS EXCLUSIVE Locks
PostgreSQL uses different lock levels to keep your data consistent. The most restrictive is ACCESS EXCLUSIVE. When something holds this lock, nothing else can touch that table—not SELECT, not INSERT, nothing.
A bunch of common migration operations take ACCESS EXCLUSIVE locks:
Creating Indexes
CREATE INDEX idx_users_email ON users(email);
This takes a SHARE lock, which blocks all writes (INSERT, UPDATE, DELETE) while the index builds. On a table with millions of rows, this can take minutes.
Adding NOT NULL
ALTER TABLE users ALTER COLUMN email SET NOT NULL;
PostgreSQL must verify every row has a non‑null value, holding an ACCESS EXCLUSIVE lock the entire time.
Changing Column Types
ALTER TABLE products ALTER COLUMN price TYPE NUMERIC(10,2);
This triggers a complete table rewrite—every row is converted to the new type—while holding an ACCESS EXCLUSIVE lock.
The Lock Queue Makes It Worse
PostgreSQL’s lock queue is first‑come, first‑served. If a migration is waiting for a lock, all subsequent queries queue behind it, even if they don’t conflict with the original lock holder.
Typical scenario:
- A long‑running query holds an
ACCESS SHARElock onusers. - Migration tries to acquire
ACCESS EXCLUSIVEand queues. - A new
SELECTarrives; it would normally be fine withACCESS SHARE, but because the migration is waiting, thisSELECTis placed at the back of the queue. - More
SELECTs arrive and also queue. - The application starts timing out.
The migration hasn’t even started yet, and you’re already down.
Why This Is Hard to Catch
These migrations look fine in development (hundreds of rows) and run instantly in staging (tens of thousands). In production (millions of rows), they lock the table for minutes, blocking traffic. You don’t discover the issue until it runs against real data with real traffic.
Enter diesel‑guard
diesel-guard is a static‑analysis tool that scans your Diesel migration files for dangerous operations.
Install
cargo install diesel-guard
Run
diesel-guard check migrations/
Example Output
❌ Unsafe migration detected in migrations/2024_01_01_add_fk/up.sql
❌ ADD COLUMN with DEFAULT
Problem:
Adding column 'status' with DEFAULT on table 'orders' requires a full
table rewrite on PostgreSQL < 11, which acquires an ACCESS EXCLUSIVE
lock. On large tables, this can take significant time and block all
operations.
Safe alternative:
1. Add the column without a default:
ALTER TABLE orders ADD COLUMN status TEXT;
2. Backfill data in batches (outside migration):
UPDATE orders SET status = 'pending' WHERE status IS NULL;
3. Add default for new rows only:
ALTER TABLE orders ALTER COLUMN status SET DEFAULT 'pending';
Note: For Postgres 11+, this is safe if the default is a constant.
The tool tells you:
- What’s dangerous about the operation
- Which lock it takes
- Step‑by‑step safe fix with SQL
Safe Alternatives for Common Operations
Creating Indexes
Instead of
CREATE INDEX idx_orders_created_at ON orders(created_at);
Do
CREATE INDEX CONCURRENTLY idx_orders_created_at ON orders(created_at);
CONCURRENTLY allows writes to continue while the index builds. Because CREATE INDEX CONCURRENTLY cannot run inside a transaction, add a metadata.toml file:
# migrations/2024_01_01_add_order_index/metadata.toml
run_in_transaction = false
Adding NOT NULL
Instead of
ALTER TABLE users ALTER COLUMN email SET NOT NULL;
Do
-- Add a CHECK constraint without validating existing rows
ALTER TABLE users
ADD CONSTRAINT users_email_not_null_check
CHECK (email IS NOT NULL) NOT VALID;
-- Validate separately (lighter lock)
ALTER TABLE users VALIDATE CONSTRAINT users_email_not_null_check;
-- Finally, drop the CHECK and add the NOT NULL constraint if desired
ALTER TABLE users ALTER COLUMN email SET NOT NULL;
The initial CHECK … NOT VALID acquires only a lightweight lock, and the later VALIDATE CONSTRAINT can be run in small batches to avoid long exclusive locks.
Adding NOT NULL Constraints
-- Add a check constraint (fast since we validated)
ALTER TABLE users ADD CONSTRAINT users_email_not_null_check CHECK (email IS NOT NULL);
-- Add NOT NULL (fast since we validated)
ALTER TABLE users ALTER COLUMN email SET NOT NULL;
-- Clean up
ALTER TABLE users DROP CONSTRAINT users_email_not_null_check;
The VALIDATE step uses SHARE UPDATE EXCLUSIVE, which allows reads and writes to continue.
Adding UNIQUE Constraints
Instead of:
ALTER TABLE users ADD CONSTRAINT users_email_key UNIQUE (email);
Do this:
-- Build index concurrently
CREATE UNIQUE INDEX CONCURRENTLY users_email_idx ON users(email);
-- Add constraint using existing index (instant)
ALTER TABLE users
ADD CONSTRAINT users_email_key
UNIQUE USING INDEX users_email_idx;
Adding Foreign Keys
Instead of:
ALTER TABLE posts
ADD CONSTRAINT posts_user_id_fkey
FOREIGN KEY (user_id) REFERENCES users(id);
Do this:
-- Add constraint without validating existing rows
ALTER TABLE posts
ADD CONSTRAINT posts_user_id_fkey
FOREIGN KEY (user_id) REFERENCES users(id)
NOT VALID;
-- Validate separately (lighter lock)
ALTER TABLE posts VALIDATE CONSTRAINT posts_user_id_fkey;
NOT VALID means it doesn’t scan existing rows. VALIDATE happens separately with a lighter lock.
Setting Timeouts
One thing both Handshake and GoCardless learned: set lock_timeout in your migrations.
-- At the top of your migration
SET lock_timeout = '2s';
ALTER TABLE users ADD COLUMN email TEXT;
If the migration can’t get a lock within 2 seconds, it fails instead of queuing indefinitely. Your app stays up, and you can retry during lower traffic.
When You Know It’s Safe
Sometimes you know a migration is safe (tiny table, maintenance window, etc.):
-- safety-assured:start
-- Safe because: table has 50 rows, deploying during maintenance window
ALTER TABLE countries ADD COLUMN flag_emoji TEXT DEFAULT '🏳️';
-- safety-assured:end
diesel-guard will skip anything in these blocks.
Configuration
Put a diesel-guard.toml in your project:
# Skip checking migrations before this date
start_after = "2024_01_01_000000"
# Check down.sql too
check_down = true
# Disable specific checks if needed
disable_checks = ["CreateExtensionCheck"]
CI/CD Integration
Add to GitHub Actions:
name: Check Migrations
on: [pull_request]
jobs:
check-migrations:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: ayarotsky/diesel-guard@v0.3.0
Now dangerous migrations get caught in PR review, not production.
What It Checks
diesel-guard currently detects 18 different issues:
- ADD COLUMN with DEFAULT
- CREATE INDEX without CONCURRENTLY
- ADD NOT NULL
- ADD UNIQUE constraint
- ADD FOREIGN KEY without NOT VALID
- ADD PRIMARY KEY to existing tables
- ADD SERIAL columns
- ALTER COLUMN TYPE
- CREATE EXTENSION
- DROP COLUMN
- DROP INDEX without CONCURRENTLY
- DROP PRIMARY KEY
- RENAME COLUMN
- RENAME TABLE
- Short integer primary keys (SMALLINT/INT)
- TRUNCATE TABLE
- Unnamed constraints
- Using JSON instead of JSONB
- Wide indexes (4+ columns)
More coming. The target is 40 checks covering most dangerous PostgreSQL operations.
Why This Matters for Rust
The Rust ecosystem has great tooling. clippy lints your code. cargo audit catches security problems. But we didn’t have anything for database migrations.
I’ve seen too many production incidents from migrations. The fix is usually obvious in hindsight, but you only find out when it’s causing downtime.
diesel-guard brings the fix forward to development time.
Should You Use This?
Maybe you’re thinking “my tables are small.”
Tables grow. The users table with 100 rows today might have a million rows next year. The migration that’s instant now might take minutes then.
Building safe migrations from the start is way easier than fixing them during an incident. And diesel-guard takes a couple seconds to run.
Wrapping Up
Database migrations can be tricky. Operations that look perfectly safe can cause serious production issues. The gap between how they behave in staging versus production can be huge, and often you only find out when it’s too late.
Static analysis can catch these problems early. Whether you use diesel-guard or build your own checks, having something review your migrations before they reach production is worth it. The patterns are well‑documented and you just need tooling to enforce them.
Building safety into your database migration workflow pays off.