# How to Find and Fix Missing Indexes in PostgreSQL
Source: Dev.to
## The problem: hidden sequential scans
You ship a feature, tests pass, staging looks fine, and then production starts throwing timeouts on a query that worked perfectly two days ago:
```sql
SELECT ... WHERE customer_email = ?
```
The table now has 10 million rows, and PostgreSQL is reading every single one of them.
No index → full sequential scan → ~200 ms per query on a table hit thousands of times per hour.
In development the table has only a few hundred rows, so a sequential scan takes microseconds and the issue stays invisible until real data volumes appear.
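To see the failure mode locally, you can build a production-sized table with synthetic data. This is a sketch: the `sim_customers` table name matches the example later in the article, but the column list and row generator are assumptions.

```sql
-- Hypothetical setup: 10 million rows, no index on customer_email.
CREATE TABLE sim_customers (
    id             bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_email text NOT NULL,
    created_at     timestamptz NOT NULL DEFAULT now()
);

INSERT INTO sim_customers (customer_email)
SELECT 'user' || g || '@example.com'
FROM generate_series(1, 10000000) AS g;

ANALYZE sim_customers;  -- refresh planner statistics after the bulk load
```

With this in place, the queries and `EXPLAIN` output in the rest of the article can be tried end to end.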
## Finding tables that are scanned sequentially
PostgreSQL tracks scan statistics in pg_stat_user_tables. Query the view to spot problem tables:
```sql
SELECT
    schemaname,
    relname AS table_name,
    seq_scan,
    seq_tup_read,
    idx_scan,
    CASE WHEN (seq_scan + COALESCE(idx_scan, 0)) > 0
         THEN round(100.0 * seq_scan / (seq_scan + COALESCE(idx_scan, 0)), 1)
         ELSE 0
    END AS sequential_scan_percentage,
    pg_size_pretty(pg_relation_size(relid)) AS table_size
FROM pg_stat_user_tables
WHERE seq_scan > 0
ORDER BY seq_tup_read DESC
LIMIT 20;
```

Note: `pg_relation_size(relid)` uses the view's own OID column, which avoids the quoting problems of concatenating `schemaname || '.' || relname`, and `COALESCE` guards against a NULL `idx_scan` on tables that have no indexes at all.
### What to look for
- High `seq_tup_read` – many rows fetched by sequential scans.
- `sequential_scan_percentage` > 90 % – almost every access is a sequential scan.
- `idx_scan` = 0 – no query on that table uses an index.
A large `seq_tup_read` indicates a missing or unused index.
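Table-level statistics tell you *which table* is being scanned, not which statement does it. If the `pg_stat_statements` extension is installed (it must be preloaded via `shared_preload_libraries`), you can rank statements by total execution time; the column names below are the PostgreSQL 13+ spelling:

```sql
-- Most expensive statements overall; look for ones filtering the
-- suspect table on an unindexed column.
SELECT query,
       calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       round(mean_exec_time::numeric, 1)  AS mean_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```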
## Confirming the scan on a specific query
Run EXPLAIN (ANALYZE, BUFFERS) for the query that is slow:
```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM sim_customers
WHERE customer_email = 'test@example.com';
```
If the output contains `Seq Scan` with a high actual time and many buffers read, the planner has no usable index.
After creating the index (see next section), run the same command again. You should see `Index Scan` or `Index Only Scan` and dramatically fewer buffers.
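For reference, the two plans typically look like this. The exact costs and timings are illustrative, filled in to match this article's numbers (10 million rows, ~200 ms, 6 400 buffers):

```
-- Before: no usable index
Seq Scan on sim_customers  (actual time=0.02..196.4 rows=1 loops=1)
  Filter: (customer_email = 'test@example.com'::text)
  Rows Removed by Filter: 9999999
  Buffers: shared read=6400

-- After: index on customer_email
Index Scan using idx_sim_customers_email on sim_customers
  (actual time=0.03..0.04 rows=1 loops=1)
  Index Cond: (customer_email = 'test@example.com'::text)
  Buffers: shared hit=4
```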
## Adding the missing index
Create the index with CONCURRENTLY so that writes to the table are not blocked while it builds (a plain CREATE INDEX blocks INSERT, UPDATE, and DELETE for the duration):
```sql
CREATE INDEX CONCURRENTLY idx_sim_customers_email
ON sim_customers (customer_email);
```
CONCURRENTLY builds the index in the background without taking the write‑blocking lock a plain CREATE INDEX holds. The build takes longer, scans the table twice, and cannot run inside a transaction block, but it is safe on production tables.
Verify the index is used with the same EXPLAIN as before; a sequential scan that read 6 400 buffers (≈ 50 MB) may now read only a handful.
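Beyond a one‑off `EXPLAIN`, you can confirm the index keeps getting used in production via `pg_stat_user_indexes`:

```sql
-- idx_scan should climb over time; if it stays at 0, queries are
-- still not using the new index.
SELECT indexrelname,
       idx_scan,       -- scans that used this index
       idx_tup_fetch   -- live table rows fetched via this index
FROM pg_stat_user_indexes
WHERE relname = 'sim_customers';
```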
## Multi‑column indexes
When queries filter on several columns, a composite index is usually more efficient than separate single‑column indexes:
```sql
CREATE INDEX CONCURRENTLY idx_orders_customer_date
ON orders (customer_id, order_date);
```
### Column order matters
- Most selective column first – the column that reduces the row set the most should be in position 1.
- Range columns last – columns used with `<`, `>`, or `BETWEEN` belong at the end.
- Leading‑prefix rule – the planner can use the index for queries that filter on the leading column(s) only. It cannot use it if the query filters only on a later column (e.g., `order_date` alone).
If the column order doesn't match your query patterns, the planner quietly skips the index and falls back to a sequential scan, while every write still pays the overhead of maintaining the unused index.
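Assuming the `idx_orders_customer_date` index above on `(customer_id, order_date)`, these hypothetical queries illustrate the leading‑prefix rule:

```sql
-- Can use the index: filters on the leading column.
SELECT * FROM orders WHERE customer_id = 42;

-- Can use the index: equality on the leading column plus a range
-- on the trailing column.
SELECT * FROM orders
WHERE customer_id = 42
  AND order_date BETWEEN '2024-01-01' AND '2024-06-30';

-- Cannot use the index efficiently (no leading-column filter);
-- the planner typically falls back to a sequential scan.
SELECT * FROM orders WHERE order_date >= '2024-01-01';
```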
## Why a missing index hurts more than one query
- Sequential scans read large portions of the table, evicting useful pages from the buffer cache.
- This cache pressure degrades unrelated queries that previously benefited from cached data.
- As the table grows, the impact grows non‑linearly, turning a “works fine for months” query into a bottleneck.
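One way to watch for this cache pressure is the per‑table hit ratio in `pg_statio_user_tables`; a sketch:

```sql
-- Tables doing the most physical reads, with their cache hit ratio.
-- A ratio that drops after a new query ships is a warning sign.
SELECT relname,
       heap_blks_hit,
       heap_blks_read,
       round(100.0 * heap_blks_hit
             / NULLIF(heap_blks_hit + heap_blks_read, 0), 1) AS cache_hit_pct
FROM pg_statio_user_tables
ORDER BY heap_blks_read DESC
LIMIT 10;
```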
## Best‑practice checklist
| # | Action |
|---|---|
| 1 | Think about indexes for every new WHERE clause before merging. |
| 2 | Run EXPLAIN (ANALYZE, BUFFERS) against production‑sized data in CI or staging. Flag unexpected Seq Scan on large tables. |
| 3 | Continuously monitor sequential‑scan ratios (e.g., with the query above). A sudden shift signals a missing index. |
| 4 | Keep statistics fresh: ensure autovacuum runs frequently. Stale stats can mislead the planner. |
| 5 | Periodically audit for unused indexes; they waste write I/O and memory. Drop them if they never appear in idx_scan. |
| 6 | Remember that data volume matters: an index that seems unnecessary on 1 000 rows becomes critical on 10 million. |
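For item 5, this query is a starting point for the audit. Treat the results as candidates only, since the counters cover the period since the last statistics reset:

```sql
-- Indexes never scanned since the last stats reset, largest first.
-- Unique indexes are excluded: they enforce constraints even if
-- they are never scanned.
SELECT s.schemaname,
       s.relname,
       s.indexrelname,
       pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size
FROM pg_stat_user_indexes s
JOIN pg_index i ON i.indexrelid = s.indexrelid
WHERE s.idx_scan = 0
  AND NOT i.indisunique
ORDER BY pg_relation_size(s.indexrelid) DESC;
```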