Underrated Postgres: Extended Statistics for Better Execution Plans

Published: (March 18, 2026 at 04:30 PM EDT)
4 min read
Source: Dev.to

Source: Dev.to

Overview

Cover image for Underrated Postgres: Extended Statistics for Better Execution Plans

One of the features of Postgres which is very little known and even less used (at least I rarely see it in production) is extended statistics to capture and encode the dependencies, relations, and potential combinations of multi‑column values.

The query planner typically creates execution plans based on single‑column statistics, but relationships across columns are very common in real‑world applications. I personally would have loved to know about it earlier. Usually, I’d add another index hoping it improves access patterns, but the cost of all the indexes quickly reaches or even exceeds the size of the original table. Extended statistics are here to solve the issue—often a lighter‑weight alternative to extra indexes.

The Problem with Single-Column Statistics

PostgreSQL’s query planner relies heavily on statistics to estimate the cost of different execution plans. By default, it collects statistics on individual columns. This works well when columns are independent. However, in many real‑world scenarios, columns are correlated.

Consider a table users with columns city and country. If you query for users in city = 'Paris' and country = 'France', the planner might underestimate the number of rows returned because it assumes the two conditions are independent, multiplying the probability of city = 'Paris' by the probability of country = 'France'. In reality, if the city is Paris, the country is almost certainly France.

This underestimation can lead the planner to choose a suboptimal execution plan (e.g., a nested loop join instead of a hash join), resulting in poor performance.

Enter Extended Statistics

Extended statistics allow you to tell PostgreSQL about the relationships between columns. You can create statistics on multiple columns, enabling the planner to make more accurate estimates.

Creating Extended Statistics

You can create extended statistics using the CREATE STATISTICS command. Here’s a simple example:

CREATE STATISTICS city_country_stats (dependencies)
ON city, country
FROM users;

This command creates statistics that capture the functional dependencies between the city and country columns in the users table.

Types of Extended Statistics

PostgreSQL supports several types of extended statistics:

  • Dependencies – Captures functional dependencies between columns (e.g., city determines country). Useful for queries with multiple equality conditions.
  • N‑Distinct – Estimates the number of distinct values for a combination of columns. Useful for GROUP BY queries.
  • MCV (Most Common Values) – Stores the most common combinations of values across multiple columns. Useful for queries with complex conditions (e.g., inequalities, OR clauses).

You can specify multiple types when creating statistics:

CREATE STATISTICS complex_stats (dependencies, mcv)
ON col1, col2, col3
FROM my_table;

When to Use Extended Statistics

Extended statistics are particularly useful when:

  • You have queries with multiple conditions on correlated columns.
  • The query planner consistently underestimates or overestimates row counts.
  • You are considering adding a multi‑column index solely to improve query planning (extended statistics are often a lighter‑weight alternative).

Common Pitfalls

  • Overuse – Don’t create extended statistics on every combination of columns. They consume storage and add overhead to ANALYZE operations. Create them only when you identify a specific query‑planning issue.
  • Forgetting to ANALYZE – After creating extended statistics, you must run ANALYZE on the table for the planner to start using them.
  • Complex Expressions – Extended statistics currently work only on simple column references, not on expressions (e.g., LOWER(col)).

Conclusion

Extended statistics are a powerful but underutilized feature in PostgreSQL. By providing the query planner with information about column relationships, you can significantly improve the performance of complex queries without the overhead of unnecessary indexes.

  • Use CREATE STATISTICS to capture dependencies, n‑distinct values, or MCVs.
  • Target specific queries where the planner is making poor estimates.
  • Remember to ANALYZE your tables after creating statistics.
0 views
Back to Blog

Related posts

Read more »

Stop Leaking Data in Multi-Tenant Apps

Why Your application logic isn't Enough: The Case for Database-Level Row-Level Security You've built a robust multi‑tenant SaaS. You've implemented tenant_id f...