Optimizing PostgreSQL Queries for Large-Scale Data Applications

Published: December 26, 2025 at 11:31 PM EST
6 min read
Source: Dev.to

1. Introduction

Over the past two years we've taken our slowest queries from more than 15 seconds down to sub‑200 ms. The starting point:

❌ Query time: 18 seconds – it killed responsiveness.

2. Indexing Strategies

2.1 B‑tree indexes (equality & range)

-- Single‑column indexes
CREATE INDEX idx_orders_created_at   ON orders(created_at);
CREATE INDEX idx_orders_customer_id  ON orders(customer_id);

-- Composite index for multi‑column filter
CREATE INDEX idx_orders_customer_date
    ON orders(customer_id, created_at);

Impact: Query time ↓ to 8 seconds.

2.2 Partial indexes (index only active rows)

CREATE INDEX idx_orders_active
    ON orders(created_at)
    WHERE status = 'active';

Reduces index size by ~60 %.

Impact: Query time ↓ to 4.5 seconds.

2.3 GIN index for JSONB columns

CREATE INDEX idx_orders_metadata
    ON orders USING GIN (metadata);
-- Fast JSONB lookup
SELECT *
FROM orders
WHERE metadata @> '{"priority": "high"}';

2.4 Monitoring & Maintenance

-- Index usage statistics
SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
-- Drop unused indexes
DROP INDEX IF EXISTS unused_index_name;

-- Rebuild bloated indexes concurrently
REINDEX INDEX CONCURRENTLY idx_orders_created_at;

3. Query‑Level Optimizations

3.1 Select only needed columns

-- ❌ Bad
SELECT * FROM orders WHERE status = 'active';

-- ✅ Good
SELECT id, customer_id, total_amount, created_at
FROM orders
WHERE status = 'active';

Impact: ~40 % reduction in data transfer time.

3.2 Explain & Analyze

EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, o.total_amount, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.created_at >= CURRENT_DATE - INTERVAL '30 days';

Key metrics to watch

Plan node           Interpretation
Seq Scan            Bad – needs an index
Index Scan          Good
Bitmap Heap Scan    Good for large result sets
Nested Loop         Bad for large tables
Hash Join           Good for large tables

3.3 Join type choice

-- ❌ Bad: many LEFT JOINs returning NULLs
SELECT o.*, c.*, p.*, s.*
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.id
LEFT JOIN products  p ON o.product_id = p.id
LEFT JOIN shipments s ON o.id = s.order_id;

-- ✅ Good: use INNER JOIN when rows must exist
SELECT o.id, o.total_amount, c.name, p.title
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id
INNER JOIN products  p ON o.product_id = p.id
WHERE o.status = 'completed';

4. Partitioning

4.1 Range partitioning by created_at

-- Parent table
CREATE TABLE orders (
    id           SERIAL,
    customer_id  INTEGER,
    total_amount DECIMAL(10,2),
    created_at   TIMESTAMP NOT NULL,
    status       VARCHAR(50)
) PARTITION BY RANGE (created_at);

-- Monthly partitions
CREATE TABLE orders_2024_01 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE orders_2024_02 PARTITION OF orders
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
-- Index on each partition
CREATE INDEX idx_orders_2024_01_customer
    ON orders_2024_01(customer_id);

Impact: Date‑range scans became 10× faster.
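
To confirm pruning is working, EXPLAIN a date‑range query against the partitioned table; the plan should scan only the matching monthly partition (a quick sanity check, assuming the partitions defined above):

-- Partition pruning: only orders_2024_01 should appear in the plan
EXPLAIN
SELECT count(*)
FROM orders
WHERE created_at >= '2024-01-01'
  AND created_at <  '2024-02-01';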

4.2 List partitioning by status

CREATE TABLE orders (
    id           SERIAL,
    customer_id  INTEGER,
    status       VARCHAR(50),
    created_at   TIMESTAMP
) PARTITION BY LIST (status);

CREATE TABLE orders_active PARTITION OF orders
FOR VALUES IN ('pending', 'processing', 'shipped');

CREATE TABLE orders_completed PARTITION OF orders
FOR VALUES IN ('delivered', 'completed');

5. Connection Pooling (PgBouncer)

pgbouncer.ini

[databases]
myapp_db = host=localhost port=5432 dbname=production

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
pool_mode   = transaction
max_client_conn = 1000
default_pool_size = 25
reserve_pool_size = 5
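
To verify that traffic is actually flowing through the pooler, PgBouncer exposes an admin console on the same port. Connecting to the special pgbouncer database (this assumes the connecting user is listed in stats_users or admin_users in pgbouncer.ini) lets you inspect pool activity:

-- psql -h 127.0.0.1 -p 6432 -U app_user pgbouncer
SHOW POOLS;   -- per-database/user pool sizes and waiting clients
SHOW STATS;   -- request and traffic counters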

6. Node.js Implementation

// db.js
const { Pool } = require('pg');

const pool = new Pool({
  host: 'localhost',
  port: 6432,               // PgBouncer port
  database: 'myapp_db',
  user: 'app_user',
  password: process.env.DB_PASSWORD,
  max: 20,                  // max connections in pool
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 2000,
});

// Efficient query execution
async function getActiveOrders() {
  const client = await pool.connect();
  try {
    const result = await client.query(
      'SELECT id, total_amount FROM orders WHERE status = $1',
      ['active']
    );
    return result.rows;
  } finally {
    client.release(); // always release!
  }
}

// Export the pool so other modules (e.g. cache.js below) can reuse it
module.exports = { pool, getActiveOrders };

Impact: Connection‑overhead ↓ 70 %.

7. Redis Caching Layer

// cache.js
const redis = require('redis');
const { pool } = require('./db'); // pg pool exported from db.js above

const client = redis.createClient();
client.connect(); // node-redis v4: connect once at startup

async function getCustomerOrders(customerId) {
  const cacheKey = `customer:${customerId}:orders`;

  // 1️⃣ Check cache first
  const cached = await client.get(cacheKey);
  if (cached) {
    return JSON.parse(cached);
  }

  // 2️⃣ Fallback to DB
  const result = await pool.query(
    'SELECT * FROM orders WHERE customer_id = $1',
    [customerId]
  );

  // 3️⃣ Cache for 5 minutes
  await client.setEx(cacheKey, 300, JSON.stringify(result.rows));

  return result.rows;
}
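
The read path above never evicts stale data. A minimal invalidation sketch (reusing the same cacheKey convention; not part of the original setup) deletes the key whenever a customer's orders change:

// Call after any INSERT/UPDATE/DELETE that touches a customer's orders
async function invalidateCustomerOrders(customerId) {
  await client.del(`customer:${customerId}:orders`);
}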

8. Materialized View for Heavy Aggregations

CREATE MATERIALIZED VIEW daily_sales_summary AS
SELECT
    DATE(created_at)          AS sale_date,
    COUNT(*)                  AS total_orders,
    SUM(total_amount)         AS total_revenue,
    AVG(total_amount)         AS avg_order_value
FROM orders
WHERE status = 'completed'
GROUP BY DATE(created_at);

Refresh as needed (e.g., nightly) to keep reporting fast.

TL;DR

  1. Add appropriate B‑tree, partial, and GIN indexes.
  2. Select only required columns and use INNER JOIN where possible.
  3. Partition large tables by date or status.
  4. Monitor index usage and reindex bloated indexes.
  5. Pool connections with PgBouncer and cache frequent lookups in Redis.
  6. Use materialized views for expensive aggregations.

These steps collectively turned a 15‑second query into sub‑200 ms responses, enabling our application to scale gracefully with millions of daily transactions.

Materialized View Indexing

-- A unique index is required for REFRESH ... CONCURRENTLY and speeds up date lookups
CREATE UNIQUE INDEX idx_daily_sales_date
ON daily_sales_summary(sale_date);

Refresh the view (can be scheduled)

REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales_summary;
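
If the pg_cron extension is installed (an assumption – an OS cron job running psql works just as well), the refresh can be scheduled inside the database, e.g. nightly at 02:00:

-- Requires: CREATE EXTENSION pg_cron;
SELECT cron.schedule(
  'refresh-daily-sales',
  '0 2 * * *',
  $$REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales_summary$$
);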

Performance Impact: Dashboard queries went from 12 s to 100 ms.

Bulk Inserts

❌ Bad – Multiple individual inserts

// 10 000 separate INSERTs – one network round trip each
for (let i = 0; i < orders.length; i++) {
  await pool.query(
    'INSERT INTO orders (customer_id, total_amount) VALUES ($1, $2)',
    [orders[i].customerId, orders[i].totalAmount]
  );
}

✅ Good – Single multi-row INSERT

// Build one parameterized statement for the whole batch
const values = [];
const params = [];
orders.forEach((o, i) => {
  values.push(`($${i * 2 + 1}, $${i * 2 + 2})`);
  params.push(o.customerId, o.totalAmount);
});
await pool.query(
  `INSERT INTO orders (customer_id, total_amount) VALUES ${values.join(', ')}`,
  params
);

Database Health Monitoring

// healthcheck.js – periodic checks against the pool from db.js
async function healthCheck() {
  const checks = {
    slowQueries: await pool.query(`
      SELECT count(*) FROM pg_stat_statements
      WHERE mean_exec_time > 1000  -- avg runtime above 1 s (mean_time on PG 12 and older)
    `),
    connections: await pool.query(`
      SELECT COUNT(*) FROM pg_stat_activity 
      WHERE state = 'active'
    `),
    tableSize: await pool.query(`
      SELECT pg_size_pretty(pg_database_size(current_database()))
    `)
  };

  console.log('Database Health:', checks);

  // Alert if thresholds exceeded
  if (parseInt(checks.slowQueries.rows[0].count, 10) > 10) {
    console.error('⚠️ Too many slow queries detected!');
  }
}

// Run every 5 minutes
setInterval(healthCheck, 5 * 60 * 1000);

Query Patterns

❌ Bad – N+1 queries

const orders = await pool.query('SELECT * FROM orders LIMIT 100');
for (let order of orders.rows) {
  const customer = await pool.query(
    'SELECT * FROM customers WHERE id = $1',
    [order.customer_id]
  );
  order.customer = customer.rows[0];
}

✅ Good – Single JOIN

const result = await pool.query(`
  SELECT o.*, c.name, c.email
  FROM orders o
  JOIN customers c ON o.customer_id = c.id
  LIMIT 100
`);

❌ Bad – SQL‑injection risk & no plan caching

const query = `SELECT * FROM orders WHERE id = ${userId}`;

✅ Good – Prepared statement

const query = 'SELECT * FROM orders WHERE id = $1';
await pool.query(query, [userId]);

Full‑Text Search Setup

-- Add tsvector column
ALTER TABLE products ADD COLUMN search_vector tsvector;

-- Create GIN index
CREATE INDEX idx_products_search 
ON products USING GIN(search_vector);
-- Update trigger to maintain search_vector
CREATE TRIGGER tsvector_update 
BEFORE INSERT OR UPDATE ON products
FOR EACH ROW EXECUTE FUNCTION
tsvector_update_trigger(search_vector, 'pg_catalog.english', title, description);
-- Efficient full‑text search
SELECT * FROM products
WHERE search_vector @@ to_tsquery('english', 'laptop & gaming');
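
On PostgreSQL 12 and newer, a stored generated column is an alternative to the trigger above (a sketch using the same title and description columns; it replaces the plain search_vector column rather than coexisting with it):

-- Instead of the plain column + trigger: keep search_vector in sync automatically
ALTER TABLE products
  ADD COLUMN search_vector tsvector
  GENERATED ALWAYS AS (
    to_tsvector('english', coalesce(title, '') || ' ' || coalesce(description, ''))
  ) STORED;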

High‑Write Workload Tweaks

wal_compression = on
wal_writer_delay = 200ms
commit_delay = 10
commit_siblings = 5
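
These are postgresql.conf settings; one way to apply them without editing the file by hand (a sketch, run as a superuser) is ALTER SYSTEM followed by a configuration reload:

ALTER SYSTEM SET wal_compression = on;
ALTER SYSTEM SET wal_writer_delay = '200ms';
ALTER SYSTEM SET commit_delay = 10;
ALTER SYSTEM SET commit_siblings = 5;
SELECT pg_reload_conf();  -- picks up the new values without a restart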

Best‑Practice Checklist

  • Measure first: Use EXPLAIN ANALYZE before optimizing.
  • Index wisely: More indexes ≠ better performance.
  • Monitor continuously: pg_stat_statements is your best friend – see the query sketch after this list.
  • Test in production‑like environments: Local DB won’t reveal scaling issues.
  • Apply incremental changes: Optimize one thing at a time.
  • Document everything: Future you will thank present you.
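
As a starting point for that continuous monitoring, a common pg_stat_statements query lists the statements with the highest average runtime (column names assume PostgreSQL 13+; older versions use total_time and mean_time):

SELECT query, calls, mean_exec_time, total_exec_time, rows
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;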

Optimizing PostgreSQL for large‑scale applications is an iterative process. The techniques shared here reduced query times by 100× and cut infrastructure costs by 60 %.

Key Takeaways

  • ✅ Strategic indexing (not over‑indexing)
  • ✅ Query restructuring & proper JOINs
  • ✅ Partitioning for time‑series data
  • ✅ Connection pooling with PgBouncer
  • ✅ Caching at multiple levels
  • ✅ Regular monitoring & maintenance

Tools

  • pgAdmin – GUI management
  • pg_stat_statements – Query statistics
  • PgBouncer – Connection pooling
  • pgBadger – Log analyzer
  • explain.depesz.com – Visual EXPLAIN analyzer

Have you faced similar scaling challenges? What optimization techniques worked for your use case? Let me know in the comments!

Tags: #performance #optimization #sql #backend #devops #scalability
