EP 6.1: How to Search 1 Billion Rows in Milliseconds

Published: 1 month ago (December 29, 2025 at 05:00 AM EST)

3 min read

Source: Dev.to

The Problem: The Full Table Scan

Imagine a library with 1 million books. If you want to find The Great Gatsby and the books aren’t organized, you have to look at every single book one by one until you find it. In database terms, this is a full table scan (linear time complexity).

As your database grows from thousands to millions of rows, your SELECT queries will slow down to a crawl, eventually timing out and crashing your application.

Indexes

An index is a separate data structure that stores a sorted version of a column’s data alongside a pointer (reference) to the actual row in the main table. Most relational databases (like PostgreSQL and MySQL) use a B‑Tree (balanced tree) structure.

How a B‑Tree Index Works

Root – The search starts at the top node.
Decision – The system compares your value (e.g., User ID = 505) to the values in the node and decides whether to go left or right.
Leaf – You reach the bottom of the tree, which contains the exact location of the data on disk.

This turns a search from O(n) into O(log n). Searching 1 million rows suddenly only takes about 20 “jumps.”

Analogy

Think of the index like the index at the back of a textbook:

You look for the keyword “Load Balancing.”
The index tells you it’s on page 42.
You jump directly to page 42 without reading the previous 41 pages.

Types of Indexes

Clustered Index (Primary Key)

The table itself is physically sorted in this order.
You can have only one clustered index per table.

Non‑Clustered Indexes

Additional indexes you create on columns like email or created_at.
They live in a separate “folder” from your main data and point back to the clustered index.

Composite Indexes

If your query always looks like:

WHERE country = 'India' AND city = 'Mumbai'

a single index covering both columns is much faster than the database trying to merge two separate indexes.

Pro tip: Order matters in composite indexes. An index on (city, country) is not the same as (country, city).

Trade‑offs

Read vs. Write: Indexes speed up reads (SELECT) but slow down writes (INSERT/UPDATE/DELETE).
Write Penalty: Every time you add a new row, the database must find the correct spot in every B‑Tree associated with that table and rebalance the tree.
Storage Cost: Indexes take up disk space; on massive tables, an index can sometimes be larger than the data itself.

Practical Guidelines

Index Foreign Keys: This makes JOIN operations significantly faster.
Covering Queries: If your index contains all the data the SELECT needs (e.g., you index email and only select email), the database doesn’t even have to look at the main table. This is called an index‑only scan.
Avoid Low‑Cardinality Columns: Indexing a column like gender (values only M/F/O) is often useless because the database still has to scan a large portion of the table.
Monitor with EXPLAIN ANALYZE: Before adding an index, run your query with EXPLAIN to see if the database is performing a sequential scan or using an existing index.

Best Practices

Indexes are the most powerful tool for database performance, but use them like salt: too much will ruin the dish.

Start by indexing columns used in WHERE clauses, JOIN conditions, and ORDER BY statements.
Monitor your write latency and adjust as needed.