Understanding Database Indexes: How They Work and When They Hurt Performance

Published: December 12, 2025 at 04:05 PM EST
5 min read
Source: Dev.to

The Nightmare: The Full Table Scan

Imagine walking into a library with 100,000 books and looking for a specific title. If the books are just piled in the middle of the room in no particular order, you have to pick up every single book until you find the right one.

In database terms, this is a Full Table Scan.

If your table has 1 million rows and no index on the column you are searching, the database engine has to read every single row from disk. This is O(N) complexity—slow, expensive in terms of I/O, and it doesn’t scale.
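You can see a full table scan directly in a query plan. Below is a sketch using Python's built-in sqlite3 module; the `users` table and its columns are invented for illustration:

```python
import sqlite3

# In-memory database with no index on the searched column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(10_000)],
)

# EXPLAIN QUERY PLAN returns (id, parent, notused, detail) rows.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM users WHERE email = 'user9999@example.com'"
).fetchone()
print(plan[3])  # e.g. "SCAN users" -- every row must be examined
```

The word SCAN in the plan detail is SQLite's way of saying O(N): no matter which row matches, all of them are read.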

The “Dictionary” Analogy

The most common way to explain an index is the index at the back of a book, but a dictionary is an even better comparison.

In a dictionary, words are stored in alphabetical order. Because they are sorted, you don’t start at page one to find the word “Node.” You jump to the middle, see that “N” comes after the current page, and narrow your search.

What an index does: it creates a separate, sorted data structure that points to the actual location of the data.
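The idea can be sketched with an ordinary sorted list and a binary search, using Python's `bisect` module; the emails and row IDs here are invented:

```python
from bisect import bisect_left

# A toy "index": sorted (key, row_pointer) pairs, like a dictionary's
# sorted words. The pointer stands in for the row's location on disk.
index = sorted([
    ("diana@example.com", 890),
    ("adam@example.com", 101),
    ("chris@example.com", 221),
    ("ben@example.com", 450),
])
keys = [k for k, _ in index]

def lookup(key):
    """Binary-search the sorted keys instead of scanning every entry."""
    pos = bisect_left(keys, key)
    if pos < len(keys) and keys[pos] == key:
        return index[pos][1]  # the pointer to the actual row
    return None

print(lookup("chris@example.com"))  # → 221
```

Because the keys are sorted, each comparison halves the remaining search space, which is exactly the dictionary trick described above.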

Under the Hood: The B‑Tree

If you want to move from Junior to Senior, you need to know the name B‑Tree (Balanced Tree). Most modern databases (PostgreSQL, MySQL, SQL Server) use B‑Trees for their default indexing.

A B‑Tree solves the write‑speed problem of simple sorted lists by organizing data into a hierarchy of nodes.

  • Root: the entry point.
  • Internal Nodes: act like signposts (“Go left for A‑M, go right for N‑Z”).
  • Leaf Nodes: contain the actual pointers to rows on disk.

Because the tree is balanced, the distance from the top to any piece of data is always roughly the same. Searching a B‑Tree is O(log N). For 1 million rows, that’s only about 20 “jumps” through the tree.
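The arithmetic behind the "about 20 jumps" figure is just a base-2 logarithm. A real B-Tree does even better because each node holds many keys; the fanout of 200 below is an assumed, typical value, not a database constant:

```python
import math

rows = 1_000_000

# A binary search over 1M sorted keys needs ceil(log2(N)) comparisons.
jumps = math.ceil(math.log2(rows))
print(jumps)  # → 20

# A B-Tree node typically holds hundreds of keys, so the tree is shallow.
btree_levels = math.ceil(math.log(rows, 200))  # fanout of 200 is an assumption
print(btree_levels)  # → 3
```

This is why the conceptual search below finishes in a handful of node visits rather than twenty.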

Conceptual Search Example

Imagine we are searching for a user with email = 'chris@example.com' in a large users table, and we have an index on the email column.

Instead of reading a million rows, the database conceptually performs a few logical steps:

  1. Start at the Root Node (Jump 1): The root holds boundaries (e.g., the lowest and highest values in each sub‑branch). It determines that chris@example.com falls between the values of the middle internal node.

  2. Move to Internal Node (Jump 2): The database loads the middle node, which might say:

    * Emails < F: Go Left
    * Emails F–O: Go Right

    Since c sorts before F, the database chooses the Left Leaf Node pointer.

  3. Move to Leaf Node (Jump 3): The leaf node is a dense, sorted list containing the email and the primary‑key ID (the pointer). The database performs a binary search within this small list:

    [
      { "email": "adam@example.com",  "userId": 101 },
      { "email": "ben@example.com",   "userId": 450 },
      { "email": "chris@example.com", "userId": 221 }, // Found it!
      { "email": "diana@example.com", "userId": 890 }
    ]
  4. Fetch the Data: Using userId: 221, the database retrieves the full user record from the main table.

The search completes in about four page reads (three jumps through the index plus one fetch from the table) instead of a million row reads. That is the power of a B‑Tree index.
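The same behavior is visible in a real query plan. This sketch uses sqlite3 again; the table, data, and index name are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{i}@example.com",) for i in range(10_000)],
)

# Adding the index turns the plan from SCAN into an index SEARCH.
conn.execute("CREATE INDEX idx_users_email ON users (email)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM users WHERE email = 'user42@example.com'"
).fetchone()
print(plan[3])  # e.g. "SEARCH users USING INDEX idx_users_email (email=?)"
```

SEARCH ... USING INDEX means the engine walked the B-Tree to the matching leaf rather than reading every row.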

Clustered vs. Non‑Clustered Indexes

Clustered Index (The Physical Order)

Think of this as the dictionary itself. The data is physically stored on disk in the order of the index—almost always your primary key.

Rule: You can have only one clustered index per table (because you can’t physically sort the same data in two different ways).

Non‑Clustered Index (The “Phone Book”)

This is a separate structure from the actual table. It contains a copy of the indexed column and a “pointer” (like a GPS coordinate) to where the rest of the row lives.

Rule: You can have many non‑clustered indexes, but each one adds “weight” to your database.

The Cost: When Indexes Hurt Performance

If indexes are magical, why not index every column? Because every index is a trade‑off: faster reads but slower writes, plus storage cost.

The Write Penalty

Whenever you modify a column that is indexed, the database must update the B‑Tree:

  • INSERT: add the new value and possibly rebalance the tree.
  • UPDATE: delete the old entry and insert the new one.
  • DELETE: locate and remove the entry.

If a table has five indexes, a single INSERT triggers five index‑write operations in addition to writing the main record. On high‑write tables (e.g., logging or telemetry), too many indexes can severely degrade performance.

Storage and Memory Overhead

Indexes are separate data structures that consume disk space and memory. Databases try to keep frequently accessed index nodes in RAM. If an index is larger than available memory, the system must read index nodes from disk, negating speed benefits and causing I/O bottlenecks. Larger indexes also reduce the memory available for caching actual data.

Advanced Concepts: Indexing with Precision

Composite Indexes (Order Matters)

A composite index covers multiple columns in a specific order. Composite indexes are crucial for supporting complex WHERE and ORDER BY clauses.

Example query

SELECT * FROM orders
WHERE customer_id = 123 AND order_date > '2023-01-01'
ORDER BY order_date DESC;

Create a composite index that matches the pattern: (customer_id, order_date).

Why order matters

  • The first column (customer_id) narrows the data slice.
  • The second column (order_date) satisfies the additional filter and the ORDER BY without extra sorting.

If you reverse the order to (order_date, customer_id), the index becomes useless for queries that filter only by customer_id, because an index can only be entered through its leftmost columns. Rule: put the columns you filter on with equality (usually the most selective ones) first, and range or sort columns after them.
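Here is the composite index from the example above, exercised in SQLite; the table is a sketch reduced to the two relevant columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
    "customer_id INTEGER, order_date TEXT)"
)
# Composite index matching the query's access pattern: equality column
# first, then the range/sort column.
conn.execute("CREATE INDEX idx_cust_date ON orders (customer_id, order_date)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders "
    "WHERE customer_id = 123 AND order_date > '2023-01-01' "
    "ORDER BY order_date DESC"
).fetchall()
details = " | ".join(row[3] for row in plan)
# SQLite can typically walk the index backwards to satisfy the DESC
# ordering, so no separate sort step should appear in the plan.
print(details)
```

The plan shows a SEARCH on idx_cust_date: the equality on customer_id narrows the slice, and the sorted order_date entries inside that slice serve both the range filter and the ORDER BY.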

The Cardinality Trap

Cardinality refers to the number of unique values in a column relative to the total number of rows.

  • High Cardinality: columns like email or SSN (every value is unique). Indexes are extremely effective.
  • Low Cardinality: a column like

(content truncated)
