How POSTGRES indexing is more efficient than MYSQL
Source: Dev.to
Problem setup (How Index Scan works internally for both DBs)
Table definition (MySQL & PostgreSQL)
CREATE TABLE "user" (
id BIGINT PRIMARY KEY,
name VARCHAR(100),
age INT
);
Index
CREATE INDEX idx_user_name ON "user"(name);
Query
SELECT age FROM "user" WHERE name = 'Rahim';
- Table size: 10 million rows
- Index type: B‑Tree
nameis not uniqueageis not part of the index
MySQL execution
Important InnoDB rule
Secondary indexes store the PRIMARY KEY, not the physical row location.
idx_user_name entries look like: (name, primary_key_id).
Step‑by‑step
-
Traverse secondary index (name)
MySQL searches the B‑Tree for'Rahim'.
Cost:O(log N)Example leaf entry:
('Rahim', id=73482) -
Traverse PRIMARY KEY index (clustered index)
In InnoDB the primary key index is the table; rows are stored in PK order.
MySQL usesid = 73482to search the PK B‑Tree.
Cost:O(log N)Leaf entry returns the full row:
(id=73482, name='Rahim', age=29)
Summary
- ✅ Clustered index gives good locality
- ❌ Requires two B‑Tree traversals
- ❌ PK lookup cost grows with table size
PostgreSQL execution
Important PostgreSQL rule
Indexes store a TID (tuple ID) – a pointer to the heap location.
idx_user_name entries look like: (name, (block_id, offset)).
Step‑by‑step
-
Traverse B‑Tree index (name)
PostgreSQL searches the index for'Rahim'.
Cost:O(log N)Leaf entry:
('Rahim', TID=(block=102345, offset=7)) -
Heap fetch (direct pointer)
PostgreSQL goes directly to heap page 102345 and reads the row at offset 7.
Cost:O(1)(conceptually)Row returned:
(id=73482, name='Rahim', age=29)
Summary
- ✅ Only one B‑Tree traversal
- ✅ Heap fetch is constant‑time
- ❌ Heap pages may be scattered (less locality)
- ❌ Additional visibility checks due to MVCC
Performance considerations (Big‑O is not the full story)
When MySQL can be faster
- Primary key is small
- Data fits in the buffer pool
- Rows are accessed sequentially
- Heavy read workloads (OLTP)
➡️ Clustered index provides better locality, making MySQL win in these cases.
When PostgreSQL can be faster
- Very large tables
- Many secondary indexes
- Random access patterns
- Index‑only scans are possible
➡️ Pointer‑based access and constant‑time heap fetch give PostgreSQL an advantage.
Index‑only scan example
If you change the index to include age:
CREATE INDEX idx_user_name_age ON "user"(name, age);
Running the same query:
SELECT age FROM "user" WHERE name = 'Rahim';
- PostgreSQL can return the result without touching the heap (index‑only scan).
- MySQL cannot perform a true index‑only scan in the same way because its secondary indexes do not contain the needed column.
Comparative summary
| Aspect | MySQL | PostgreSQL |
|---|---|---|
| Index lookup cost | O(log N) + O(log N) (secondary + PK) | O(log N) + O(1) (secondary + direct heap) |
| Index‑only scan | Not supported natively | Supported when all needed columns are in the index |
| Small / medium datasets | Often feels faster | Comparable |
| Large datasets & many indexes | Scaling can degrade | Scales better |
| Analytics / complex queries | Less optimal | Generally superior |
| Simple OLTP workloads | Excellent | Good, but may have extra overhead |
Conclusion
- MySQL follows the “find index → find PK → find row” path, which is efficient for small to medium, sequential workloads.
- PostgreSQL’s pointer‑based approach (
O(log N) + O(1)) provides better scalability for large, random‑access workloads and enables true index‑only scans.