Binary Search Trees: Why They’re Great in Memory but Terrible on Disk

Published: 1 day ago (February 5, 2026 at 05:05 PM EST)

2 min read

Source: Dev.to

Binary search trees (BST) are in‑memory sorted data structures for efficient lookups. Each node has two children; for any node, all values in the left subtree are less than the node’s value, and all values in the right subtree are greater.

Binary Search Tree

This structure enables O(log N) search because a balanced BST has height O(log N). For the example tree with four elements, the worst‑case traversal is log₂4 = 2 steps.

Tree Balancing

If we insert the numbers [1, 2, 3] without rebalancing, the tree becomes a degenerate list (Fig A), leading to O(N) search time. A BST is considered unbalanced when the height difference between its two subtrees exceeds 1. Rebalancing can be performed with a rotation pivot, producing a more balanced shape (Fig B).

Rebalancing of BST

Why It Sucks as Disk‑Based Storage

1. Random Insertion Order

When a new node is added, it is allocated wherever free disk space exists, and the parent stores a pointer to that location. Consequently, a child node may reside far from its parent, destroying locality.

Operating systems load data in blocks (e.g., 16 KB). A contiguous block contains the target data and neighboring data, so if the next needed item is nearby, it is already in memory. With BST nodes scattered across the disk, this benefit is lost; each child may require a separate block load.

2. Fan‑out of 2

A binary node’s fan‑out of 2 means that as the number of elements grows, the tree height grows logarithmically. For one million elements, the height is roughly 20, requiring about 20 traversals. Because each node may be on a different disk page, every traversal can trigger a separate disk seek and page transfer.

Additionally, rebalancing during insertions changes the tree structure, preventing effective caching of paths.

Conclusion

BSTs excel in memory: O(log N) searches with simple pointer chasing work well when all data resides in RAM. On disk, however, the same properties become liabilities:

Random allocation eliminates spatial locality.
Binary fan‑out yields tall trees, causing many costly disk seeks.
Frequent rebalancing invalidates cached navigation paths.

Therefore, disk‑based databases favor B‑trees and B⁺‑trees, which have high fan‑out (hundreds of children per node) and node sizes that match disk pages. A B‑tree storing one million elements has a height of about 3, reducing the number of required disk reads from ~20 to ~3. This illustrates how a data structure optimal for one environment can be unsuitable for another.

Binary Search Trees: Why They’re Great in Memory but Terrible on Disk

Tree Balancing

Why It Sucks as Disk‑Based Storage

1. Random Insertion Order

2. Fan‑out of 2

Conclusion

Related posts

Why “Easy / Medium / Hard” Is a Terrible Way to Think About DSA

How to Prepare for Tech Interviews: The Complete 2026 Guide

I Implemented Every Sorting Algorithm in Python — And Python's Built-in Sort Crushed Them All

How I Identify the Right DSA Pattern (Before Writing Any Code)

Tree Balancing

Why It Sucks as Disk‑Based Storage

1. Random Insertion Order

2. Fan‑out of 2

Conclusion

Related posts

Why “Easy / Medium / Hard” Is a Terrible Way to Think About DSA

How to Prepare for Tech Interviews: The Complete 2026 Guide

I Implemented Every Sorting Algorithm in Python — And Python's Built-in Sort Crushed Them All

How I Identify the Right DSA Pattern (Before Writing Any Code)

2. Fan‑out of 2