Binary Search Trees: Why They’re Great in Memory but Terrible on Disk
Source: Dev.to
Binary search trees (BST) are in‑memory sorted data structures for efficient lookups. Each node has two children; for any node, all values in the left subtree are less than the node’s value, and all values in the right subtree are greater.

This structure enables O(log N) search because a balanced BST has height O(log N). For the example tree with four elements, the worst‑case traversal is log₂4 = 2 steps.
Tree Balancing
If we insert the numbers [1, 2, 3] without rebalancing, the tree becomes a degenerate list (Fig A), leading to O(N) search time. A BST is considered unbalanced when the height difference between its two subtrees exceeds 1. Rebalancing can be performed with a rotation pivot, producing a more balanced shape (Fig B).

Why It Sucks as Disk‑Based Storage
1. Random Insertion Order
When a new node is added, it is allocated wherever free disk space exists, and the parent stores a pointer to that location. Consequently, a child node may reside far from its parent, destroying locality.
Operating systems load data in blocks (e.g., 16 KB). A contiguous block contains the target data and neighboring data, so if the next needed item is nearby, it is already in memory. With BST nodes scattered across the disk, this benefit is lost; each child may require a separate block load.
2. Fan‑out of 2
A binary node’s fan‑out of 2 means that as the number of elements grows, the tree height grows logarithmically. For one million elements, the height is roughly 20, requiring about 20 traversals. Because each node may be on a different disk page, every traversal can trigger a separate disk seek and page transfer.
Additionally, rebalancing during insertions changes the tree structure, preventing effective caching of paths.
Conclusion
BSTs excel in memory: O(log N) searches with simple pointer chasing work well when all data resides in RAM. On disk, however, the same properties become liabilities:
- Random allocation eliminates spatial locality.
- Binary fan‑out yields tall trees, causing many costly disk seeks.
- Frequent rebalancing invalidates cached navigation paths.
Therefore, disk‑based databases favor B‑trees and B⁺‑trees, which have high fan‑out (hundreds of children per node) and node sizes that match disk pages. A B‑tree storing one million elements has a height of about 3, reducing the number of required disk reads from ~20 to ~3. This illustrates how a data structure optimal for one environment can be unsuitable for another.