Building a B-tree in Go: The gap between theory and implementation

Published: (January 14, 2026 at 05:33 AM EST)
3 min read
Source: Dev.to

Source: Dev.to

Why a B‑Tree?

  • Most database indexes are B‑trees (or B+‑trees).
  • Unlike binary trees (2 children per node), a B‑tree node can have many children, determined by the order.
  • This multi‑way branching makes B‑trees ideal for disk storage where each block read is expensive.

I won’t re‑explain B‑trees here – chapter 2 of Database Internals does that brilliantly. This article assumes you already know the basics and focuses on what I learned while building one.

Implementation Highlights

Core Operations

  • Binary search inside nodes → recurse to the appropriate child.
  • Node split – keep the tree balanced: find the median, split into two halves, promote the median key.
  • Root split – special case: create a brand‑new root containing the promoted key, making the tree taller rather than wider.

Gotchas

IssueWhat I learned
Split cascadeInserting a key can trigger a cascade of splits up the tree. You must update every parent‑child pointer correctly.
Internal node splitNot just moving one or two keys – you must rewire the entire sub‑graph of children and fix their parent pointers.
B‑tree vs B+‑treeB‑trees store data in every node; B+‑trees store data only in leaves (with linked leaves for fast range scans). My implementation stopped the search as soon as the key was found, which is elegant but makes range queries expensive.
Skewed insertsInserting sequential keys (1, 2, 3, …) produced a 7‑level deep tree with only 50 keys – a pathological case that highlighted the need for bulk‑load strategies and REINDEX.
Random insertsProduced a nicely balanced 3‑level tree, showing the natural self‑balancing property of B‑trees.

Test‑Driven Development (TDD)

I wrote 20+ unit tests (100 % coverage) before any implementation code. The tests forced me to think through edge cases and defined “done” up front.

Bugs Caught by Tests

  • Duplicate‑key insertion (forgot to check for existing keys).
  • Off‑by‑one errors on index bounds.
  • Children‑count mismatch after splits (n keys → n+1 children).
  • Incorrect parent pointers after a split cascade.

When all tests finally turned green, the confidence was immediate – no crossed fingers, just certainty.

Stats

MetricValue
Time spent~15 hours across 4 days
Code size~200 lines of Go
Tests20+ unit tests, 100 % coverage
Bugs foundToo many to count (all caught by TDD)
Coffee consumedSignificant ☕️

Reflections

  • Hands‑on learning beats passive reading. Seeing the tree’s pointers change in memory made the abstract concept concrete.
  • TDD isn’t just a best practice – it’s a thinking tool. Writing the test first forced me to articulate the exact behavior I wanted.
  • B+‑trees vs B‑trees – now I understand why PostgreSQL uses B+‑trees for indexes (better range scans) while still appreciating the elegance of a pure B‑tree.

What’s Next?

I’m planning to dive into LSM‑trees (Log‑Structured Merge trees), which flip the trade‑off:

  • B‑trees → optimize for reads (in‑place updates).
  • LSM‑trees → optimize for writes (append‑only).

Implementing compaction strategies will be the next level of fun. 🤩

TL;DR

If you’re curious about database internals:

  1. Read Database Internals by Alex Petrov.
  2. Pick a data structure (B‑tree, LSM‑tree, etc.) and build it.
  3. Use TDD to lock down edge cases early.

The gap between reading and implementing is where the real magic happens.

You can see the full implementation on my self‑hosted GitLab (yes, I run my own instance: infra nerd life). Pull requests welcome if you spot ways to improve it! 😊

Back to Blog

Related posts

Read more »

Day 0: Starting My DSA Journey

Day 0 – Planning, Mindset & Commitment - Goal: Become strong in problem‑solving & interview‑ready - Starting level: Beginner / Revising basics - Daily commitme...