How Real Databases Work Internally ?

Published: (January 31, 2026 at 05:28 AM EST)
3 min read
Source: Dev.to

Source: Dev.to

What a Real Database Is Not

  • A giant table in memory
  • A collection of CSV files
  • A simple key‑value map

Core Components

A serious database consists of several tightly integrated subsystems:

  • Parser
  • Planner / Optimizer
  • Execution Engine
  • Storage Engine
  • Buffer Cache
  • Transaction Manager
  • Recovery System

Query Processing Pipeline

When you run a statement such as:

SELECT name FROM users WHERE age > 30;

the database does not immediately scan a table. It follows a strict pipeline:

Parsing and Validation

  1. Convert the SQL text into an Abstract Syntax Tree (AST).
  2. Check:
    • Syntax correctness
    • Valid table and column names
    • User permissions

At this stage no data is accessed.

Optimization

The query optimizer decides:

  • Which indexes to use
  • Join order
  • Scan method (index scan vs. sequential scan)
  • Cost estimation based on statistics

Typical decisions:

  • Is it cheaper to scan 1 million rows sequentially?
  • Or to use an index with random I/O?

Modern databases employ:

  • Cost‑based optimizers
  • Statistics (histograms, cardinality)
  • Rule‑based rewrites

This step often determines performance more than the hardware.

Execution

The optimizer produces an execution plan. The execution engine then:

  • Pulls rows through operators (scan → filter → project)
  • Uses iterators or vectorized execution
  • Streams results instead of loading everything into memory

Important concept: Databases process data in pipelines, not all at once.

Storage Layout

Pages

Data is stored in fixed‑size pages (commonly 4 KB–16 KB).
Each page contains:

  • Page header
  • Row slots
  • Metadata

Pages are the minimum unit of I/O.

Row‑Oriented vs. Column‑Oriented Stores

Store TypeTypical UseCharacteristics
Row‑oriented (PostgreSQL, MySQL)OLTPFast inserts and point queries
Column‑oriented (ClickHouse, Redshift)AnalyticsExcellent compression, vectorized scans

Indexes

Indexes are separate data structures, most commonly:

  • B‑Trees – keep data sorted, minimize disk seeks, balance read/write costs
  • Hash indexes
  • LSM Trees (RocksDB, Cassandra)

Important truth: Indexes speed up reads but slow down writes, because every insert/update must modify both the table data and all related indexes.

Buffer Cache and Durability

Databases never trust memory alone. Frequently used pages are cached in RAM, while dirty pages are written back later using replacement strategies (LRU variants).

If power fails, memory is lost, so the disk must remain consistent.

ACID Guarantees

Real databases guarantee Atomicity, Consistency, Isolation, Durability through a Write‑Ahead Log (WAL):

  1. Before modifying data, changes are written to the log.
  2. The log is flushed to disk.
  3. Only then is the in‑memory page updated.

If the database crashes, the WAL is replayed to recover data. Logs are often more critical than the data files themselves.

Concurrency Control

Databases support thousands of concurrent users using:

  • Locks (row, page, table)
  • MVCC (Multi‑Version Concurrency Control)

With MVCC:

  • Readers don’t block writers.
  • Writers create new versions.
  • Old versions are cleaned by vacuum/GC.

This is why systems like PostgreSQL can read without locking and scale well under load.

Recovery Process

When a database restarts after a crash:

  1. Read the last checkpoint.
  2. Replay WAL records.
  3. Undo incomplete transactions.
  4. Restore consistency.

The process is deterministic and repeatable—no guessing or heuristics.

Why Understanding Internals Matters

  • Write faster queries
  • Design better schemas
  • Choose the correct indexes
  • Avoid dangerous assumptions
  • Debug performance issues

Most “slow database” problems stem from:

  • Bad query plans
  • Wrong indexes
  • Misunderstanding internals

…not from hardware.

Conclusion

A real database is closer to an operating system than a simple library. It manages memory, storage, concurrency, recovery, and scheduling. Treating it as a black box will punish you; understanding its internals turns it into one of the most powerful tools in software engineering.

Back to Blog

Related posts

Read more »

Switch on RUST

My path from Java to Rust: changing the technology stack Hello, my name is Garik, and today I want to share with you the story of how I decided to change the t...