Inside the SQLite Backend: Virtual Machine, Storage, and the Build Process
Source: Dev.to
Virtual Machine (VDBE)
Once the frontend finishes compilation, it hands over a bytecode program to the Virtual Database Engine (VDBE).
A bytecode program is:
- A linear sequence of instructions
- Each instruction has an opcode and up to five operands
- Executed one instruction at a time, in order, unless a jump instruction redirects control
The VM behaves like a custom CPU, designed specifically for database operations such as scanning tables, comparing values, managing cursors, and enforcing transactional semantics.
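One way to see such a bytecode program is SQLite's `EXPLAIN` prefix, which returns one row per VM instruction. The sketch below uses Python's standard `sqlite3` module; the `users` table and the `explain` helper are hypothetical names chosen for illustration, and the exact instructions listed will vary between SQLite versions.

```python
import sqlite3


def explain(sql: str, setup: str = "") -> list[dict]:
    """Return the bytecode program for `sql` as a list of instruction dicts."""
    conn = sqlite3.connect(":memory:")
    try:
        if setup:
            conn.executescript(setup)
        cur = conn.execute("EXPLAIN " + sql)
        # Column names come from SQLite itself: addr, opcode, p1..p5, comment.
        columns = [d[0] for d in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        conn.close()


# Hypothetical schema and query, used only to have something to compile.
program = explain(
    "SELECT name FROM users WHERE id = 7",
    setup="CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);",
)

for ins in program:
    # Each row is one instruction: its address, opcode, and operands.
    print(ins["addr"], ins["opcode"], ins["p1"], ins["p2"], ins["p3"], ins["p4"], ins["p5"])
```

Each printed row is one instruction with its address, opcode, and operands P1 through P5, which matches the "opcode plus up to five operands" description above.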
Tree Module (B‑tree Storage)
SQLite stores data using tree structures:
- Tables → B+ trees
- Indexes → B‑trees
Each table and index has its own independent tree structure. The implementation resides in:
- btree.c – tree logic
- btree.h – public interface
The tree module supports searching, insertion, deletion, updates, and structural changes (e.g., creating or dropping tables and indexes).
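The "one tree per table and index" point can be observed from the outside: the schema table records a root page number for every table and index. The following is a minimal sketch using Python's `sqlite3` module; the `books` table and `books_by_title` index are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT);
    CREATE INDEX books_by_title ON books(title);
    """
)

# sqlite_master stores, for each table and index, the page where its tree starts.
roots = {
    name: (kind, rootpage)
    for name, kind, rootpage in conn.execute(
        "SELECT name, type, rootpage FROM sqlite_master"
    )
}
conn.close()

for name, (kind, rootpage) in roots.items():
    print(f"{kind:5} {name:15} root page {rootpage}")
```

The table and the index report different root pages, which is consistent with each one being stored in its own tree.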
Pager
The pager is a critical component that mediates all file I/O. The tree module never accesses the database file directly; instead, it works with fixed‑size pages requested from the pager.
Key responsibilities of the pager:
- Reads and writes database pages
- Maintains an in‑memory page cache
- Handles file locking
- Manages rollback journals
- Enforces transaction boundaries
In effect, the pager acts as a data manager, lock manager, log manager, and transaction manager. Its source files are:
- pager.c
- pager.h
The pager enables SQLite to deliver ACID guarantees using a single database file.
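Two of the pager's roles, fixed-size pages and transaction boundaries, can be seen from application code. The sketch below is an illustration under assumptions, not a view into pager.c: it uses a temporary file named `demo.db` and a table `t` (both hypothetical), and forces the classic rollback-journal mode so the behavior matches the description above.

```python
import os
import sqlite3
import tempfile


def pager_demo() -> dict:
    """Create a small database file and report page-level facts about it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")  # hypothetical file name
        conn = sqlite3.connect(path)
        try:
            # Use the rollback journal rather than WAL for this demonstration.
            conn.execute("PRAGMA journal_mode=DELETE").fetchone()
            conn.execute("CREATE TABLE t (x INTEGER)")
            conn.execute("INSERT INTO t VALUES (1)")
            conn.commit()

            # This insert starts a transaction that is then abandoned.
            conn.execute("INSERT INTO t VALUES (2)")
            conn.rollback()

            rows = conn.execute("SELECT count(*) FROM t").fetchone()[0]
            page_size = conn.execute("PRAGMA page_size").fetchone()[0]
            page_count = conn.execute("PRAGMA page_count").fetchone()[0]
        finally:
            conn.close()
        file_size = os.path.getsize(path)
    return {
        "rows": rows,
        "page_size": page_size,
        "page_count": page_count,
        "file_size": file_size,
    }


result = pager_demo()
print(result)
```

The file size comes out as a whole number of pages, and the rolled-back insert leaves no trace in the table.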
lovestaco@i3nux-mint:~/pers/sqlite$ ll /home/lovestaco/pers/sqlite/bld/sqlite3
-rwxrwxr-x 1 lovestaco lovestaco 6.9M Jan 11 17:14 /home/lovestaco/pers/sqlite/bld/sqlite3
Build Process
SQLite’s build process reflects its philosophy of self‑containment and reproducibility. It consists of six major steps:
- Generate sqlite3.h
- Build the SQL parser
- Generate VM opcodes
- Generate opcode names
- Generate SQL keyword tables
- Compile the library
During the build:
- lemon.c generates parse.c and parse.h
- mkkeywordhash.c generates keywordhash.h
- awk and sed generate sqlite3.h, opcodes.h, and opcodes.c
opcodes.h assigns numeric values to VM instructions, while opcodes.c maps opcodes to human‑readable names useful for debugging and diagnostics.
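The relationship between the two generated files can be sketched as a number-to-name lookup table derived from `#define` lines. This is only an illustration of the idea: the header excerpt below is hypothetical, its numeric values are made up (real values differ between SQLite versions), and the actual build uses awk and sed scripts rather than Python.

```python
import re

# Hypothetical excerpt in the style of opcodes.h; the numbers are invented.
SAMPLE_OPCODES_H = """\
#define OP_Goto        1
#define OP_Halt        2
#define OP_OpenRead    3
#define OP_Column      4
"""

DEFINE_RE = re.compile(r"^#define\s+OP_(\w+)\s+(\d+)", re.MULTILINE)


def build_name_table(header_text: str) -> list[str]:
    """Build a list where index = opcode number and value = opcode name."""
    pairs = [(int(num), name) for name, num in DEFINE_RE.findall(header_text)]
    table = [""] * (max(num for num, _ in pairs) + 1)
    for num, name in pairs:
        table[num] = name
    return table


OPCODE_NAMES = build_name_table(SAMPLE_OPCODES_H)


def opcode_name(op: int) -> str:
    """Translate a numeric opcode into a readable name for diagnostics."""
    if 0 <= op < len(OPCODE_NAMES) and OPCODE_NAMES[op]:
        return OPCODE_NAMES[op]
    return f"unknown({op})"


print(opcode_name(3))
```

Keeping the names in a separate table means the VM itself only ever compares small integers, while tracing and debugging output can still show readable names.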
Modern releases provide a single amalgamation file, sqlite3.c, along with sqlite3.h. Advantages of using the amalgamation include:
- 5–10% better performance
- More aggressive compiler optimizations
- Simplified build process
- Easier embedding into applications
The command‑line utility additionally requires shell.c.
Summary
- SQL is compiled into bytecode and executed by a purpose‑built VM.
- SQLite ensures serializable execution using database‑level locking.
- Journaling guarantees atomicity and recovery.
- Each database lives in a single native file anchored by sqlite_master.
The architecture is modular, cleanly layered, and fully open source in the public domain.
Looking Ahead
The next chapter will dive deeper into database and journal file storage structures, revealing how SQLite’s on‑disk layout materializes these abstractions.
Reference: SQLite Database System: Design and Implementation. Sibsankar Haldar (n.d.).