Internal Datatypes and Record Format in SQLite
Source: Dev.to
git‑lrc: A Git Hook for Checking AI‑Generated Code
Hello, I’m Maneshwar. I’m working on git‑lrc, a Git hook that reviews code produced by AI agents before it lands in your repository.
Inside the SQLite Virtual Machine (VM)
So far, we’ve seen how the VM executes instructions, manages cursors, and moves through bytecode programs.
Today we zoom into something even more fundamental:
What exactly is a value inside the VM?
Before a value is written to disk, compared in a WHERE clause, or returned through sqlite3_step(), it exists inside the VM as a memory object. Understanding that representation is key to understanding SQLite’s flexibility.
- The VM uses an arbitrary number of numbered memory locations (registers).
- Each register holds exactly one value at a time.
Every value processed by the VM must belong to one of five primitive storage types:
| Type | Description |
|---|---|
| INTEGER | Signed integer |
| REAL | Signed floating‑point number (IEEE) |
| TEXT | Character string (with encoding meta) |
| BLOB | Raw byte image |
| NULL | SQL NULL |
There are no other primitive types inside SQLite’s execution engine. Everything—whether it originated from disk, an expression, or a bound parameter—becomes one of these five.
Physical Representation
| Storage class | Physical representation |
|---|---|
| INTEGER | Stored in integer form |
| REAL | Stored in IEEE floating‑point format |
| TEXT | Stored with encoding metadata |
| BLOB | Raw bytes |
| NULL | Special marker |
Cached Alternate Representations
Some values can have multiple representations cached simultaneously.
Example: the value 123 may exist as:
INTEGER 123REAL 123.0TEXT "123"
The VM keeps these representations synchronized when necessary, avoiding repeated conversions during expression evaluation.
BLOB and NULL cannot have alternate representations; they remain as they are.
If a conversion is needed (e.g., comparing TEXT with INTEGER), the VM performs implicit type conversion on demand. SQLite’s dynamic typing model relies heavily on this flexibility.
Inspecting Storage Types
SELECT a, typeof(a) FROM t1;
At the C‑API level, sqlite3_column_type() reveals the storage class of a value returned by sqlite3_step().
The Mem Object
Internally, almost every value inside the VM is represented as a Mem object. Each element of the Vdbe.aMem array is one such structure.
Conceptual layout of a Mem object
- The actual value
- Flags describing its current storage type
- Metadata about encoding
- Cached alternate representations
Invariant: A value always has exactly one canonical storage class, even if multiple representations are cached. The storage‑class flag indicates which representation is canonical.
This distinction between storage class and cached representation enables:
- Manifest typing
- On‑the‑fly type coercion
- Efficient comparison semantics
—all without rigid schema enforcement at runtime.
Records vs. Registers
Individual values in registers are not what get stored directly in B‑trees. The VM first composes them into records—logically contiguous byte strings that contain:
- A key
- An optional payload (value portion)
Although the tree module may physically split a record across pages or overflow areas, the VM sees it as one continuous byte sequence. The VM alone is responsible for formatting records; the tree module merely stores and retrieves byte strings.
Two conceptual ways to format records
| Approach | Characteristics |
|---|---|
| Fixed‑length | Each column occupies a predetermined amount of space; record size is constant; padding may be required; column sizes are known at table creation time. |
| Variable‑length | Column sizes vary per record; no padding; record size depends on actual data; offsets must be computed dynamically. |
SQLite uses a variant of variable‑length formatting because it provides:
- Smaller database files (no wasted padding)
- Faster I/O (fewer bytes moved)
- Support for dynamic, manifest typing
This is crucial: SQLite allows an INTEGER column to store TEXT, a TEXT column to store numbers, or any column to store NULL. Variable‑length formatting makes this possible without structural constraints.
Manifest Typing
Traditional databases enforce type at the schema level. SQLite enforces type at the value level—manifest typing. Each value carries its own storage class, independent of the column declaration. That’s why the Mem object stores type flags with the value itself.
Examples
INSERT INTO t1(c1) VALUES (123); -- stores an INTEGER
INSERT INTO t1(c1) VALUES ('abc'); -- stores a TEXT
Both rows can coexist even if c1 was declared as TEXT.
In the next sections we will explore how this manifest‑typing model interacts with:
- Record headers
- Serial types
- On‑disk encoding
Understanding SQLite’s byte‑level record encoding will make the design of the VM, the Mem object, and the B‑tree layer click into place.
About git‑lrc
AI agents write code fast, but they can silently:
- Remove logic
- Change behavior
- Introduce bugs
Often you discover these issues only in production.
git‑lrc fixes this. It hooks into git commit and reviews every diff before it lands.
- Free – unlimited AI code reviews that run on each commit
- Easy – 60‑second setup
- Open source – source‑available, ready for anyone to use
See It In Action
Detect serious security issues such as leaked credentials, expensive cloud operations, and sensitive material in log statements.
Watch demo video (git‑lrc‑intro‑60s.mp4)
Why Use git‑lrc?
| 🤖 | AI agents silently break things – code removed, logic changed, edge cases gone. You won’t notice until production. |
|---|---|
| 🔍 | Catch it before it ships. |
Any feedback or contributions are welcome!
Repository: git‑lrc – free, unlimited AI code reviews that run on commit.
fore it ships. AI-powered inline comments show you exactly what changed and what looks wrong.
🔁 **Build a habit, ship better code.**
Regular review → fewer bugs → more robust code → better results in your team.
🔗 **Why git?**
Git is universal. Every editor, every IDE, every AI…
[View on GitHub](#)