Internal Datatypes and Record Format in SQLite

Published: 3 days ago (February 27, 2026 at 01:20 PM EST)

5 min read

Source: Dev.to

git‑lrc: A Git Hook for Checking AI‑Generated Code

Hello, I’m Maneshwar. I’m working on git‑lrc, a Git hook that reviews code produced by AI agents before it lands in your repository.

Inside the SQLite Virtual Machine (VM)

So far, we’ve seen how the VM executes instructions, manages cursors, and moves through bytecode programs.
Today we zoom into something even more fundamental:

What exactly is a value inside the VM?

Before a value is written to disk, compared in a WHERE clause, or returned through sqlite3_step(), it exists inside the VM as a memory object. Understanding that representation is key to understanding SQLite’s flexibility.

The VM uses an arbitrary number of numbered memory locations (registers).
Each register holds exactly one value at a time.

Every value processed by the VM must belong to one of five primitive storage types:

Type	Description
INTEGER	Signed integer
REAL	Signed floating‑point number (IEEE)
TEXT	Character string (with encoding meta)
BLOB	Raw byte image
NULL	SQL NULL

There are no other primitive types inside SQLite’s execution engine. Everything—whether it originated from disk, an expression, or a bound parameter—becomes one of these five.

Physical Representation

Storage class	Physical representation
INTEGER	Stored in integer form
REAL	Stored in IEEE floating‑point format
TEXT	Stored with encoding metadata
BLOB	Raw bytes
NULL	Special marker

Cached Alternate Representations

Some values can have multiple representations cached simultaneously.
Example: the value 123 may exist as:

INTEGER 123
REAL 123.0
TEXT "123"

The VM keeps these representations synchronized when necessary, avoiding repeated conversions during expression evaluation.

BLOB and NULL cannot have alternate representations; they remain as they are.

If a conversion is needed (e.g., comparing TEXT with INTEGER), the VM performs implicit type conversion on demand. SQLite’s dynamic typing model relies heavily on this flexibility.

Inspecting Storage Types

SELECT a, typeof(a) FROM t1;

At the C‑API level, sqlite3_column_type() reveals the storage class of a value returned by sqlite3_step().

The `Mem` Object

Internally, almost every value inside the VM is represented as a Mem object. Each element of the Vdbe.aMem array is one such structure.

Conceptual layout of a Mem object

The actual value
Flags describing its current storage type
Metadata about encoding
Cached alternate representations

Invariant: A value always has exactly one canonical storage class, even if multiple representations are cached. The storage‑class flag indicates which representation is canonical.

This distinction between storage class and cached representation enables:

Manifest typing
On‑the‑fly type coercion
Efficient comparison semantics

—all without rigid schema enforcement at runtime.

Records vs. Registers

Individual values in registers are not what get stored directly in B‑trees. The VM first composes them into records—logically contiguous byte strings that contain:

A key
An optional payload (value portion)

Although the tree module may physically split a record across pages or overflow areas, the VM sees it as one continuous byte sequence. The VM alone is responsible for formatting records; the tree module merely stores and retrieves byte strings.

Two conceptual ways to format records

Approach	Characteristics
Fixed‑length	Each column occupies a predetermined amount of space; record size is constant; padding may be required; column sizes are known at table creation time.
Variable‑length	Column sizes vary per record; no padding; record size depends on actual data; offsets must be computed dynamically.

SQLite uses a variant of variable‑length formatting because it provides:

Smaller database files (no wasted padding)
Faster I/O (fewer bytes moved)
Support for dynamic, manifest typing

This is crucial: SQLite allows an INTEGER column to store TEXT, a TEXT column to store numbers, or any column to store NULL. Variable‑length formatting makes this possible without structural constraints.

Manifest Typing

Traditional databases enforce type at the schema level. SQLite enforces type at the value level—manifest typing. Each value carries its own storage class, independent of the column declaration. That’s why the Mem object stores type flags with the value itself.

Examples

INSERT INTO t1(c1) VALUES (123);   -- stores an INTEGER
INSERT INTO t1(c1) VALUES ('abc'); -- stores a TEXT

Both rows can coexist even if c1 was declared as TEXT.

In the next sections we will explore how this manifest‑typing model interacts with:

Record headers
Serial types
On‑disk encoding

Understanding SQLite’s byte‑level record encoding will make the design of the VM, the Mem object, and the B‑tree layer click into place.

About git‑lrc

AI agents write code fast, but they can silently:

Remove logic
Change behavior
Introduce bugs

Often you discover these issues only in production.

git‑lrc fixes this. It hooks into git commit and reviews every diff before it lands.

Free – unlimited AI code reviews that run on each commit
Easy – 60‑second setup
Open source – source‑available, ready for anyone to use

See It In Action

Detect serious security issues such as leaked credentials, expensive cloud operations, and sensitive material in log statements.

Watch demo video (git‑lrc‑intro‑60s.mp4)

Why Use git‑lrc?

🤖	AI agents silently break things – code removed, logic changed, edge cases gone. You won’t notice until production.
🔍	Catch it before it ships.

Any feedback or contributions are welcome!

Repository: git‑lrc – free, unlimited AI code reviews that run on commit.

fore it ships. AI-powered inline comments show you exactly what changed and what looks wrong.

🔁 **Build a habit, ship better code.**  
Regular review → fewer bugs → more robust code → better results in your team.

🔗 **Why git?**  
Git is universal. Every editor, every IDE, every AI…

[View on GitHub](#)

Internal Datatypes and Record Format in SQLite

git‑lrc: A Git Hook for Checking AI‑Generated Code

Inside the SQLite Virtual Machine (VM)

What exactly is a value inside the VM?

Physical Representation

Cached Alternate Representations

Inspecting Storage Types

The `Mem` Object

Records vs. Registers

Two conceptual ways to format records

Manifest Typing

About git‑lrc

See It In Action

Why Use git‑lrc?

Related posts

SQL Joins & Window Functions

SQL JOINS AND WINDOW FUNCTIONS

Mastering SQL Joins and Window Functions

Mastering SQL: A Guide to Joins and Window Functions

git‑lrc: A Git Hook for Checking AI‑Generated Code

Inside the SQLite Virtual Machine (VM)

What exactly is a value inside the VM?

Physical Representation

Cached Alternate Representations

Inspecting Storage Types

The Mem Object

Records vs. Registers

Two conceptual ways to format records

Manifest Typing

About git‑lrc

See It In Action

Why Use git‑lrc?

Related posts

SQL Joins & Window Functions

SQL JOINS AND WINDOW FUNCTIONS

Mastering SQL Joins and Window Functions

Mastering SQL: A Guide to Joins and Window Functions

The `Mem` Object