Canonical JSON Model

Published: 2 days ago (December 16, 2025 at 05:49 PM EST)

4 min read

Source: Dev.to

Purpose

The Canonical JSON Model defines the stable, deterministic representation of AI execution state produced by a FACET‑compliant system. Its goal is to ensure that:

identical inputs produce byte‑for‑byte identical JSON
outputs are comparable, cacheable, diffable, and replayable
provider‑specific formats do not leak nondeterminism into downstream systems

Canonical JSON is the boundary artifact between deterministic compilation and probabilistic model execution.

Why deterministic JSON matters

Modern LLM stacks suffer from hidden nondeterminism:

Source of nondeterminism	Effect
Field reordering in JSON objects	Caching unreliable
Optional fields appearing/disappearing	Replay impossible
Provider‑specific message layouts	Regression testing meaningless
Streaming vs. non‑streaming structural drift	Auditing and compliance fragile
Implicit defaults applied at runtime	–

These effects make:

caching unreliable
replay impossible
regression testing meaningless
auditing and compliance fragile

Canonical JSON eliminates these failure modes by enforcing a single normalized shape.

Canonical JSON Document

A Canonical JSON Document is a JSON object that satisfies all of the following:

Deterministic field ordering
Explicit presence or absence of all optional fields
Stable numeric and string encoding
Provider‑agnostic structure
Fully derived from a typed execution state

FACET treats Canonical JSON as a compiled artifact, not a serialization convenience.

Top‑level ordering (normative)

meta
system
tools
examples
history
user
assistant
output

All compliant implementations MUST preserve this order.

Nested objects follow:

lexical key ordering (UTF‑8, code‑point order)
stable list ordering derived from execution order or explicit keys

Canonical JSON forbids implicit defaults and structural ambiguity.

Core Rules

Rule	Requirement
Optional Fields	Fields defined in the schema but missing from the runtime value MUST be rendered as `null`. Omission of known fields is PROHIBITED.
Empty Lists	Rendered as `[]`.
Empty Objects	Rendered as `{}`.
Booleans	Always explicit (`true` / `false`).
Integers	Rendered without leading zeros.
Floats	Normalized decimal form (no exponent unless required).
Strings	UTF‑8, NFC‑normalized.
Escaping	Follows the JSON standard; no alternative encodings.
Disallowed values	`NaN`, `Infinity`, locale‑dependent number formats.

Rationale – Explicit null guarantees that the JSON keyset remains constant regardless of data content, enabling O(1) shape verification and stable hashing across languages with different default serialization behaviours (e.g., JavaScript vs. Rust).

Production Pipeline

Input:  [.facet] → [AST] → [R‑DAG] → [Token Box]
                                            │
Core:              [[ CANONICAL JSON IR ]] ─┤
                                            │
Views:       ┌──────────────┼───────────────┐
             ▼              ▼               ▼
          [OpenAI]     [Anthropic]      [Gemini]
             │              │               │
Output:      └──────────────┼───────────────┘
                            ▼
                       [ API Call ]

Canonical JSON is the single source of truth.
Provider payloads (OpenAI, Anthropic, Gemini, etc.) are derived views of Canonical JSON, not sources of truth.

FACET enforces the rule:

All provider payloads are ephemeral. Canonical JSON is permanent.

Consequences

Switching providers does not invalidate history.
Stored executions remain replayable even if a vendor API changes.
Audits and compliance reports are immune to provider schema drift.
Bugs in a provider adapter cannot corrupt the core execution record.

In practice:

Canonical JSON is stored, hashed, diffed, and cached.
Provider payloads are generated just‑in‑time and discarded.

Thus vendor lock‑in is structurally impossible at the execution layer. If a provider:

rejects a payload
enforces undocumented constraints
changes streaming semantics

the failure is isolated to the adapter layer; Canonical JSON remains valid, stable, and reusable.

`@test` / Snapshot Testing

Canonical JSON enables true snapshot testing for AI systems because it is:

byte‑for‑byte deterministic
provider‑agnostic
fully explicit in structure

Typical test flow

Run the full execution pipeline in Pure Mode.
Produce Canonical JSON.
Hash and/or store the JSON as a snapshot.
Future runs compare against this snapshot.

@test "payment flow"
  vars:
    amount: 100
    currency: "USD"

  assert:
    - canonical_json_hash == "b3e2…"

Guarantees

Logic changes are immediately visible.
Provider drift cannot invalidate tests.
Regressions are caught before deployment.

For enterprise systems this enables:

deterministic CI pipelines
audit‑safe execution logs
reproducible incident analysis
long‑term caching with cryptographic guarantees

If all of the following are true:

same FACET document

same inputs

same execution mode (Pure)

same lens registry
Then:

Canonical JSON MUST be identical

Hash(canonical_json) MUST be identical

Downstream behavior MUST be reproducible

This is the foundation for:

memoization
snapshot testing
deterministic agents

Comparison: Ad‑hoc JSON vs. Canonical JSON

Property	Ad‑hoc JSON	Canonical JSON
Field order	Unstable	Deterministic
Optional fields	Implicit	Explicit (`null`)
Provider leakage	High	None
Diff‑friendly	No	Yes
Cache‑safe	No	Yes

Canonical JSON turns AI behavior into versioned, testable artifacts—not ephemeral model outputs.

Replayability


Replayable	No
Yes

JSON is not a data format.

semantic boundary.

Canonical JSON turns that boundary into something that can be reasoned about, tested, and trusted.
FACET Canonical JSON plays the same role in AI systems that LLVM IR plays in compilers.

Compiler Stack

FACET Stack

Layer	Description
Source Code	`.facet` document
AST	Typed FACET AST
LLVM IR	Canonical JSON
Target Backend	Provider Adapter (OpenAI / Anthropic / Gemini)
Machine Code	Provider Payload

Key properties shared with LLVM IR

Provider‑independent representation
Deterministic and stable shape
Diffable and inspectable
Safe target for optimization, caching, and replay

Just as LLVM allows one program to target x86, ARM, or WebAssembly without changing source code, FACET allows one agent architecture to target multiple LLM providers without changing execution semantics.

This is why Canonical JSON is treated as an Intermediate Representation, not a serialization detail. Once this layer exists, provider payloads become replaceable implementation details.

Normative Canonical JSON Model

This document defines the normative Canonical JSON Model for FACET v2.0 and later.

All compliant implementations MUST follow these rules when producing canonical execution output.

Canonical JSON Model

Purpose

Why deterministic JSON matters

Canonical JSON Document

Top‑level ordering (normative)

Core Rules

Production Pipeline

Consequences

`@test` / Snapshot Testing

Typical test flow

Guarantees

Comparison: Ad‑hoc JSON vs. Canonical JSON

Replayability

JSON is not a data format.

Compiler Stack

Key properties shared with LLVM IR

Normative Canonical JSON Model

Related posts

We found our site was slow in Singapore but perfect in Europe — here's why

I put a Game Boy inside ChatGPT (ChatGPT Apps)

Advent of AI - Day 13: Goose Terminal Integration

A Day in the Life of a Marketing Manager Using Microsoft Planner

Purpose

Why deterministic JSON matters

Canonical JSON Document

Top‑level ordering (normative)

Core Rules

Production Pipeline

Consequences

@test / Snapshot Testing

Typical test flow

Guarantees

Comparison: Ad‑hoc JSON vs. Canonical JSON

Replayability

JSON is not a data format.

Compiler Stack

Key properties shared with LLVM IR

Normative Canonical JSON Model

Related posts

We found our site was slow in Singapore but perfect in Europe — here's why

I put a Game Boy inside ChatGPT (ChatGPT Apps)

Advent of AI - Day 13: Goose Terminal Integration

A Day in the Life of a Marketing Manager Using Microsoft Planner

`@test` / Snapshot Testing

Key properties shared with LLVM IR