Canonical JSON Model

Published: (December 16, 2025 at 05:49 PM EST)
4 min read
Source: Dev.to

Source: Dev.to

Purpose

The Canonical JSON Model defines the stable, deterministic representation of AI execution state produced by a FACET‑compliant system. Its goal is to ensure that:

  • identical inputs produce byte‑for‑byte identical JSON
  • outputs are comparable, cacheable, diffable, and replayable
  • provider‑specific formats do not leak nondeterminism into downstream systems

Canonical JSON is the boundary artifact between deterministic compilation and probabilistic model execution.

Why deterministic JSON matters

Modern LLM stacks suffer from hidden nondeterminism:

Source of nondeterminismEffect
Field reordering in JSON objectsCaching unreliable
Optional fields appearing/disappearingReplay impossible
Provider‑specific message layoutsRegression testing meaningless
Streaming vs. non‑streaming structural driftAuditing and compliance fragile
Implicit defaults applied at runtime

These effects make:

  • caching unreliable
  • replay impossible
  • regression testing meaningless
  • auditing and compliance fragile

Canonical JSON eliminates these failure modes by enforcing a single normalized shape.

Canonical JSON Document

A Canonical JSON Document is a JSON object that satisfies all of the following:

  1. Deterministic field ordering
  2. Explicit presence or absence of all optional fields
  3. Stable numeric and string encoding
  4. Provider‑agnostic structure
  5. Fully derived from a typed execution state

FACET treats Canonical JSON as a compiled artifact, not a serialization convenience.

Top‑level ordering (normative)

meta
system
tools
examples
history
user
assistant
output

All compliant implementations MUST preserve this order.

Nested objects follow:

  • lexical key ordering (UTF‑8, code‑point order)
  • stable list ordering derived from execution order or explicit keys

Canonical JSON forbids implicit defaults and structural ambiguity.

Core Rules

RuleRequirement
Optional FieldsFields defined in the schema but missing from the runtime value MUST be rendered as null. Omission of known fields is PROHIBITED.
Empty ListsRendered as [].
Empty ObjectsRendered as {}.
BooleansAlways explicit (true / false).
IntegersRendered without leading zeros.
FloatsNormalized decimal form (no exponent unless required).
StringsUTF‑8, NFC‑normalized.
EscapingFollows the JSON standard; no alternative encodings.
Disallowed valuesNaN, Infinity, locale‑dependent number formats.

Rationale – Explicit null guarantees that the JSON keyset remains constant regardless of data content, enabling O(1) shape verification and stable hashing across languages with different default serialization behaviours (e.g., JavaScript vs. Rust).

Production Pipeline

Input:  [.facet] → [AST] → [R‑DAG] → [Token Box]

Core:              [[ CANONICAL JSON IR ]] ─┤

Views:       ┌──────────────┼───────────────┐
             ▼              ▼               ▼
          [OpenAI]     [Anthropic]      [Gemini]
             │              │               │
Output:      └──────────────┼───────────────┘

                       [ API Call ]
  • Canonical JSON is the single source of truth.
  • Provider payloads (OpenAI, Anthropic, Gemini, etc.) are derived views of Canonical JSON, not sources of truth.

FACET enforces the rule:

All provider payloads are ephemeral. Canonical JSON is permanent.

Consequences

  • Switching providers does not invalidate history.
  • Stored executions remain replayable even if a vendor API changes.
  • Audits and compliance reports are immune to provider schema drift.
  • Bugs in a provider adapter cannot corrupt the core execution record.

In practice:

  • Canonical JSON is stored, hashed, diffed, and cached.
  • Provider payloads are generated just‑in‑time and discarded.

Thus vendor lock‑in is structurally impossible at the execution layer. If a provider:

  • rejects a payload
  • enforces undocumented constraints
  • changes streaming semantics

the failure is isolated to the adapter layer; Canonical JSON remains valid, stable, and reusable.

@test / Snapshot Testing

Canonical JSON enables true snapshot testing for AI systems because it is:

  • byte‑for‑byte deterministic
  • provider‑agnostic
  • fully explicit in structure

Typical test flow

  1. Run the full execution pipeline in Pure Mode.
  2. Produce Canonical JSON.
  3. Hash and/or store the JSON as a snapshot.
  4. Future runs compare against this snapshot.
@test "payment flow"
  vars:
    amount: 100
    currency: "USD"

  assert:
    - canonical_json_hash == "b3e2…"

Guarantees

  • Logic changes are immediately visible.
  • Provider drift cannot invalidate tests.
  • Regressions are caught before deployment.

For enterprise systems this enables:

  • deterministic CI pipelines
  • audit‑safe execution logs
  • reproducible incident analysis
  • long‑term caching with cryptographic guarantees

If all of the following are true:

  • same FACET document
  • same inputs
  • same execution mode (Pure)
  • same lens registry
    Then:
  • Canonical JSON MUST be identical
  • Hash(canonical_json) MUST be identical
  • Downstream behavior MUST be reproducible

This is the foundation for:

  • memoization
  • snapshot testing
  • deterministic agents

Comparison: Ad‑hoc JSON vs. Canonical JSON

PropertyAd‑hoc JSONCanonical JSON
Field orderUnstableDeterministic
Optional fieldsImplicitExplicit (null)
Provider leakageHighNone
Diff‑friendlyNoYes
Cache‑safeNoYes

Canonical JSON turns AI behavior into versioned, testable artifacts—not ephemeral model outputs.

Replayability

ReplayableNo
Yes

JSON is not a data format.

semantic boundary.

Canonical JSON turns that boundary into something that can be reasoned about, tested, and trusted.
FACET Canonical JSON plays the same role in AI systems that LLVM IR plays in compilers.

Compiler Stack

FACET Stack
LayerDescription
Source Code.facet document
ASTTyped FACET AST
LLVM IRCanonical JSON
Target BackendProvider Adapter (OpenAI / Anthropic / Gemini)
Machine CodeProvider Payload

Key properties shared with LLVM IR

  • Provider‑independent representation
  • Deterministic and stable shape
  • Diffable and inspectable
  • Safe target for optimization, caching, and replay

Just as LLVM allows one program to target x86, ARM, or WebAssembly without changing source code, FACET allows one agent architecture to target multiple LLM providers without changing execution semantics.

This is why Canonical JSON is treated as an Intermediate Representation, not a serialization detail. Once this layer exists, provider payloads become replaceable implementation details.

Normative Canonical JSON Model

This document defines the normative Canonical JSON Model for FACET v2.0 and later.

All compliant implementations MUST follow these rules when producing canonical execution output.

Back to Blog

Related posts

Read more »