Canonical JSON Model
Source: Dev.to
Purpose
The Canonical JSON Model defines the stable, deterministic representation of AI execution state produced by a FACET‑compliant system. Its goal is to ensure that:
- identical inputs produce byte‑for‑byte identical JSON
- outputs are comparable, cacheable, diffable, and replayable
- provider‑specific formats do not leak nondeterminism into downstream systems
Canonical JSON is the boundary artifact between deterministic compilation and probabilistic model execution.
Why deterministic JSON matters
Modern LLM stacks suffer from hidden nondeterminism:
| Source of nondeterminism | Effect |
|---|---|
| Field reordering in JSON objects | Caching unreliable |
| Optional fields appearing/disappearing | Replay impossible |
| Provider‑specific message layouts | Regression testing meaningless |
| Streaming vs. non‑streaming structural drift | Auditing and compliance fragile |
| Implicit defaults applied at runtime | – |
These effects make:
- caching unreliable
- replay impossible
- regression testing meaningless
- auditing and compliance fragile
Canonical JSON eliminates these failure modes by enforcing a single normalized shape.
Canonical JSON Document
A Canonical JSON Document is a JSON object that satisfies all of the following:
- Deterministic field ordering
- Explicit presence or absence of all optional fields
- Stable numeric and string encoding
- Provider‑agnostic structure
- Fully derived from a typed execution state
FACET treats Canonical JSON as a compiled artifact, not a serialization convenience.
Top‑level ordering (normative)
meta
system
tools
examples
history
user
assistant
output
All compliant implementations MUST preserve this order.
Nested objects follow:
- lexical key ordering (UTF‑8, code‑point order)
- stable list ordering derived from execution order or explicit keys
Canonical JSON forbids implicit defaults and structural ambiguity.
Core Rules
| Rule | Requirement |
|---|---|
| Optional Fields | Fields defined in the schema but missing from the runtime value MUST be rendered as null. Omission of known fields is PROHIBITED. |
| Empty Lists | Rendered as []. |
| Empty Objects | Rendered as {}. |
| Booleans | Always explicit (true / false). |
| Integers | Rendered without leading zeros. |
| Floats | Normalized decimal form (no exponent unless required). |
| Strings | UTF‑8, NFC‑normalized. |
| Escaping | Follows the JSON standard; no alternative encodings. |
| Disallowed values | NaN, Infinity, locale‑dependent number formats. |
Rationale – Explicit null guarantees that the JSON keyset remains constant regardless of data content, enabling O(1) shape verification and stable hashing across languages with different default serialization behaviours (e.g., JavaScript vs. Rust).
Production Pipeline
Input: [.facet] → [AST] → [R‑DAG] → [Token Box]
│
Core: [[ CANONICAL JSON IR ]] ─┤
│
Views: ┌──────────────┼───────────────┐
▼ ▼ ▼
[OpenAI] [Anthropic] [Gemini]
│ │ │
Output: └──────────────┼───────────────┘
▼
[ API Call ]
- Canonical JSON is the single source of truth.
- Provider payloads (OpenAI, Anthropic, Gemini, etc.) are derived views of Canonical JSON, not sources of truth.
FACET enforces the rule:
All provider payloads are ephemeral. Canonical JSON is permanent.
Consequences
- Switching providers does not invalidate history.
- Stored executions remain replayable even if a vendor API changes.
- Audits and compliance reports are immune to provider schema drift.
- Bugs in a provider adapter cannot corrupt the core execution record.
In practice:
- Canonical JSON is stored, hashed, diffed, and cached.
- Provider payloads are generated just‑in‑time and discarded.
Thus vendor lock‑in is structurally impossible at the execution layer. If a provider:
- rejects a payload
- enforces undocumented constraints
- changes streaming semantics
the failure is isolated to the adapter layer; Canonical JSON remains valid, stable, and reusable.
@test / Snapshot Testing
Canonical JSON enables true snapshot testing for AI systems because it is:
- byte‑for‑byte deterministic
- provider‑agnostic
- fully explicit in structure
Typical test flow
- Run the full execution pipeline in Pure Mode.
- Produce Canonical JSON.
- Hash and/or store the JSON as a snapshot.
- Future runs compare against this snapshot.
@test "payment flow"
vars:
amount: 100
currency: "USD"
assert:
- canonical_json_hash == "b3e2…"
Guarantees
- Logic changes are immediately visible.
- Provider drift cannot invalidate tests.
- Regressions are caught before deployment.
For enterprise systems this enables:
- deterministic CI pipelines
- audit‑safe execution logs
- reproducible incident analysis
- long‑term caching with cryptographic guarantees
If all of the following are true:
- same FACET document
- same inputs
- same execution mode (Pure)
- same lens registry
Then:- Canonical JSON MUST be identical
Hash(canonical_json)MUST be identical- Downstream behavior MUST be reproducible
This is the foundation for:
- memoization
- snapshot testing
- deterministic agents
Comparison: Ad‑hoc JSON vs. Canonical JSON
| Property | Ad‑hoc JSON | Canonical JSON |
|---|---|---|
| Field order | Unstable | Deterministic |
| Optional fields | Implicit | Explicit (null) |
| Provider leakage | High | None |
| Diff‑friendly | No | Yes |
| Cache‑safe | No | Yes |
Canonical JSON turns AI behavior into versioned, testable artifacts—not ephemeral model outputs.
Replayability
| Replayable | No |
| Yes |
JSON is not a data format.
semantic boundary.
Canonical JSON turns that boundary into something that can be reasoned about, tested, and trusted.
FACET Canonical JSON plays the same role in AI systems that LLVM IR plays in compilers.
Compiler Stack
FACET Stack
| Layer | Description |
|---|---|
| Source Code | .facet document |
| AST | Typed FACET AST |
| LLVM IR | Canonical JSON |
| Target Backend | Provider Adapter (OpenAI / Anthropic / Gemini) |
| Machine Code | Provider Payload |
Key properties shared with LLVM IR
- Provider‑independent representation
- Deterministic and stable shape
- Diffable and inspectable
- Safe target for optimization, caching, and replay
Just as LLVM allows one program to target x86, ARM, or WebAssembly without changing source code, FACET allows one agent architecture to target multiple LLM providers without changing execution semantics.
This is why Canonical JSON is treated as an Intermediate Representation, not a serialization detail. Once this layer exists, provider payloads become replaceable implementation details.
Normative Canonical JSON Model
This document defines the normative Canonical JSON Model for FACET v2.0 and later.
All compliant implementations MUST follow these rules when producing canonical execution output.