7-Layer Constitutional AI Guardrails: Preventing Agent Mistakes

Published: February 22, 2026 at 07:28 PM EST
4 min read
Source: Dev.to


The Problem

Consider an autonomous agent managing USDC for a user. Without guardrails:

  • Agent calls transfer(500, wallet_address) — is the wallet trusted? Is the amount within limits? Was this already done?
  • Agent posts to Twitter — is this duplicate content? Does it violate policies?
  • Agent approves a transaction — was this authorized by the right person at the right time?

These questions can’t be answered by the LLM alone. They require structured checks against known facts, historical state, and explicit rules.

The 7‑Layer Framework

ODEI’s constitutional guardrail system validates every action through seven sequential checks.

Layer 1: Immutability Check

Can this entity be modified?
Some nodes in the world model are immutable after creation — founding documents, past transactions, signed commitments. Layer 1 prevents agents from accidentally rewriting history.
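
A minimal sketch of what such a check might look like, assuming a world model whose nodes carry a `kind` field. The `Node` type, the `IMMUTABLE_KINDS` set, and the operation names are hypothetical and are not taken from ODEI's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical world-model node; only the fields this check needs."""
    node_id: str
    kind: str


# Assumption: node kinds that are frozen once created (mirrors the examples above).
IMMUTABLE_KINDS = {"founding_document", "past_transaction", "signed_commitment"}


def check_immutability(target: Node, operation: str) -> tuple[str, str]:
    """Return (result, note). Only write operations on immutable kinds fail."""
    if operation in {"update", "delete"} and target.kind in IMMUTABLE_KINDS:
        return "FAIL", f"{target.kind} nodes cannot be modified after creation"
    return "PASS", ""
```

Reads always pass in this sketch; only attempts to rewrite or delete a frozen node are blocked.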

Layer 2: Temporal Context

Is this action still valid in time?
Decisions expire. Authorizations have windows. Layer 2 checks that the action is timely — not stale from a previous session, not premature.
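
One way to express this is a validity window with an opening and a closing time. The sketch below assumes each authorization records both; the parameter names `not_before` and `expires_at` are illustrative, not part of any documented schema.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def check_temporal(
    not_before: datetime,
    expires_at: datetime,
    now: Optional[datetime] = None,
) -> tuple[str, str]:
    """Return (result, note) for an action with a [not_before, expires_at) window."""
    now = now or datetime.now(timezone.utc)
    if now < not_before:
        return "FAIL", "authorization window has not opened yet"
    if now >= expires_at:
        return "FAIL", "authorization has expired"
    return "PASS", ""


# Example: an approval that stays valid for 15 minutes after it is issued.
issued = datetime(2026, 2, 22, 12, 0, tzinfo=timezone.utc)
result = check_temporal(issued, issued + timedelta(minutes=15),
                        now=issued + timedelta(minutes=5))
```

Passing `now` explicitly keeps the check deterministic in tests; all timestamps should be timezone-aware so comparisons do not raise.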

Layer 3: Referential Integrity

Do all referenced entities exist?
The action references wallet 0x…. Does that wallet exist in the world model? Is it a known, trusted entity? Layer 3 catches hallucinated references.
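
As a rough illustration, a check like this could extract every address-shaped token from the action text and look each one up. The regular expression and the `known_wallets` set below are assumptions standing in for a real world-model query.

```python
import re

# Matches a 20-byte hex address such as an Ethereum-style wallet.
ADDRESS_RE = re.compile(r"\b0x[0-9a-fA-F]{40}\b")


def check_referential_integrity(action: str, known_wallets: set[str]) -> tuple[str, str]:
    """Fail if the action mentions a wallet that is not in the known set."""
    referenced = {addr.lower() for addr in ADDRESS_RE.findall(action)}
    known = {w.lower() for w in known_wallets}
    unknown = sorted(referenced - known)
    if unknown:
        return "FAIL", "unknown wallet(s): " + ", ".join(unknown)
    return "PASS", ""
```

Addresses are compared case-insensitively here, and an action that references no wallet at all passes this layer.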

Layer 4: Authority Validation

Does this agent have permission?
Not all agents can do all things. Layer 4 checks whether the requesting agent has the authority scope for this action, against the governance rules in the FOUNDATION layer.
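
A simple form of this is a lookup from agent to permitted action types. In the sketch below the `AUTHORITY_SCOPES` table is a hypothetical stand-in for governance rules stored in the world model; `trading_agent_v2` is borrowed from the API example later in this post, and `social_agent` is made up.

```python
# Assumption: each agent has an explicit set of action types it may perform.
AUTHORITY_SCOPES: dict[str, set[str]] = {
    "trading_agent_v2": {"transfer", "quote"},
    "social_agent": {"post"},
}


def check_authority(
    requester: str,
    action_type: str,
    scopes: dict[str, set[str]] = AUTHORITY_SCOPES,
) -> tuple[str, str]:
    """Deny by default: unknown agents and out-of-scope actions both fail."""
    allowed = scopes.get(requester)
    if allowed is None:
        return "FAIL", f"unknown agent: {requester}"
    if action_type not in allowed:
        return "FAIL", f"{requester} is not authorized for '{action_type}'"
    return "PASS", ""
```

The deny-by-default shape matters more than the data structure: an agent missing from the table gets no permissions rather than all of them.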

Layer 5: Deduplication

Has this exact action already been taken?
Without deduplication, agents can send the same message twice, execute the same transaction twice, create the same entity twice. Layer 5 uses content hashing to detect duplicates.
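
Content hashing for deduplication can be sketched as follows: normalize the action, serialize it canonically, and hash the result. The normalization rules and the in-memory `Deduplicator` class are assumptions for illustration; the source does not describe ODEI's hashing scheme.

```python
import hashlib
import json


def action_fingerprint(action: str, context: dict) -> str:
    """SHA-256 over a canonical JSON form, so key order and extra spaces don't matter."""
    canonical = json.dumps(
        {"action": " ".join(action.split()), "context": context},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class Deduplicator:
    """Hypothetical in-memory store of fingerprints already seen."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def check(self, action: str, context: dict) -> tuple[str, str]:
        fp = action_fingerprint(action, context)
        if fp in self._seen:
            return "FAIL", f"duplicate of earlier action {fp[:12]}"
        self._seen.add(fp)
        return "PASS", ""
```

A production version would likely persist fingerprints and scope them to a time window or idempotency key, since some actions (a monthly fee payment, for example) are legitimately repeated.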

Layer 6: Provenance Verification

Where did this instruction come from?
Is this action coming from a trusted source? Was it initiated by a verified principal or injected by an untrusted input? Layer 6 traces the instruction back to its origin.
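
One possible shape for this check, assuming every instruction is recorded with the channel it arrived through. The `Instruction` type, the origin labels, and the choice to return WARN rather than FAIL for mixed chains are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    text: str
    origin: str  # e.g. "operator", "user", "web_content", "tool_output"


# Assumption: only these channels count as verified principals.
TRUSTED_ORIGINS = {"operator", "user"}


def check_provenance(chain: list[Instruction]) -> tuple[str, str]:
    """`chain` is the instruction lineage, oldest first."""
    if not chain:
        return "FAIL", "no provenance recorded"
    if chain[0].origin not in TRUSTED_ORIGINS:
        return "FAIL", f"root instruction came from untrusted origin '{chain[0].origin}'"
    untrusted = sorted({i.origin for i in chain if i.origin not in TRUSTED_ORIGINS})
    if untrusted:
        return "WARN", "instruction passed through untrusted input: " + ", ".join(untrusted)
    return "PASS", ""
```

A stricter policy could fail any chain that touches untrusted input; this sketch flags it for review instead, on the assumption that a trusted root may legitimately ask the agent to read external content.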

Layer 7: Constitutional Alignment

Does this violate fundamental principles?
The highest‑level check. The FOUNDATION layer of the world model contains constitutional principles — things the agent must never do. Layer 7 compares the action against these principles.
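
Principles can be modeled as named predicates over a structured action. The sketch below is one such arrangement; the two principles, the action fields (`to`, `amount`, `spent_today`, `daily_limit`, `blocklist`), and the severity labels are hypothetical and are not ODEI's actual constitution.

```python
from typing import Callable, NamedTuple


class Principle(NamedTuple):
    name: str
    severity: str                      # "FAIL" (hard stop) or "WARN" (needs review)
    violated: Callable[[dict], bool]   # returns True when the action breaks the rule


PRINCIPLES = [
    Principle(
        "never send funds to a blocklisted wallet",
        "FAIL",
        lambda a: a.get("to") in a.get("blocklist", ()),
    ),
    Principle(
        "stay within the daily transfer limit",
        "WARN",
        lambda a: a.get("amount", 0) + a.get("spent_today", 0)
        > a.get("daily_limit", float("inf")),
    ),
]


def check_constitutional(action: dict, principles=PRINCIPLES) -> tuple[str, str]:
    """Return the most severe result among violated principles."""
    hits = [p for p in principles if p.violated(action)]
    if not hits:
        return "PASS", ""
    worst = "FAIL" if any(p.severity == "FAIL" for p in hits) else "WARN"
    return worst, "; ".join(p.name for p in hits)
```

Keeping principles as data rather than scattered `if` statements makes it easier to audit which rule produced a given verdict.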

Using the Guardrail API

curl -X POST https://api.odei.ai/api/v2/guardrail/check \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "action": "transfer 500 USDC to 0x8185ecd4170bE82c3eDC3504b05B3a8C88AFd129",
    "context": {
      "requester": "trading_agent_v2",
      "reason": "performance fee payment"
    },
    "severity": "high"
  }'

Response

{
  "verdict": "ESCALATE",
  "score": 45,
  "layers": [
    {"layer": "immutability", "result": "PASS"},
    {"layer": "temporal", "result": "PASS"},
    {"layer": "referential_integrity", "result": "PASS"},
    {"layer": "authority", "result": "PASS"},
    {"layer": "deduplication", "result": "PASS"},
    {"layer": "provenance", "result": "WARN", "note": "Wallet not in trusted list"},
    {"layer": "constitutional", "result": "WARN", "note": "Transfer exceeds daily limit"}
  ],
  "reasoning": "Transfer to unverified wallet exceeds daily limit. Escalate to human operator.",
  "timestamp": "2026-02-23T00:12:34Z"
}
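
On the calling side, an agent might branch on the verdict before executing anything. The sketch below relies only on the field names visible in the sample response above; the return values `"execute"`, `"ask_human"`, and `"block"` are hypothetical, and treating any unrecognized verdict as a block is an assumption made for safety.

```python
def next_step(response: dict) -> str:
    """Map a guardrail verdict to what the calling agent should do next."""
    verdict = response.get("verdict")
    if verdict == "APPROVED":
        return "execute"
    if verdict == "ESCALATE":
        return "ask_human"
    # REJECTED, missing, or unrecognized verdicts all fail closed.
    return "block"


def flagged_layers(response: dict) -> list[tuple[str, str]]:
    """Collect (layer, note) for every layer that did not return PASS."""
    return [
        (layer["layer"], layer.get("note", ""))
        for layer in response.get("layers", [])
        if layer.get("result") != "PASS"
    ]
```

For the sample response, `next_step` returns `"ask_human"` and `flagged_layers` lists the provenance and constitutional layers with their notes, which is the information a human reviewer would need.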

Via MCP (Claude Desktop)

{
  "mcpServers": {
    "odei": {
      "command": "npx",
      "args": ["@odei/mcp-server"]
    }
  }
}

In Claude:

Check if I should approve: transfer 500 USDC to 0x...

Claude automatically calls odei_guardrail_check and returns the verdict with full reasoning.

Real Results

The system has run in production since January 2026. Verdicts break down as follows:

  • APPROVED (65%): Routine operations that pass all 7 layers
  • REJECTED (15%): Actions that clearly violate rules (duplicates, unauthorized)
  • ESCALATE (20%): Actions that need human review (unknown wallets, threshold violations)

The ESCALATE category delivers the most value: it catches edge cases that a simple rule‑based system would miss and that still require human judgment.

Implementing Your Own

You don’t need to use ODEI’s service to adopt this pattern. The architecture is:

  1. Define your layers (you may use 3, 7, or any number).
  2. For each layer, write a check function that returns PASS, WARN, or FAIL with reasoning.
  3. Aggregate the results into a final verdict.
  4. Log everything — the audit trail is as important as the verdict.
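
The four steps above can be sketched in a few lines. The aggregation rule used here (any FAIL rejects, any WARN escalates, otherwise approve) is an assumption chosen for simplicity; the source does not specify how ODEI combines layer results or computes its score.

```python
import logging
from typing import Callable

# A check takes the action and returns (result, note), with result in PASS/WARN/FAIL.
Check = Callable[[dict], tuple[str, str]]

logger = logging.getLogger("guardrail")


def run_guardrails(action: dict, layers: list[tuple[str, Check]]) -> dict:
    """Run every layer in order, log each result, and aggregate a verdict."""
    results = []
    for name, check in layers:
        result, note = check(action)
        logger.info("layer=%s result=%s note=%s", name, result, note)
        results.append({"layer": name, "result": result, "note": note})

    outcomes = {r["result"] for r in results}
    if "FAIL" in outcomes:
        verdict = "REJECTED"
    elif "WARN" in outcomes:
        verdict = "ESCALATE"
    else:
        verdict = "APPROVED"
    return {"verdict": verdict, "layers": results}
```

This version runs all layers even after a failure so the audit log shows every problem at once; stopping at the first FAIL is a reasonable alternative when checks are expensive.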

The hard part is building and maintaining the world model that the checks query against. That’s why ODEI offers it as a service, maintaining 91 nodes and 91 relationship types.

ODEI’s guardrail API is available at https://api.odei.ai, with a free tier. It is also deployed as Virtuals ACP Agent #3082 for agent‑to‑agent calls.
