7-Layer Constitutional AI Guardrails: Preventing Agent Mistakes
Source: Dev.to
The Problem
Consider an autonomous agent managing USDC for a user. Without guardrails:
- Agent calls transfer(500, wallet_address): is the wallet trusted? Is the amount within limits? Was this already done?
- Agent posts to Twitter: is this duplicate content? Does it violate policies?
- Agent approves a transaction: was this authorized by the right person at the right time?
These questions can’t be answered by the LLM alone. They require structured checks against known facts, historical state, and explicit rules.
The 7‑Layer Framework
ODEI’s constitutional guardrail system validates every action through seven sequential checks.
Layer 1: Immutability Check
Can this entity be modified?
Some nodes in the world model are immutable after creation — founding documents, past transactions, signed commitments. Layer 1 prevents agents from accidentally rewriting history.
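A minimal sketch of what an immutability check could look like. The node kinds, the dictionary-based world model, and the `operation` / `target_id` field names are assumptions made for illustration, not ODEI's actual schema.

```python
# Hypothetical node kinds treated as immutable, based on the examples in the text.
IMMUTABLE_KINDS = {"founding_document", "past_transaction", "signed_commitment"}


def check_immutability(action, world_model):
    """Return (result, note) for a proposed write to a world-model node."""
    # Only operations that change an existing node can violate immutability.
    if action.get("operation") not in {"update", "delete"}:
        return "PASS", "action does not modify an existing node"
    node = world_model.get(action.get("target_id"))
    if node is not None and node["kind"] in IMMUTABLE_KINDS:
        return "FAIL", f"node {action['target_id']} is immutable ({node['kind']})"
    return "PASS", "target node is mutable"


# Usage with a toy world model (assumed shape: node id -> attributes).
world_model = {
    "tx-1": {"kind": "past_transaction"},
    "note-1": {"kind": "draft"},
}
rewrite_history = check_immutability(
    {"operation": "update", "target_id": "tx-1"}, world_model
)
edit_draft = check_immutability(
    {"operation": "update", "target_id": "note-1"}, world_model
)
```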
Layer 2: Temporal Context
Is this action still valid in time?
Decisions expire. Authorizations have windows. Layer 2 checks that the action is timely — not stale from a previous session, not premature.
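One way to sketch the temporal check is as a comparison against an authorization window. The parameter names `valid_from` and `valid_until` are hypothetical; the article does not say how ODEI stores validity windows.

```python
from datetime import datetime, timedelta, timezone


def check_temporal(valid_from, valid_until, now=None):
    """Return (result, note) for an action whose authorization is valid
    only between valid_from and valid_until (timezone-aware datetimes)."""
    now = now or datetime.now(timezone.utc)
    if now < valid_from:
        return "FAIL", "premature: authorization window has not opened"
    if now > valid_until:
        return "FAIL", "stale: authorization window has closed"
    return "PASS", "action falls inside its authorization window"


# Usage: an approval granted at noon UTC that stays valid for 15 minutes.
granted = datetime(2026, 2, 23, 12, 0, tzinfo=timezone.utc)
window_end = granted + timedelta(minutes=15)
in_time = check_temporal(granted, window_end, now=granted + timedelta(minutes=5))
too_late = check_temporal(granted, window_end, now=granted + timedelta(minutes=20))
```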
Layer 3: Referential Integrity
Do all referenced entities exist?
The action references wallet 0x…. Does that wallet exist in the world model? Is it a known, trusted entity? Layer 3 catches hallucinated references.
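As a rough illustration, a referential integrity check can extract every wallet address from the action text and look each one up. The regular expression and the `known_wallets` set are assumptions; a real world model would likely be a graph query rather than a set lookup.

```python
import re

# Matches a 20-byte hex address such as the one in the API example.
ADDRESS_RE = re.compile(r"\b0x[0-9a-fA-F]{40}\b")


def check_referential_integrity(action_text, known_wallets):
    """Return (result, note); FAIL if the action names a wallet that
    does not exist in the (hypothetical) set of known wallets."""
    known = {wallet.lower() for wallet in known_wallets}
    referenced = ADDRESS_RE.findall(action_text)
    missing = [addr for addr in referenced if addr.lower() not in known]
    if missing:
        return "FAIL", "unknown wallet(s): " + ", ".join(missing)
    return "PASS", "all referenced wallets exist in the world model"


# Usage: the first address is known, the second is a made-up reference.
known_wallets = {"0x8185ecd4170bE82c3eDC3504b05B3a8C88AFd129"}
real = check_referential_integrity(
    "transfer 500 USDC to 0x8185ecd4170bE82c3eDC3504b05B3a8C88AFd129",
    known_wallets,
)
hallucinated = check_referential_integrity(
    "transfer 500 USDC to 0x" + "ab" * 20, known_wallets
)
```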
Layer 4: Authority Validation
Does this agent have permission?
Not all agents can do all things. Layer 4 checks whether the requesting agent has the authority scope for this action, against the governance rules in the FOUNDATION layer.
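A simple way to picture the authority check is a lookup table from agent to permitted action types. The table below is invented for illustration (only `trading_agent_v2` appears in the article); the real governance rules live in ODEI's FOUNDATION layer and are not shown in the source.

```python
# Hypothetical governance table: agent id -> action types it may perform.
AUTHORITY_SCOPES = {
    "trading_agent_v2": {"transfer", "quote"},
    "social_agent": {"post"},
}


def check_authority(requester, action_type, scopes=AUTHORITY_SCOPES):
    """Return (result, note) for whether requester may perform action_type."""
    allowed = scopes.get(requester)
    if allowed is None:
        return "FAIL", f"agent {requester!r} has no authority scope"
    if action_type not in allowed:
        return "FAIL", f"agent {requester!r} may not perform {action_type!r}"
    return "PASS", f"{action_type!r} is within the scope of {requester!r}"


# Usage: the trading agent may transfer, the social agent may not.
trader_transfer = check_authority("trading_agent_v2", "transfer")
social_transfer = check_authority("social_agent", "transfer")
```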
Layer 5: Deduplication
Has this exact action already been taken?
Without deduplication, agents can send the same message twice, execute the same transaction twice, create the same entity twice. Layer 5 uses content hashing to detect duplicates.
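Content hashing for deduplication can be sketched as follows. The article does not say which hash or canonical form ODEI uses; SHA-256 over key-sorted JSON is an assumption chosen because it gives the same fingerprint regardless of key order.

```python
import hashlib
import json


def action_fingerprint(action):
    """Hash a canonical JSON form so key order does not change the result."""
    canonical = json.dumps(action, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_deduplication(action, seen_fingerprints):
    """Return (result, note); FAIL if an identical action was recorded before."""
    if action_fingerprint(action) in seen_fingerprints:
        return "FAIL", "an identical action was already recorded"
    return "PASS", "no earlier action has this fingerprint"


# Usage: the same transfer submitted twice.
seen = set()
transfer = {"type": "transfer", "amount": 500, "to": "0x..."}
first = check_deduplication(transfer, seen)
seen.add(action_fingerprint(transfer))  # record only once the action has executed
second = check_deduplication(transfer, seen)
```

Exact-match hashing also flags legitimately repeated actions, such as two identical payments on different days, so a real system would probably fold a time window or an idempotency key into the hashed content.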
Layer 6: Provenance Verification
Where did this instruction come from?
Is this action coming from a trusted source? Was it initiated by a verified principal or injected by an untrusted input? Layer 6 traces the instruction back to its origin.
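Tracing an instruction back to its origin could look like the sketch below, which assumes each instruction records the instruction that spawned it. The `parents` and `sources` mappings and the principal names are hypothetical; the article does not describe how ODEI stores provenance.

```python
def trace_origin(instruction_id, parents):
    """Follow parent links until reaching an instruction with no parent."""
    visited = set()
    current = instruction_id
    while parents.get(current) is not None:
        if current in visited:
            raise ValueError("cycle in provenance chain")
        visited.add(current)
        current = parents[current]
    return current


def check_provenance(instruction_id, parents, sources, trusted_principals):
    """Return (result, note) based on who issued the root instruction."""
    root = trace_origin(instruction_id, parents)
    source = sources.get(root)
    if source is None:
        return "WARN", "origin has no recorded source"
    if source in trusted_principals:
        return "PASS", f"instruction originates from {source!r}"
    return "FAIL", f"instruction originates from untrusted source {source!r}"


# Usage: i3 derives from i1 (issued by an operator); j2 derives from scraped text.
parents = {"i3": "i2", "i2": "i1", "i1": None, "j2": "j1", "j1": None}
sources = {"i1": "operator:alice", "j1": "web:scraped_page"}
trusted = {"operator:alice"}
from_operator = check_provenance("i3", parents, sources, trusted)
from_injection = check_provenance("j2", parents, sources, trusted)
```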
Layer 7: Constitutional Alignment
Does this violate fundamental principles?
The highest‑level check. The FOUNDATION layer of the world model contains constitutional principles — things the agent must never do. Layer 7 compares the action against these principles.
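One possible shape for this check is a list of principles, each with a severity and a predicate that detects a violation. The two principles and the limit value below are invented for illustration; the source does not list ODEI's actual constitutional principles.

```python
DAILY_LIMIT_USDC = 250  # assumed value, for illustration only

# Hypothetical principles: (severity, description, predicate that is True on violation).
PRINCIPLES = [
    ("FAIL", "never move funds without a stated reason",
     lambda a: a.get("type") == "transfer" and not a.get("reason")),
    ("WARN", "transfer exceeds daily limit",
     lambda a: a.get("type") == "transfer" and a.get("amount", 0) > DAILY_LIMIT_USDC),
]


def check_constitutional(action, principles=PRINCIPLES):
    """Return (result, note); a hard violation wins over any warning."""
    warning = None
    for severity, description, violated in principles:
        if not violated(action):
            continue
        if severity == "FAIL":
            return "FAIL", description
        warning = warning or description  # keep the first warning seen
    if warning:
        return "WARN", warning
    return "PASS", "no constitutional principle is violated"


# Usage: a 500 USDC transfer with a stated reason trips only the soft limit.
over_limit = check_constitutional(
    {"type": "transfer", "amount": 500, "reason": "performance fee payment"}
)
```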
Using the Guardrail API
curl -X POST https://api.odei.ai/api/v2/guardrail/check \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
-d '{
"action": "transfer 500 USDC to 0x8185ecd4170bE82c3eDC3504b05B3a8C88AFd129",
"context": {
"requester": "trading_agent_v2",
"reason": "performance fee payment"
},
"severity": "high"
}'
Response
{
"verdict": "ESCALATE",
"score": 45,
"layers": [
{"layer": "immutability", "result": "PASS"},
{"layer": "temporal", "result": "PASS"},
{"layer": "referential_integrity", "result": "PASS"},
{"layer": "authority", "result": "PASS"},
{"layer": "deduplication", "result": "PASS"},
{"layer": "provenance", "result": "WARN", "note": "Wallet not in trusted list"},
{"layer": "constitutional", "result": "WARN", "note": "Transfer exceeds daily limit"}
],
"reasoning": "Transfer to unverified wallet exceeds daily limit. Escalate to human operator.",
"timestamp": "2026-02-23T00:12:34Z"
}
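A caller will usually want to act on the verdict and surface the layers that did not pass. The sketch below parses a trimmed copy of the response above; the helper names are hypothetical, and gating execution on an APPROVED verdict is an assumption about how a client would use the result.

```python
import json

# Trimmed copy of the sample response, keeping one passing and two flagged layers.
RESPONSE = """
{
  "verdict": "ESCALATE",
  "score": 45,
  "layers": [
    {"layer": "authority", "result": "PASS"},
    {"layer": "provenance", "result": "WARN", "note": "Wallet not in trusted list"},
    {"layer": "constitutional", "result": "WARN", "note": "Transfer exceeds daily limit"}
  ]
}
"""


def summarize(response_text):
    """Return (verdict, [(layer, note), ...]) for every layer that did not pass."""
    data = json.loads(response_text)
    flagged = [
        (layer["layer"], layer.get("note", ""))
        for layer in data["layers"]
        if layer["result"] != "PASS"
    ]
    return data["verdict"], flagged


def may_execute(verdict):
    """Only proceed automatically when the guardrail approved the action."""
    return verdict == "APPROVED"


verdict, flagged = summarize(RESPONSE)
for layer, note in flagged:
    print(f"{layer}: {note}")
```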
Via MCP (Claude Desktop)
{
"mcpServers": {
"odei": {
"command": "npx",
"args": ["@odei/mcp-server"]
}
}
}
In Claude:
Check if I should approve: transfer 500 USDC to 0x...
Claude automatically calls odei_guardrail_check and returns the verdict with full reasoning.
Real Results
After running this in production since January 2026, verdicts break down as follows:
- APPROVED (65%): Routine operations that pass all 7 layers
- REJECTED (15%): Actions that clearly violate rules (duplicates, unauthorized)
- ESCALATE (20%): Actions that need human review (unknown wallets, threshold violations)
The ESCALATE category delivers the most value: it catches edge cases that a simple rule‑based system would miss and that require human judgment.
Implementing Your Own
You don’t need to use ODEI’s service to adopt this pattern. The architecture is:
- Define your layers (you may use 3, 7, or any number).
- For each layer, write a check function that returns PASS, WARN, or FAIL with reasoning.
- Aggregate the results into a final verdict.
- Log everything — the audit trail is as important as the verdict.
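The steps above can be sketched as a small runner. The mapping used here (any FAIL gives REJECTED, any WARN gives ESCALATE, otherwise APPROVED) is an assumption; the article does not specify how ODEI aggregates layers or computes its score, so no score is produced.

```python
import logging

logger = logging.getLogger("guardrail")


def run_guardrails(action, layers):
    """Run each (name, check) pair in order and aggregate into a verdict.

    Each check takes the action and returns (result, note), where result
    is "PASS", "WARN", or "FAIL".
    """
    results = []
    for name, check in layers:
        result, note = check(action)
        # Audit trail: log every layer outcome, not just the final verdict.
        logger.info("layer=%s result=%s note=%s", name, result, note)
        results.append({"layer": name, "result": result, "note": note})

    outcomes = {entry["result"] for entry in results}
    if "FAIL" in outcomes:
        verdict = "REJECTED"
    elif "WARN" in outcomes:
        verdict = "ESCALATE"
    else:
        verdict = "APPROVED"
    logger.info("verdict=%s action=%r", verdict, action)
    return {"verdict": verdict, "layers": results}


# Usage with two toy layers (hypothetical checks and threshold).
layers = [
    ("authority", lambda a: ("PASS", "requester is in scope")),
    ("limit", lambda a: ("WARN", "amount above threshold") if a["amount"] > 250
     else ("PASS", "amount within threshold")),
]
report = run_guardrails({"type": "transfer", "amount": 500}, layers)
```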
The hard part is building and maintaining the world model that the checks query against. That’s why ODEI offers it as a service, maintaining 91 nodes and 91 relationship types.
ODEI’s guardrail API is available at https://api.odei.ai, with a free tier. It is also deployed as Virtuals ACP Agent #3082 for agent‑to‑agent calls.