Why I Spent 6 Months Building Guardrails for AI Agents
Here’s why I built it, how it works, and what I learned
The Problem: AI Agents Are Powerful but Terrifying
I’ve been obsessed with AI agents – not chatbots, but agents that actually do things. Agents that can:
- Merge pull requests
- Deploy to Kubernetes
- Update database records
- Send Slack messages on your behalf
The technology is ready. But every time I tried to deploy one to production, the same thing happened:
Security said no.
And honestly? They were right.
Think about it: you’re giving an AI the ability to write to production systems, and there’s no audit trail, no approval workflow, no way to enforce policies. It’s like giving an intern root access and hoping for the best.
I kept seeing teams stuck in what I call “PoC Purgatory” – amazing demos that never ship because there’s no governance story.
The Solution: Policy‑Before‑Dispatch
What if every AI action had to pass through a policy check before it executed?
That’s the core idea behind Cordum.
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│  AI Agent   │ --> │ Safety Kernel│ --> │   Action    │
└─────────────┘     └──────────────┘     └─────────────┘
                           │
                    ┌──────┴──────┐
                    │   Policy    │
                    │  (as code)  │
                    └─────────────┘
Before any job executes, the Safety Kernel evaluates your policy and returns one of:
- ✅ Allow – proceed normally
- ❌ Deny – block with reason
- 👤 Require Approval – human in the loop
- ⏳ Throttle – rate limit
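To get a concrete feel for how a dispatcher might act on those four verdicts, here is a minimal Go sketch. The Decision type and the dispatch, execute, and waitForHuman names are mine, not Cordum's API; only the four decision values come from the list above.

// Minimal sketch of gating a job on the Safety Kernel's decision.
// Hypothetical names; only the four decision values come from the article.
package main

import (
	"errors"
	"fmt"
	"time"
)

type Decision string

const (
	Allow           Decision = "allow"
	Deny            Decision = "deny"
	RequireApproval Decision = "require_approval"
	Throttle        Decision = "throttle"
)

// dispatch runs the job only if the policy decision permits it.
func dispatch(jobID string, d Decision, reason string) error {
	switch d {
	case Allow:
		return execute(jobID) // proceed normally
	case Deny:
		return fmt.Errorf("job %s blocked: %s", jobID, reason)
	case RequireApproval:
		return waitForHuman(jobID) // human in the loop
	case Throttle:
		time.Sleep(time.Second) // back off; a real client would re-submit for evaluation
		return execute(jobID)
	default:
		return errors.New("unknown decision")
	}
}

func execute(jobID string) error      { fmt.Println("executing", jobID); return nil }
func waitForHuman(jobID string) error { fmt.Println("awaiting approval for", jobID); return nil }

func main() {
	_ = dispatch("job_abc123", RequireApproval, "Production writes need human approval")
}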
Show Me the Code
Here’s what a policy looks like:
# policy.yaml
rules:
  - id: require-approval-for-prod
    match:
      risk_tags: [prod, write]
    decision: require_approval
    reason: "Production writes need human approval"
  - id: block-destructive
    match:
      capabilities: [delete, drop, destroy]
    decision: deny
    reason: "Destructive operations not allowed"
  - id: allow-read-only
    match:
      risk_tags: [read]
    decision: allow
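Under the hood, evaluating a policy like this can be as simple as a first-match scan over the job's risk tags and capabilities. The sketch below is my guess at the semantics, not Cordum's implementation; in particular, treating any tag overlap as a match is an assumption.

// Rough sketch of first-match rule evaluation. Rule, Job, and Evaluate are
// illustrative names; the matching semantics are inferred from the policy above.
package main

import "fmt"

type Rule struct {
	ID           string
	RiskTags     []string
	Capabilities []string
	Decision     string
	Reason       string
}

type Job struct {
	ID           string
	RiskTags     []string
	Capabilities []string
}

// Evaluate returns the first rule whose tags or capabilities overlap the job's;
// unmatched jobs fall through to deny (a default-deny posture is assumed here).
func Evaluate(rules []Rule, job Job) (decision, reason, ruleID string) {
	for _, r := range rules {
		if overlaps(r.RiskTags, job.RiskTags) || overlaps(r.Capabilities, job.Capabilities) {
			return r.Decision, r.Reason, r.ID
		}
	}
	return "deny", "no matching rule", ""
}

func overlaps(want, have []string) bool {
	for _, w := range want {
		for _, h := range have {
			if w == h {
				return true
			}
		}
	}
	return false
}

func main() {
	rules := []Rule{
		{ID: "require-approval-for-prod", RiskTags: []string{"prod", "write"}, Decision: "require_approval", Reason: "Production writes need human approval"},
		{ID: "block-destructive", Capabilities: []string{"delete", "drop", "destroy"}, Decision: "deny", Reason: "Destructive operations not allowed"},
		{ID: "allow-read-only", RiskTags: []string{"read"}, Decision: "allow"},
	}
	fmt.Println(Evaluate(rules, Job{ID: "job_abc123", RiskTags: []string{"prod", "write"}}))
}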
When an agent tries to do something dangerous, Cordum intervenes:
{
  "job_id": "job_abc123",
  "decision": "require_approval",
  "reason": "Production writes need human approval",
  "matched_rule": "require-approval-for-prod"
}
The job waits until a human approves it in the dashboard. Full audit trail. Compliance happy.
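From the agent's side, "waits" can be as simple as polling the job until a human has acted. Everything in this sketch (the /v1/jobs/{id} endpoint, the response shape, the five-second interval) is invented for illustration; the real dashboard and API may look quite different.

// Hypothetical poll loop for a job stuck in require_approval.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

type jobStatus struct {
	JobID    string `json:"job_id"`
	Decision string `json:"decision"`
}

// waitForApproval polls until the job is allowed or denied.
func waitForApproval(baseURL, jobID string) (string, error) {
	for {
		resp, err := http.Get(fmt.Sprintf("%s/v1/jobs/%s", baseURL, jobID))
		if err != nil {
			return "", err
		}
		var st jobStatus
		err = json.NewDecoder(resp.Body).Decode(&st)
		resp.Body.Close()
		if err != nil {
			return "", err
		}
		if st.Decision == "allow" || st.Decision == "deny" {
			return st.Decision, nil
		}
		time.Sleep(5 * time.Second) // still pending human approval
	}
}

func main() {
	decision, err := waitForApproval("http://localhost:8080", "job_abc123")
	fmt.Println(decision, err)
}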
Architecture
Cordum is a control plane, not an agent framework. It orchestrates and governs agents – it doesn’t replace LangChain or CrewAI.
┌─────────────────────────────────────────────────────────┐
│                  Cordum Control Plane                    │
├─────────────────────────────────────────────────────────┤
│  ┌───────────┐  ┌──────────────┐  ┌─────────────────┐   │
│  │ Scheduler │  │ Safety Kernel│  │ Workflow Engine │   │
│  └───────────┘  └──────────────┘  └─────────────────┘   │
├─────────────────────────────────────────────────────────┤
│  ┌───────────────┐  ┌───────────────────────────────┐   │
│  │   NATS Bus    │  │         Redis (State)         │   │
│  └───────────────┘  └───────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘
         │                  │                  │
    ┌────┴────┐        ┌────┴─────┐        ┌───┴────┐
    │ Worker  │        │  Worker  │        │ Worker │
    │ (Slack) │        │ (GitHub) │        │  (K8s) │
    └─────────┘        └──────────┘        └────────┘
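For a sense of what one of those workers looks like, here is a minimal Go sketch of a worker consuming jobs over NATS JetStream. The subject name ("jobs.slack"), durable name, and payload handling are assumptions for illustration; Cordum's actual workers speak CAP and differ in the details.

// Minimal worker sketch: durable JetStream consumer with manual acks,
// which is what gives the at-least-once delivery mentioned below.
package main

import (
	"log"

	"github.com/nats-io/nats.go"
)

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// Ack only after the job has been handled, so a crash mid-job
	// means the message is redelivered rather than lost.
	_, err = js.Subscribe("jobs.slack", func(m *nats.Msg) {
		log.Printf("handling job: %s", string(m.Data))
		// ... call the Slack API here ...
		if err := m.Ack(); err != nil {
			log.Printf("ack failed: %v", err)
		}
	}, nats.Durable("slack-worker"), nats.ManualAck())
	if err != nil {
		log.Fatal(err)
	}

	select {} // keep the worker running
}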
Tech stack
- Go – Core control plane (~15K lines)
- NATS JetStream – Message bus with at‑least‑once delivery
- Redis – State store for jobs, workflows, context
- React – Dashboard with real‑time updates
Performance
- (Performance details omitted in the excerpt)
What I Learned Building This
1. Safety as a feature, not a constraint
I initially thought of governance as a “necessary evil” – something enterprises need for compliance. But I’ve come to see it as a feature.
2. “Permission to write” is a competitive advantage
When every AI action is evaluated against policy and logged, you unlock use cases that were previously impossible.
- Banks can use AI agents.
- Healthcare providers can use AI agents.
The ability to “write” becomes a powerful differentiator.
3. The protocol matters more than I expected
I spent a lot of time on CAP, and it paid off. Having a clean protocol means:
- Workers can be written in any language.
- The control plane can evolve independently.
- Third parties can build compatible tools.
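To make that concrete, here is an illustrative guess at the kind of job envelope such a protocol might carry, pieced together from the fields shown earlier in this post (job ID, capabilities, risk tags, decision, matched rule). The real CAP schema lives in the protocol spec and almost certainly differs.

// Hypothetical job envelope, shown as a Go struct with JSON tags.
package main

import (
	"encoding/json"
	"fmt"
)

type JobEnvelope struct {
	JobID       string            `json:"job_id"`
	Capability  string            `json:"capability"`   // e.g. "github.merge_pr"
	RiskTags    []string          `json:"risk_tags"`    // e.g. ["prod", "write"]
	Payload     map[string]string `json:"payload"`      // capability-specific arguments
	Decision    string            `json:"decision"`     // set by the Safety Kernel
	MatchedRule string            `json:"matched_rule"` // audit trail
}

func main() {
	env := JobEnvelope{
		JobID:       "job_abc123",
		Capability:  "github.merge_pr",
		RiskTags:    []string{"prod", "write"},
		Payload:     map[string]string{"repo": "acme/api", "pr": "42"},
		Decision:    "require_approval",
		MatchedRule: "require-approval-for-prod",
	}
	out, _ := json.MarshalIndent(env, "", "  ")
	fmt.Println(string(out))
}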
4. Open source is a distribution strategy
I could have built this as a closed SaaS from day one, but open source gives us:
- Trust – you can read the code.
- Self‑hosting – enterprises love this.
- Community funnel – users contribute and spread the word.
Business model: open‑core. Self‑hosted is free forever; cloud/enterprise features are paid.
What’s Next
The roadmap includes:
- Helm chart for Kubernetes deployment.
- Cordum Cloud – managed version.
- Visual workflow editor in the dashboard.
- More packs – AWS, GCP, PagerDuty, etc.
Try It Out
🌐 Website:
📦 GitHub:
📋 Protocol:
📚 Docs:
If you’re building AI agents and want governance built in, give it a try. Star the repo if you find it useful ⭐
Thanks for reading! I’m happy to answer questions in the comments.