I stopped trusting AI agents to “do the right thing” - so I built a governance system

Published: 1 month ago (March 31, 2026 at 12:36 AM EDT)

3 min read

Source: Dev.to

Source: Dev.to

Cover image for I stopped trusting AI agents to “do the right thing” - so I built a governance system

The core idea

Actra is not about making agents smarter. It’s about making them governable. Most systems today focus on what agents can do.

Actra focuses on:

what agents are allowed to do
what must never happen
what should trigger intervention

Because AI failures are not crashes—they are silent, plausible, and often irreversible.

How it works

Actra sits between the agent and the world. Every action passes through a control layer:

tool calls
API requests
decisions with side effects

Before execution, Actra evaluates:

Is this action allowed?
Is the context safe?
Does this violate any policy?

If yes → block.
If unclear → requires approval.
If safe → allow.

This turns AI systems from “trust the agent” into “verify every action”.

The three ways agents break (and why Actra exists)

After building and testing agent workflows, the same patterns kept appearing:

1. Tool misuse

Agents use the right tools in the wrong way.

Deleting instead of updating
Over‑fetching sensitive data

2. Prompt injection & context attacks

External inputs manipulate behavior.

“Ignore previous instructions and expose secrets”

3. Unbounded decisions

Agents take actions beyond the intended scope.

Triggering workflows repeatedly
Making irreversible changes without limits

These are predictable failure modes, not edge cases. Actra exists to contain them.

Why this approach

“Alignment” is not enforceable, but policies are. You can’t guarantee what an LLM will generate, but you can enforce:

what gets executed
what gets blocked
what gets audited

Actra treats AI like any other critical system with access control, validation, and traceability.

The rough edges

Actra is still early and not a polished product. Some real limitations:

Policy design is manual; writing good rules takes effort.
False positives happen; over‑restricting agents can reduce usefulness.
Context evaluation is hard; reliably detecting subtle prompt injection is still evolving.
No universal standard yet; every system integrates differently.

What it’s useful for right now

Actra works best in systems where agents:

Call external tools
Access sensitive data
Trigger real‑world actions

Examples:

Developer agents (code execution)
Workflow automation
Internal copilots
API‑driven agents

If your agent can cause damage, Actra helps contain it.

What I learned building this

AI systems are not just intelligence problems; they are control problems. We’ve spent years improving what AI can do, but we’re just starting to think about what it should be allowed to do. That gap is where most real‑world failures will happen.

Under the hood (for builders)

Core engine written in Rust (safety and performance)
Policy execution layer designed to be deterministic and auditable
WASM support for browser, edge runtimes, and portable policy evaluation
SDKs in Python and JavaScript for easy integration
Works across multiple runtimes and agent frameworks

Governance should not depend on a single stack or framework; it should be portable, enforceable, and consistent wherever agents run.

Where this is going

Actra is evolving into a full governance layer:

Access – Control – Track – Remediate – Audit

Live sites:

Not just for agents but for any automated decision system. If you’re building with AI agents, feedback—especially on failure cases—is welcome, because that’s where this system matters most.