Debugging Agents is Tough: How I Built a 'Flight Recorder' for AI Kernel
Source: Dev.to

Stop guessing why your agent hallucinated. Query the database.
In my last post I introduced the Agent Control Plane—a “Kernel” I built to stop agents from hallucinating rm -rf / commands.
When a standard software app crashes you get a stack trace; when an AI agent fails you usually just get a shrug.
- Why did it call the refund tool with $0?
- Did the ABAC policy block it, or did the LLM just forget to call it?
- Was the context window full?
“Sorry, it was the LLM” is not an engineering root cause—it’s an excuse.
If we want to treat agents as “Digital Employees,” we need to treat their execution cycles as Audit Logs. I added a Flight Recorder to the Kernel. Here’s why we need one, and why print() statements aren’t enough.
The “Black Box” Problem
Standard LLM observability tools (LangSmith, Arize, etc.) are great for prompt engineering: they tell you about tokens, latency, and costs.
They don’t tell you about Governance.
What I needed to know:
| Piece | Question |
|---|---|
| Intent | What tool did the Agent try to use? |
| Verdict | Did my Kernel allow it? |
| Reasoning | If it was blocked, which policy rule triggered? |
Without this triad, debugging is just guessing.
The Implementation: SQLite Is All You Need
I didn’t want to spin up a complex observability stack. I believe in Scale by Subtraction.
The Flight Recorder is a lightweight, local SQLite engine hooked directly into the Kernel’s interceptor chain. It captures the decision logic atomically.
# src/agent_control_plane/flight_recorder.py
import sqlite3, json
from datetime import datetime

class FlightRecorder:
    def __init__(self, db_path="agent_blackbox.db"):
        self.conn = sqlite3.connect(db_path)
        self._init_schema()

    def _init_schema(self):
        self.conn.execute("""
            CREATE TABLE IF NOT EXISTS flight_events (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                timestamp TEXT,
                agent_id TEXT,
                tool_name TEXT,
                arguments TEXT,
                policy_verdict TEXT,
                violation_reason TEXT
            )
        """)
        self.conn.commit()

    def log_interception(self, agent_id, tool_name, args,
                         policy_verdict, violation_reason=None):
        """
        Records the 'Black Box' data of an interception event.
        """
        self.conn.execute("""
            INSERT INTO flight_events
            (timestamp, agent_id, tool_name, arguments, policy_verdict, violation_reason)
            VALUES (?, ?, ?, ?, ?, ?)
        """, (
            datetime.now().isoformat(),
            agent_id,
            tool_name,
            json.dumps(args),
            policy_verdict,  # 'ALLOWED' or 'BLOCKED'
            violation_reason
        ))
        self.conn.commit()
It’s boring technology—that’s the point. It’s robust, queryable, and sits right next to your code.
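To make the "hooked into the interceptor chain" part concrete, here is a minimal sketch of how a single interceptor might check a policy, record the verdict, and only then run the tool. The `BLOCKED_TOOLS` table, `make_recorder`, and `intercept` are hypothetical names for illustration, not the Kernel's actual API; only the `flight_events` schema comes from the recorder above.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical policy table: tool name -> reason it is blocked.
BLOCKED_TOOLS = {"delete_file": "destructive tools are disabled"}


def make_recorder(db_path=":memory:"):
    """Open a database with the same flight_events schema as the recorder."""
    conn = sqlite3.connect(db_path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS flight_events (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            timestamp TEXT, agent_id TEXT, tool_name TEXT,
            arguments TEXT, policy_verdict TEXT, violation_reason TEXT
        )
    """)
    return conn


def intercept(conn, agent_id, tool_name, args, tool_fn):
    """Check policy, record the verdict, then run the tool only if allowed."""
    reason = BLOCKED_TOOLS.get(tool_name)
    verdict = "BLOCKED" if reason else "ALLOWED"
    # 'with conn' commits on success and rolls back on an exception,
    # so each event is written as one transaction.
    with conn:
        conn.execute(
            "INSERT INTO flight_events "
            "(timestamp, agent_id, tool_name, arguments, policy_verdict, violation_reason) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), agent_id, tool_name,
             json.dumps(args), verdict, reason),
        )
    if reason:
        return None  # blocked calls never reach the tool
    return tool_fn(**args)
```

Writing the event before the tool runs means a crash inside the tool still leaves a record of what the agent attempted.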
Querying the Crash Site
The real power isn’t recording the data; it’s interrogating it. Because it’s just SQL, I can answer complex governance questions in milliseconds.
Example 1 – Find blocked high‑value refunds
SELECT timestamp, arguments, violation_reason
FROM flight_events
WHERE agent_id = 'finance_agent'
AND policy_verdict = 'BLOCKED'
AND tool_name = 'process_refund'
ORDER BY timestamp DESC;
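The query above returns every blocked refund for the finance agent; narrowing it to high-value ones means looking inside the JSON stored in `arguments`. A minimal sketch, assuming the refund tool receives its value under an `amount` key (that key name and the `blocked_refunds_over` helper are assumptions, not part of the Kernel):

```python
import json
import sqlite3


def blocked_refunds_over(conn, threshold, agent_id="finance_agent"):
    """Return (timestamp, amount, reason) for blocked refunds above threshold."""
    rows = conn.execute(
        """
        SELECT timestamp, arguments, violation_reason
        FROM flight_events
        WHERE agent_id = ?
          AND policy_verdict = 'BLOCKED'
          AND tool_name = 'process_refund'
        ORDER BY timestamp DESC
        """,
        (agent_id,),
    ).fetchall()

    results = []
    for timestamp, raw_args, reason in rows:
        args = json.loads(raw_args)
        # 'amount' is an assumed argument name; adjust to the real tool signature.
        amount = args.get("amount") if isinstance(args, dict) else None
        if isinstance(amount, (int, float)) and amount > threshold:
            results.append((timestamp, amount, reason))
    return results
```

Calling `blocked_refunds_over(recorder.conn, 1000)` would list the blocked refunds above 1000. On SQLite builds that include the JSON functions, the same filter can be pushed into SQL with `json_extract`.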
Example 2 – Detect a spike in policy violations after a prompt change
# kernel_v1_demo.py usage
stats = recorder.get_statistics()
total = stats['total_actions']
blocked = stats['by_verdict'].get('blocked', 0)  # key must match the casing the recorder stores
print(f"Total Actions: {total}")
print(f"Blocked Ratio: {blocked / total:.2%}" if total else "Blocked Ratio: n/a")
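The `get_statistics()` method isn't shown in the recorder snippet earlier, so here is a rough sketch of what such a helper could look like against the `flight_events` table. The function names, the `since` parameter, and the returned dictionary shape are assumptions modeled on the usage above, not the repo's actual code. Passing the time of a prompt change as `since` lets you compare the blocked ratio before and after it.

```python
import sqlite3


def get_statistics(conn, since=None):
    """Count events in total and per verdict, optionally from an ISO timestamp on."""
    # ISO-8601 strings in one consistent format sort chronologically,
    # so a plain string comparison works as a time filter.
    where, params = ("WHERE timestamp >= ?", (since,)) if since else ("", ())
    rows = conn.execute(
        f"SELECT policy_verdict, COUNT(*) FROM flight_events {where} "
        "GROUP BY policy_verdict",
        params,
    ).fetchall()
    by_verdict = {verdict: count for verdict, count in rows}
    return {"total_actions": sum(by_verdict.values()), "by_verdict": by_verdict}


def blocked_ratio(stats, blocked_key="BLOCKED"):
    """Share of blocked actions; 0.0 when nothing has been recorded yet."""
    total = stats["total_actions"]
    return stats["by_verdict"].get(blocked_key, 0) / total if total else 0.0
```

The dictionary keys are whatever strings were stored in `policy_verdict`, so `blocked_key` has to match the recorder's casing.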
This transforms “AI Debugging” from a vibe check into a data‑science problem.
Governance Without the Bloat
We often over‑complicate AI infrastructure, assuming we need vector DBs for memory and massive cloud logging for audit trails. The Flight Recorder shows that a local file and a rigid schema are frequently superior.
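- Near‑Zero Latency: Runs in‑process with the Kernel, so there is no network hop; the only overhead is a local disk write.
- Zero Infrastructure Cost: Powered by SQLite, which ships with Python's standard library.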
- Total Clarity: Replay the exact sequence of events that led to a failure.
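Replaying a failure can be as simple as reading one agent's events back in insertion order. A minimal sketch, assuming the `flight_events` schema from the recorder (the `replay` and `format_event` helpers are hypothetical, not part of the Kernel):

```python
import json
import sqlite3


def replay(conn, agent_id):
    """Yield an agent's recorded events in the order they were written."""
    cursor = conn.execute(
        "SELECT id, timestamp, tool_name, arguments, policy_verdict, violation_reason "
        "FROM flight_events WHERE agent_id = ? ORDER BY id",
        (agent_id,),
    )
    for event_id, timestamp, tool_name, raw_args, verdict, reason in cursor:
        yield {
            "id": event_id,
            "timestamp": timestamp,
            "tool_name": tool_name,
            "arguments": json.loads(raw_args),  # stored as a JSON string
            "policy_verdict": verdict,
            "violation_reason": reason,
        }


def format_event(event):
    """Render one event as a single human-readable line."""
    line = f"[{event['timestamp']}] {event['tool_name']}({event['arguments']}) -> {event['policy_verdict']}"
    if event["violation_reason"]:
        line += f" ({event['violation_reason']})"
    return line
```

Ordering by the autoincrement `id` rather than the timestamp keeps the sequence stable even when two events share the same clock reading.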
Try It Yourself
The Kernel v1.0 ships with the Flight Recorder enabled. Clone the repo, run the demo_flight_recorder() function, and watch it generate a database file. Then try to break the agent—force it to access a protected path (/etc/passwd) and watch the recorder catch it red‑handed.
🔗 GitHub Repo: imran‑siddique/agent‑control‑plane
Intelligence without governance is just a bug waiting to happen. The Flight Recorder is how you catch it.