How to Add a Kill Switch to Your AI Agent in 5 Minutes
Source: Dev.to
Introduction
Your AI agent is running in production, calling APIs, making decisions, and spending money. If it goes sideways—gets stuck in a loop, hallucinates tool calls, or burns through your API budget—your only option is to manually kill the process. This tutorial adds a real kill switch in five minutes without changing your agent’s core code.
The approach: a reverse proxy sits between your agent and the LLM provider. Every request flows through it, policies are defined in YAML, and when a policy triggers, the request is blocked before it ever reaches the model.
Prerequisites
- Docker and Docker Compose installed
- An OpenAI API key (or any OpenAI‑compatible provider)
- An AI agent that uses the OpenAI API format
Setup
```bash
git clone https://github.com/airblackbox/air-platform.git
cd air-platform
cp .env.example .env
```
Edit .env and add your API key:
```bash
OPENAI_API_KEY=sk-your-key-here
```
Start the platform:
```bash
make up
```
Six services start in about 8 seconds. The important one is the Gateway running on http://localhost:8080.
Point Your Agent at the Gateway
Python (OpenAI SDK)
```python
from openai import OpenAI

# Before — calls OpenAI directly
# client = OpenAI()

# After — calls through the AIR Blackbox Gateway
client = OpenAI(base_url="http://localhost:8080/v1")
```
LangChain
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    openai_api_base="http://localhost:8080/v1",
    model="gpt-4o",
)
```
CrewAI
```python
import os

os.environ["OPENAI_API_BASE"] = "http://localhost:8080/v1"
# CrewAI picks it up automatically
```
cURL
```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```
Your agent works exactly the same, but now every call flows through the Gateway.
Defining Kill‑Switch Policies
Edit config/policies.yaml. Below is a starter policy covering common failure modes:
```yaml
policies:
  # Kill switch: stop runaway loops
  - name: loop-detector
    description: "Kill agent if it makes more than 50 requests in 60 seconds"
    trigger:
      type: rate-limit
      max_requests: 50
      window_seconds: 60
    action: block
    alert: true

  # Kill switch: budget cap
  - name: budget-cap
    description: "Kill agent if it exceeds $5 in token spend"
    trigger:
      type: cost-limit
      max_cost_usd: 5.00
    action: block
    alert: true

  # Kill switch: restrict dangerous tools
  - name: tool-restriction
    description: "Block agent from executing shell commands"
    trigger:
      type: tool-call
      blocked_tools:
        - "execute_command"
        - "run_shell"
        - "delete_file"
    action: block
    alert: true

  # Risk tiers: require human approval for high-risk actions
  - name: high-risk-gate
    description: "Flag requests that involve payments or external APIs"
    trigger:
      type: content-match
      patterns:
        - "payment"
        - "transfer"
        - "external_api"
    action: flag
    risk_tier: critical
```
Save the file. The Policy Engine picks up changes automatically—no restart needed.
Testing the Loop Detector
```python
from openai import OpenAI
import time

client = OpenAI(base_url="http://localhost:8080/v1")

# Simulate a runaway agent — rapid repeated calls
for i in range(60):
    try:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": f"Request {i}"}],
        )
        print(f"Request {i}: OK")
    except Exception as e:
        print(f"Request {i}: BLOCKED — {e}")
        break
    time.sleep(0.5)
```
You should see normal responses until the rate limit triggers, after which requests are blocked and the agent stops.
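The budget-cap policy works the same way, except it accumulates estimated token spend instead of counting requests. A rough sketch of the accounting — the per-1k-token prices here are placeholder assumptions (real rates vary by model and provider), and this is an illustration, not the Gateway's code:

```python
class BudgetCap:
    """Block once cumulative estimated spend crosses max_cost_usd."""

    def __init__(self, max_cost_usd: float,
                 input_price_per_1k: float = 0.0025,   # assumed placeholder rate
                 output_price_per_1k: float = 0.01):   # assumed placeholder rate
        self.max_cost_usd = max_cost_usd
        self.input_price = input_price_per_1k / 1000
        self.output_price = output_price_per_1k / 1000
        self.spent = 0.0

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        # Accumulate the estimated cost of one completed request
        self.spent += (prompt_tokens * self.input_price
                       + completion_tokens * self.output_price)

    def allow(self) -> bool:
        return self.spent < self.max_cost_usd


cap = BudgetCap(max_cost_usd=5.00)
# 1,000 calls, each ~1,000 prompt + 500 completion tokens
for _ in range(1000):
    cap.record(prompt_tokens=1000, completion_tokens=500)
print(f"spent ~= ${cap.spent:.2f}, allowed: {cap.allow()}")
```

At these assumed rates each call costs $0.0075, so the cap trips around call 667 — long before a runaway agent reaches a four-figure bill.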
Observability
- Jaeger trace viewer – full trace of every request.
- Prometheus metrics – cost and request statistics.
- Episode Store API – replay the full sequence with context, latency, and cost.
What You’ve Gained in 5 Minutes
- Loop detection – runaway agents are killed automatically.
- Budget caps – no surprise API bills.
- Tool restrictions – dangerous functions are blocked.
- Risk tiers – high‑risk actions are flagged for human review.
- Full audit trail – every decision is recorded and replayable.
All without modifying a single line of your agent’s core logic; the kill switch lives in the infrastructure layer.
Custom Policies
The Policy Engine supports arbitrary YAML‑based rules. You can:
- Block specific models.
- Restrict token counts per request.
- Require human approval for particular tool calls.
- Define any pattern you need.
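For example, a hypothetical per-request token cap might look like the snippet below. The field names follow the pattern of the starter policies above, but the `token-limit` trigger type and `max_input_tokens` key are illustrative — check the platform docs for the exact schema:

```yaml
policies:
  - name: token-cap
    description: "Block any single request over 4,000 input tokens"
    trigger:
      type: token-limit        # hypothetical trigger type
      max_input_tokens: 4000
    action: block
    alert: true
```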
Framework Plugins
For deeper integration, trust plugins are available for:
- CrewAI
- LangChain
- AutoGen
- OpenAI Agents SDK
These add trust scoring and policy enforcement at the framework level.
MCP Security (Optional)
If you use the Model Context Protocol (MCP), the MCP Security Scanner audits your MCP server configurations, and the MCP Policy Gateway enforces policies on MCP tool calls.
License & Source
The full platform is open source under the Apache 2.0 license.
AIR Blackbox is a flight recorder for AI agents—record every decision, replay every incident, enforce every policy. If your agents are making decisions in production, they need a black box.