How to Add a Kill Switch to Your AI Agent in 5 Minutes

Published: February 21, 2026
4 min read
Source: Dev.to

Introduction

Your AI agent is running in production, calling APIs, making decisions, and spending money. If it goes sideways—gets stuck in a loop, hallucinates tool calls, or burns through your API budget—your only option is to manually kill the process. This tutorial adds a real kill switch in five minutes without changing your agent’s core code.

The fix is a reverse proxy that sits between your agent and the LLM provider: every request flows through it, policies are defined in YAML, and when a policy triggers, the request is blocked before it ever reaches the model.
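Conceptually, the request path is a check-then-forward loop. Here is a minimal sketch of that idea; the names (`Policy`, `handle_request`) are illustrative, not the platform's actual API:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Policy:
    name: str
    action: str                       # "block" or "flag"
    triggered: Callable[[dict], bool] # inspects the outgoing request

def handle_request(request: dict, policies: list[Policy],
                   forward: Callable[[dict], dict]) -> dict:
    """Run every policy check before the request reaches the model provider."""
    for policy in policies:
        if policy.triggered(request):
            if policy.action == "block":
                # Blocked requests never leave the proxy.
                return {"error": f"blocked by policy '{policy.name}'"}
            # "flag" policies would log the event and let the call through.
    return forward(request)
```

Because the check happens in the proxy, the agent itself needs no changes: a blocked call simply comes back as an error response.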

Prerequisites

  • Docker and Docker Compose installed
  • An OpenAI API key (or any OpenAI‑compatible provider)
  • An AI agent that uses the OpenAI API format

Setup

git clone https://github.com/airblackbox/air-platform.git
cd air-platform
cp .env.example .env

Edit .env and add your API key:

OPENAI_API_KEY=sk-your-key-here

Start the platform:

make up

Six services start in about 8 seconds. The important one is the Gateway running on http://localhost:8080.

Point Your Agent at the Gateway

Python (OpenAI SDK)

from openai import OpenAI

# Before — calls OpenAI directly
# client = OpenAI()

# After — calls through AIR Blackbox Gateway
client = OpenAI(base_url="http://localhost:8080/v1")

LangChain

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:8080/v1",
    model="gpt-4o"
)

CrewAI

import os
os.environ["OPENAI_API_BASE"] = "http://localhost:8080/v1"
# CrewAI picks it up automatically

cURL

curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Your agent works exactly the same, but now every call flows through the Gateway.

Defining Kill‑Switch Policies

Edit config/policies.yaml. Below is a starter policy covering common failure modes:

policies:
  # Kill switch: stop runaway loops
  - name: loop-detector
    description: "Kill agent if it makes more than 50 requests in 60 seconds"
    trigger:
      type: rate-limit
      max_requests: 50
      window_seconds: 60
    action: block
    alert: true

  # Kill switch: budget cap
  - name: budget-cap
    description: "Kill agent if it exceeds $5 in token spend"
    trigger:
      type: cost-limit
      max_cost_usd: 5.00
    action: block
    alert: true

  # Kill switch: restrict dangerous tools
  - name: tool-restriction
    description: "Block agent from executing shell commands"
    trigger:
      type: tool-call
      blocked_tools:
        - "execute_command"
        - "run_shell"
        - "delete_file"
    action: block
    alert: true

  # Risk tiers: require human approval for high‑risk actions
  - name: high-risk-gate
    description: "Flag requests that involve payments or external APIs"
    trigger:
      type: content-match
      patterns:
        - "payment"
        - "transfer"
        - "external_api"
    action: flag
    risk_tier: critical

Save the file. The Policy Engine picks up changes automatically—no restart needed.

Testing the Loop Detector

from openai import OpenAI
import time

client = OpenAI(base_url="http://localhost:8080/v1")

# Simulate a runaway agent — rapid repeated calls
for i in range(60):
    try:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": f"Request {i}"}]
        )
        print(f"Request {i}: OK")
    except Exception as e:
        print(f"Request {i}: BLOCKED — {e}")
        break
    time.sleep(0.5)

You should see normal responses until the rate limit triggers, after which requests are blocked and the agent stops.
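The budget-cap policy works the same way, but accumulates spend instead of counting requests. A hedged sketch of how a cost-limit trigger might track this; the per-token prices are placeholder assumptions, not the Gateway's actual pricing table:

```python
# Assumed USD-per-token prices; real prices vary by model and provider.
PRICES = {"gpt-4o": {"input": 2.50 / 1_000_000, "output": 10.00 / 1_000_000}}

class CostCapTrigger:
    """Fires once cumulative token spend reaches max_cost_usd."""

    def __init__(self, max_cost_usd: float):
        self.max_cost_usd = max_cost_usd
        self.spent = 0.0

    def record(self, model: str, input_tokens: int, output_tokens: int) -> bool:
        price = PRICES[model]
        self.spent += input_tokens * price["input"] + output_tokens * price["output"]
        return self.spent >= self.max_cost_usd
```

Once `record` returns `True`, the proxy stops forwarding the agent's requests, so a runaway agent can never spend past the cap by more than one call.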

Observability

  • Jaeger trace viewer – full trace of every request.
  • Prometheus metrics – cost and request statistics.
  • Episode Store API – replay the full sequence with context, latency, and cost.

What You’ve Gained in 5 Minutes

  • Loop detection – runaway agents are killed automatically.
  • Budget caps – no surprise API bills.
  • Tool restrictions – dangerous functions are blocked.
  • Risk tiers – high‑risk actions are flagged for human review.
  • Full audit trail – every decision is recorded and replayable.

All without modifying a single line of your agent’s core logic; the kill switch lives in the infrastructure layer.

Custom Policies

The Policy Engine supports arbitrary YAML‑based rules. You can:

  • Block specific models.
  • Restrict token counts per request.
  • Require human approval for particular tool calls.
  • Define any pattern you need.
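For example, the first two items above might look like the following. The field names follow the starter policies earlier in this post, but the trigger types (`model-match`, `token-limit`) are assumptions; check the Policy Engine's schema for the exact names:

```yaml
policies:
  - name: no-expensive-models
    description: "Block requests to models not on the approved list"
    trigger:
      type: model-match
      blocked_models:
        - "o1"
        - "o1-pro"
    action: block
    alert: true

  - name: token-budget
    description: "Reject requests asking for more than 4096 output tokens"
    trigger:
      type: token-limit
      max_tokens: 4096
    action: block
```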

Framework Plugins

For deeper integration, trust plugins are available for:

  • CrewAI
  • LangChain
  • AutoGen
  • OpenAI Agents SDK

These add trust scoring and policy enforcement at the framework level.

MCP Security (Optional)

If you use the Model Context Protocol (MCP), the MCP Security Scanner audits your MCP server configurations, and the MCP Policy Gateway enforces policies on MCP tool calls.

License & Source

The full platform is open source under the Apache 2.0 license.

AIR Blackbox is a flight recorder for AI agents—record every decision, replay every incident, enforce every policy. If your agents are making decisions in production, they need a black box.
