Stop Begging Your AI to Be Safe: The Case for Constraint Engineering
Source: Dev.to
I am tired of “Prompt Engineering” as a safety strategy.
If you are building autonomous agents—AI that can actually do things like query databases, move files, or send emails—you have likely felt the anxiety. You write a prompt like:
“Please generate a SQL query to get the user count. IMPORTANT: Do not delete any tables. Please, I beg you, do not drop the database.”
Then you cross your fingers and hope the probabilistic math of the LLM respects your polite request.
This is madness. In traditional software engineering, we don’t ask user input “please don’t be a SQL injection.” We sanitize it. We use firewalls. We use strict typing. Yet with AI agents, we seem to have forgotten the basics of deterministic systems.
I recently built a module I call the Constraint Engine, and it completely changed how I trust my AI agents. Here is why we need to stop prompting for safety and start coding for it.
The Philosophy: Brain vs. Hand
The core problem is that we are treating the LLM as both the Brain (planning, reasoning) and the Hand (execution). The fix is to split those roles and put a deterministic layer between them:
- Brain (LLM): Generates a plan (e.g., “I’ll delete these temp files to save space”).
- Firewall (Constraint Engine): A deterministic Python script that checks the plan against hard rules (regex, whitelists, cost limits).
- Hand (Executor): Executes the plan only if the Firewall returns `True`.
“The Human builds the walls; the AI plays inside them.”
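To make the split concrete, here is a minimal wiring sketch. The method names (`plan_task`, `execute`) and the glue code are illustrative placeholders, not part of the actual Constraint Engine module:

```python
from typing import Any, Dict

def run_agent(task: str, llm: Any, engine: "ConstraintEngine", executor: Any) -> Any:
    """Illustrative glue: the Brain proposes, the Firewall checks, the Hand executes."""
    plan: Dict[str, Any] = llm.plan_task(task)   # Brain: the LLM returns a structured plan

    if not engine.validate_plan(plan):           # Firewall: deterministic check, no second LLM
        raise PermissionError("Plan rejected by Constraint Engine")

    return executor.execute(plan)                # Hand: only validated plans are ever executed
```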
The “Logic Firewall” Implementation
The implementation is surprisingly simple. It doesn’t use another AI to check the AI (which just adds more cost and uncertainty). It uses standard, boring Python.
```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class ConstraintViolation:
    rule_name: str
    severity: str  # CRITICAL, HIGH, MEDIUM, LOW
    message: str
    blocked_action: str


class ConstraintEngine:
    def validate_plan(self, plan: Dict[str, Any]) -> bool:
        """
        Loop through rules and return approval status.
        If CRITICAL or HIGH severity violations exist, BLOCK.
        """
        # (Implementation of rule checking goes here)
        ...
```
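The rule loop is left as an ellipsis above. One plausible way to fill it in, assuming each rule exposes a `validate(plan)` method that returns a `ConstraintViolation` or `None` (as the rules below do), and assuming a rules-list constructor that the original module may or may not use:

```python
from typing import Any, Dict, List, Optional


class ConstraintEngine:
    BLOCKING_SEVERITIES = {"CRITICAL", "HIGH"}

    def __init__(self, rules: Optional[List[Any]] = None):
        # Each rule is assumed to expose validate(plan) -> ConstraintViolation | None.
        self.rules = rules or []

    def validate_plan(self, plan: Dict[str, Any]) -> bool:
        """Return True only if no CRITICAL or HIGH severity rule fires."""
        violations: List[ConstraintViolation] = []
        for rule in self.rules:
            violation = rule.validate(plan)
            if violation is not None:
                violations.append(violation)

        # Block the plan if any violation is severe enough.
        return not any(v.severity in self.BLOCKING_SEVERITIES for v in violations)
```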
The Rules Are Dumb (And That’s Good)
We don’t need the AI to “understand” why deleting the database is bad. We just need to catch the syntax.
Example: SQLInjectionRule
```python
import re


class SQLInjectionRule:
    DANGEROUS_PATTERNS = [
        r'\bDROP\s+TABLE\b',
        r'\bDELETE\s+FROM\b.*\bWHERE\s+1\s*=\s*1\b',
        r';\s*DROP\b',  # Command chaining
    ]

    def validate(self, plan):
        query = plan.get("query", "")
        for pattern in self.DANGEROUS_PATTERNS:
            if re.search(pattern, query, re.IGNORECASE):
                return ConstraintViolation(
                    rule_name="SQLInjectionRule",
                    severity="CRITICAL",
                    message=f"Dangerous SQL detected: {pattern}",
                    blocked_action=query,
                )
        return None
```
Is this primitive? Yes. Is it 100% effective against a standard `DROP TABLE` command? Also yes. The regex cares only about syntax, not context.
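To see it fire, here is a hedged usage sketch (it assumes the rules-list constructor from the ConstraintEngine sketch above):

```python
engine = ConstraintEngine(rules=[SQLInjectionRule()])

safe_plan = {"query": "SELECT COUNT(*) FROM users"}
hostile_plan = {"query": "SELECT 1; DROP TABLE users"}

print(engine.validate_plan(safe_plan))     # True  - no dangerous pattern matches
print(engine.validate_plan(hostile_plan))  # False - both ';\s*DROP' and 'DROP TABLE' fire
```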
The Paradox: Safety Enables Creativity
Usually, when we want an AI agent to be safe, we turn the temperature (randomness) down to 0.0, making it robotic and predictable. That kills the AI’s ability to devise clever solutions.
With a Constraint Engine, you can crank the temperature up (e.g., 0.9 or 1.0). Let the model generate wild ideas, then let the firewall enforce hard limits.
- Scenario A (No Firewall): The AI hallucinates `rm -rf /`. The server is wiped.
- Scenario B (With Firewall): The AI hallucinates `rm -rf /`. A `FileOperationRule` catches it, and the agent receives an error: “Action Blocked: root deletion not allowed.” The AI then self-corrects: “Ah, sorry. I will delete specific log files instead.”
The firewall acts as the boundary of the playground, allowing creativity while keeping the system safe.
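One way to wire up that self-correction loop, sketched under the assumption that the planner method and retry policy look roughly like the placeholder glue code above, is to feed the rejection back to the model and ask again:

```python
def run_with_retries(task: str, llm, engine, executor, max_attempts: int = 3):
    """Hypothetical loop: creative planning, deterministic gating, self-correction."""
    prompt = task
    for _ in range(max_attempts):
        plan = llm.plan_task(prompt)        # Brain: high temperature, wild ideas welcome
        if engine.validate_plan(plan):
            return executor.execute(plan)   # Hand: runs only plans the firewall approves
        # Firewall said no: tell the model why and let it propose a safer plan.
        prompt = task + "\n\nYour previous plan was blocked by the Constraint Engine. Propose a safer alternative."
    raise RuntimeError("No safe plan produced within the attempt budget")
```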
Beyond SQL: Cost and Scope
The same approach works for budget control. A CostLimitRule can block plans that exceed a predefined monetary threshold.
```python
class CostLimitRule:
    def __init__(self, max_cost: float):
        self.max_cost = max_cost

    def validate(self, plan):
        if plan.get('estimated_cost', 0) > self.max_cost:
            return ConstraintViolation(
                rule_name="CostLimitRule",
                severity="HIGH",
                message="Cost exceeds authorized limit.",
                blocked_action=str(plan),
            )
        return None
```
This prevents the “Infinite Loop Bankruptcy” scenario where an agent gets stuck in a loop calling an expensive API.
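As a hypothetical illustration, both rules can live in the same engine, so a runaway plan is stopped on either axis (again assuming the rules-list constructor from the sketch above):

```python
engine = ConstraintEngine(rules=[SQLInjectionRule(), CostLimitRule(max_cost=5.00)])

looping_plan = {
    "query": "SELECT * FROM analytics_events",
    "estimated_cost": 42.50,  # e.g., an agent stuck re-calling an expensive API
}

print(engine.validate_plan(looping_plan))  # False - CostLimitRule fires with HIGH severity
```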
Summary
We are entering an era where AIs are no longer just chatbots; they are doers. Relying on the AI’s “self‑control” (via system prompts) to protect your infrastructure is negligent.
- Intercept the plan before execution.
- Validate against hard logic (regex, math, whitelists).
- Execute only what passes.
Stop begging. Start engineering.