Stop Begging Your AI to Be Safe: The Case for Constraint Engineering
Source: Dev.to
I am tired of “Prompt Engineering” as a safety strategy.
If you are building autonomous agents—AI that can actually do things like query databases, move files, or send emails—you have likely felt the anxiety. You write a prompt like:
“Please generate a SQL query to get the user count. IMPORTANT: Do not delete any tables. Please, I beg you, do not drop the database.”
Then you cross your fingers and hope the probabilistic math of the LLM respects your polite request.
This is madness. In traditional software engineering, we don’t ask user input “please don’t be a SQL injection.” We sanitize it. We use firewalls. We use strict typing. Yet with AI agents, we seem to have forgotten the basics of deterministic systems.
I recently built a module I call the Constraint Engine, and it completely changed how I trust my AI agents. Here is why we need to stop prompting for safety and start coding for it.
The Philosophy: Brain vs. Hand
The core problem is that we are treating the LLM as both the Brain (planning, reasoning) and the Hand (execution). The fix is to split those roles and put a deterministic layer between them:
- Brain (LLM): Generates a plan (e.g., “I’ll delete these temp files to save space”).
- Firewall (Constraint Engine): A deterministic Python script that checks the plan against hard rules (regex, whitelists, cost limits).
- Hand (Executor): Executes the plan only if the Firewall returns `True`.
“The Human builds the walls; the AI plays inside them.”
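To make the split concrete, here is a minimal wiring sketch. The method names (`plan_task`, `execute`) and the glue code are illustrative placeholders, not part of the actual Constraint Engine module:

```python
from typing import Any, Dict

def run_agent(task: str, llm: Any, engine: "ConstraintEngine", executor: Any) -> Any:
    """Illustrative glue: the Brain proposes, the Firewall checks, the Hand executes."""
    plan: Dict[str, Any] = llm.plan_task(task)   # Brain: the LLM returns a structured plan

    if not engine.validate_plan(plan):           # Firewall: deterministic check, no second LLM
        raise PermissionError("Plan rejected by Constraint Engine")

    return executor.execute(plan)                # Hand: only validated plans are ever executed
```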
The “Logic Firewall” Implementation
The implementation is surprisingly simple. It doesn’t use another AI to check the AI (which just adds more cost and uncertainty). It uses standard, boring Python.
```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class ConstraintViolation:
    rule_name: str
    severity: str  # CRITICAL, HIGH, MEDIUM, LOW
    message: str
    blocked_action: str


class ConstraintEngine:
    def validate_plan(self, plan: Dict[str, Any]) -> bool:
        """
        Loop through rules and return approval status.
        If CRITICAL or HIGH severity violations exist, BLOCK.
        """
        # (Implementation of rule checking goes here)
        ...
```
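The rule loop is left as an ellipsis above. One plausible way to fill it in, assuming each rule exposes a `validate(plan)` method that returns a `ConstraintViolation` or `None` (as the rules below do), and assuming a rules-list constructor that the original module may or may not use:

```python
from typing import Any, Dict, List, Optional


class ConstraintEngine:
    BLOCKING_SEVERITIES = {"CRITICAL", "HIGH"}

    def __init__(self, rules: Optional[List[Any]] = None):
        # Each rule is assumed to expose validate(plan) -> ConstraintViolation | None.
        self.rules = rules or []

    def validate_plan(self, plan: Dict[str, Any]) -> bool:
        """Return True only if no CRITICAL or HIGH severity rule fires."""
        violations: List[ConstraintViolation] = []
        for rule in self.rules:
            violation = rule.validate(plan)
            if violation is not None:
                violations.append(violation)

        # Block the plan if any violation is severe enough.
        return not any(v.severity in self.BLOCKING_SEVERITIES for v in violations)
```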
The Rules Are Dumb (And That’s Good)
We don’t need the AI to “understand” why deleting the database is bad. We just need to catch the syntax.
Example: SQLInjectionRule
```python
import re


class SQLInjectionRule:
    DANGEROUS_PATTERNS = [
        r'\bDROP\s+TABLE\b',
        r'\bDELETE\s+FROM\b.*\bWHERE\s+1\s*=\s*1\b',
        r';\s*DROP\b',  # Command chaining
    ]

    def validate(self, plan):
        query = plan.get("query", "")
        for pattern in self.DANGEROUS_PATTERNS:
            if re.search(pattern, query, re.IGNORECASE):
                return ConstraintViolation(
                    rule_name="SQLInjectionRule",
                    severity="CRITICAL",
                    message=f"Dangerous SQL detected: {pattern}",
                    blocked_action=query,
                )
        return None
```
Is this primitive? Yes. Is it 100% effective against a standard `DROP TABLE` command? Also yes. The regex cares only about syntax, not context.
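To see it fire, here is a hedged usage sketch (it assumes the rules-list constructor from the ConstraintEngine sketch above):

```python
engine = ConstraintEngine(rules=[SQLInjectionRule()])

safe_plan = {"query": "SELECT COUNT(*) FROM users"}
hostile_plan = {"query": "SELECT 1; DROP TABLE users"}

print(engine.validate_plan(safe_plan))     # True  - no dangerous pattern matches
print(engine.validate_plan(hostile_plan))  # False - both ';\s*DROP' and 'DROP TABLE' fire
```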
The Paradox: Safety Enables Creativity
Usually, when we want an AI agent to be safe, we turn the temperature (randomness) down to 0.0, making it robotic and predictable. That kills the AI’s ability to devise clever solutions.
With a Constraint Engine, you can crank the temperature up (e.g., 0.9 or 1.0). Let the model generate wild ideas, then let the firewall enforce hard limits.
- Scenario A (No Firewall): The AI hallucinates `rm -rf /`. The server is wiped.
- Scenario B (With Firewall): The AI hallucinates `rm -rf /`. A `FileOperationRule` catches it, and the agent receives an error: “Action Blocked: root deletion not allowed.” The AI then self-corrects: “Ah, sorry. I will delete specific log files instead.”
The firewall acts as the boundary of the playground, allowing creativity while keeping the system safe.
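One way to wire up that self-correction loop, sketched under the assumption that the planner method and retry policy look roughly like the placeholder glue code above, is to feed the rejection back to the model and ask again:

```python
def run_with_retries(task: str, llm, engine, executor, max_attempts: int = 3):
    """Hypothetical loop: creative planning, deterministic gating, self-correction."""
    prompt = task
    for _ in range(max_attempts):
        plan = llm.plan_task(prompt)        # Brain: high temperature, wild ideas welcome
        if engine.validate_plan(plan):
            return executor.execute(plan)   # Hand: runs only plans the firewall approves
        # Firewall said no: tell the model why and let it propose a safer plan.
        prompt = task + "\n\nYour previous plan was blocked by the Constraint Engine. Propose a safer alternative."
    raise RuntimeError("No safe plan produced within the attempt budget")
```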
Beyond SQL: Cost and Scope
The same approach works for budget control. A CostLimitRule can block plans that exceed a predefined monetary threshold.
```python
class CostLimitRule:
    def __init__(self, max_cost: float):
        self.max_cost = max_cost

    def validate(self, plan):
        if plan.get('estimated_cost', 0) > self.max_cost:
            return ConstraintViolation(
                rule_name="CostLimitRule",
                severity="HIGH",
                message="Cost exceeds authorized limit.",
                blocked_action=str(plan),
            )
        return None
```
This prevents the “Infinite Loop Bankruptcy” scenario where an agent gets stuck in a loop calling an expensive API.
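As a hypothetical illustration, both rules can live in the same engine, so a runaway plan is stopped on either axis (again assuming the rules-list constructor from the sketch above):

```python
engine = ConstraintEngine(rules=[SQLInjectionRule(), CostLimitRule(max_cost=5.00)])

looping_plan = {
    "query": "SELECT * FROM analytics_events",
    "estimated_cost": 42.50,  # e.g., an agent stuck re-calling an expensive API
}

print(engine.validate_plan(looping_plan))  # False - CostLimitRule fires with HIGH severity
```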
Summary
We are entering an era where AIs are no longer just chatbots; they are doers. Relying on the AI’s “self‑control” (via system prompts) to protect your infrastructure is negligent.
- Intercept the plan before execution.
- Validate against hard logic (regex, math, whitelists).
- Execute only what passes.
Stop begging. Start engineering.