Why your AI keeps ignoring your safety constraints (and how we fixed it by engineering 'Intent')
Source: Dev.to
If you’ve spent any time prompting LLMs, you’ve probably run into this frustrating scenario: you tell the AI to prioritize “safety, clarity, and conciseness.”
When the model has to choose between making a sentence clearer or making it safer, a standard prompt treats the goals as equal priorities—effectively flipping a coin.
Right now we pass goals to LLMs as flat, comma‑separated lists. The AI hears “safety” and “conciseness” as equal priorities, with no built‑in mechanism to tell the model that a medical safety constraint vastly outranks a request for snappy prose. This gap between what you mean and what the model hears is a massive problem for reliable AI.
We recently addressed this by building a system we call Intent Engineering, built around Value Hierarchies. Below is a breakdown of how it works, why it matters, and how you can give your AI a machine‑readable “conscience.”
The Problem: AI Goals Are Unordered
Goals have no rank. For example:
```python
optimize(goals="clarity, safety")
```
treats both goals equally.
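To see why the flat form is lossy, note that all a consumer can recover from it is an unordered list of peers; there is simply no field to attach a rank to (`parse_goals` here is a hypothetical helper, not part of any real API):

```python
# Minimal sketch: parsing the flat goal string from the snippet above.
# Every goal comes back as an equal peer -- nothing carries priority.
def parse_goals(goals: str) -> list[str]:
    return [g.strip() for g in goals.split(",")]

goals = parse_goals("clarity, safety")
print(goals)  # ['clarity', 'safety'] -- two equal strings, no rank attached
```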
Data Structures
```python
from enum import Enum
from typing import List, Optional

from pydantic import BaseModel

class PriorityLabel(str, Enum):
    NON_NEGOTIABLE = "NON_NEGOTIABLE"  # Forces the smartest routing tier
    HIGH = "HIGH"                      # Forces at least a hybrid tier
    MEDIUM = "MEDIUM"                  # Prompt-level guidance only
    LOW = "LOW"                        # Prompt-level guidance only

class HierarchyEntry(BaseModel):
    goal: str
    label: PriorityLabel
    description: Optional[str] = None

class ValueHierarchy(BaseModel):
    name: Optional[str] = None
    entries: List[HierarchyEntry]
    conflict_rule: Optional[str] = None
```
By structuring the data this way, we can inject these rules into the AI’s behavior at two critical levels.
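As a quick illustration of what the labels buy you (sketched here with a stdlib dataclass instead of pydantic so it runs dependency-free; the pydantic models behave the same for this purpose), the enum's definition order gives you a total order you can sort conflicting goals by:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class PriorityLabel(str, Enum):
    NON_NEGOTIABLE = "NON_NEGOTIABLE"
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"

# Rank order used when resolving conflicts (lower index wins).
RANK = list(PriorityLabel)

@dataclass
class HierarchyEntry:
    goal: str
    label: PriorityLabel
    description: Optional[str] = None

entries = [
    HierarchyEntry("conciseness", PriorityLabel.MEDIUM),
    HierarchyEntry("safety", PriorityLabel.NON_NEGOTIABLE),
    HierarchyEntry("clarity", PriorityLabel.HIGH),
]

# Sorting by rank yields the resolution order the prompt will enforce.
ordered = sorted(entries, key=lambda e: RANK.index(e.label))
print([e.goal for e in ordered])  # ['safety', 'clarity', 'conciseness']
```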
Level 1: Changing the AI’s “Brain” (Prompt Injection)
```text
...existing system prompt...

INTENT ENGINEERING DIRECTIVES (user‑defined — enforce strictly):

When optimization goals conflict, resolve in this order:
1. [NON_NEGOTIABLE] safety: Always prioritise safety
2. [HIGH] clarity
3. [MEDIUM] conciseness

Conflict resolution: Safety first, always.
```
Technical note: we emit entry.label.value when building these directives because Python 3.11 changed how enums that subclass str are formatted: f"{entry.label}" now interpolates as "PriorityLabel.NON_NEGOTIABLE" rather than the bare value. Using .value guarantees the prompt contains the exact string "NON_NEGOTIABLE" on every Python version.
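A sketch of how a hierarchy might be rendered into those directives (the `render_directives` name is my own; the PriorityLabel enum is the one defined earlier):

```python
from enum import Enum

class PriorityLabel(str, Enum):
    NON_NEGOTIABLE = "NON_NEGOTIABLE"
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"

def render_directives(entries: list[tuple[str, PriorityLabel]]) -> str:
    # .value is version-stable; f"{label}" changed meaning in Python 3.11
    # for (str, Enum) subclasses, so we never interpolate the member itself.
    lines = ["When optimization goals conflict, resolve in this order:"]
    for i, (goal, label) in enumerate(entries, start=1):
        lines.append(f"{i}. [{label.value}] {goal}")
    return "\n".join(lines)

text = render_directives([
    ("safety", PriorityLabel.NON_NEGOTIABLE),
    ("clarity", PriorityLabel.HIGH),
    ("conciseness", PriorityLabel.MEDIUM),
])
print(text)
```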
Level 2: The “Bouncer” (Routing Tiers)
We built a Router Tier Floor. If you tag a goal as NON_NEGOTIABLE, the routing score is clamped upward so the request can never fall through to a cheaper, lower‑tier model.
```python
# Calculate the base score for the prompt
score = await self._calculate_routing_score(prompt, context, ...)

# The Floor: only fires when a hierarchy is active
if value_hierarchy and value_hierarchy.entries:
    has_non_negotiable = any(
        e.label == PriorityLabel.NON_NEGOTIABLE for e in value_hierarchy.entries
    )
    has_high = any(
        e.label == PriorityLabel.HIGH for e in value_hierarchy.entries
    )

    # Force the request to a smarter model tier based on priority
    if has_non_negotiable:
        score["final_score"] = max(score.get("final_score", 0.0), 0.72)  # Guaranteed LLM
    elif has_high:
        score["final_score"] = max(score.get("final_score", 0.0), 0.45)  # Guaranteed Hybrid
```
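Stripped of the router class, the floor logic reduces to a max() against the base score; here is a self-contained sketch (the 0.72 and 0.45 thresholds are the ones from the snippet above, `apply_tier_floor` is an illustrative name):

```python
def apply_tier_floor(final_score: float, labels: set[str]) -> float:
    # Raise the routing score to the tier floor; never lower it.
    if "NON_NEGOTIABLE" in labels:
        return max(final_score, 0.72)  # smartest tier
    if "HIGH" in labels:
        return max(final_score, 0.45)  # hybrid tier
    return final_score

# A cheap prompt that would normally route to a small model gets lifted...
print(apply_tier_floor(0.10, {"NON_NEGOTIABLE", "MEDIUM"}))  # 0.72
# ...while a score already above the floor is left untouched.
print(apply_tier_floor(0.90, {"HIGH"}))  # 0.9
```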
Because an active hierarchy changes both the prompt and the routing score, it also has to become part of the response‑cache key:

```python
import hashlib
import json

def _hierarchy_fingerprint(value_hierarchy) -> str:
    if not value_hierarchy or not value_hierarchy.entries:
        return ""  # empty string → same cache key as usual
    return hashlib.md5(
        json.dumps(
            [{"goal": e.goal, "label": e.label.value} for e in value_hierarchy.entries],
            sort_keys=True,
        ).encode()
    ).hexdigest()[:8]
```
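The fingerprint is deterministic, so two requests carrying the same hierarchy share a cache entry while any change to a goal or label busts it; a quick check, re-implementing the hash inline on plain dicts:

```python
import hashlib
import json

def fingerprint(entries: list[dict]) -> str:
    # Same recipe as _hierarchy_fingerprint: canonical JSON -> md5 -> 8 chars.
    return hashlib.md5(
        json.dumps(entries, sort_keys=True).encode()
    ).hexdigest()[:8]

a = fingerprint([{"goal": "safety", "label": "NON_NEGOTIABLE"}])
b = fingerprint([{"goal": "safety", "label": "NON_NEGOTIABLE"}])
c = fingerprint([{"goal": "safety", "label": "HIGH"}])

print(a == b)  # True: identical hierarchies hit the same cache key
print(a == c)  # False: changing a label changes the key
```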
Putting It into Practice (MCP Integration)
```json
{
  "tool": "define_value_hierarchy",
  "arguments": {
    "name": "Medical Safety Stack",
    "entries": [
      { "goal": "safety", "label": "NON_NEGOTIABLE", "description": "Always prioritise patient safety" },
      { "goal": "clarity", "label": "HIGH" },
      { "goal": "conciseness", "label": "MEDIUM" }
    ],
    "conflict_rule": "Safety first, always."
  }
}
```
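On the server side, that tool payload maps straight onto the models from earlier. A sketch of the validation step using plain json plus the label enum (the handler name is illustrative, not part of the MCP spec):

```python
import json
from enum import Enum

class PriorityLabel(str, Enum):
    NON_NEGOTIABLE = "NON_NEGOTIABLE"
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"

payload = json.loads("""
{
  "name": "Medical Safety Stack",
  "entries": [
    {"goal": "safety", "label": "NON_NEGOTIABLE"},
    {"goal": "clarity", "label": "HIGH"},
    {"goal": "conciseness", "label": "MEDIUM"}
  ],
  "conflict_rule": "Safety first, always."
}
""")

def handle_define_value_hierarchy(args: dict) -> list[tuple[str, PriorityLabel]]:
    # PriorityLabel(...) raises ValueError on unknown labels, so bad
    # payloads are rejected before they can reach the router.
    return [(e["goal"], PriorityLabel(e["label"])) for e in args["entries"]]

entries = handle_define_value_hierarchy(payload)
print(entries)
```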
TL;DR

Stop passing goals as a flat, comma‑separated list. Rank them: NON_NEGOTIABLE and HIGH labels are injected as ordered directives into the system prompt and set a routing‑score floor, so safety‑critical requests always reach a capable model.
If you want to experiment with this approach, install the Prompt Optimizer:
```shell
npm install -g mcp-prompt-optimizer
```
Feel free to share how you’re handling conflicting constraints in your own pipelines!