Why your AI keeps ignoring your safety constraints (and how we fixed it by engineering 'Intent')
Source: Dev.to
If you’ve spent any time prompting LLMs, you’ve probably run into this frustrating scenario: you tell the AI to prioritize “safety, clarity, and conciseness.”
When the model has to choose between making a sentence clearer or making it safer, a standard prompt treats the goals as equal priorities—effectively flipping a coin.
Right now we pass goals to LLMs as flat, comma‑separated lists. The AI hears “safety” and “conciseness” as equal priorities, with no built‑in mechanism to tell the model that a medical safety constraint vastly outranks a request for snappy prose. This gap between what you mean and what the model hears is a massive problem for reliable AI.
We recently addressed this by building a system we call Intent Engineering, built around Value Hierarchies. Below is a breakdown of how it works, why it matters, and how you can give your AI a machine‑readable “conscience.”
The Problem: AI Goals Are Unordered
Goals have no rank. For example:
```python
optimize(goals="clarity, safety")
```
treats both goals equally.
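To see why the flat form is lossy, note that all a consumer can recover from it is an unordered list of peers; there is simply no field to attach a rank to (`parse_goals` here is a hypothetical helper, not part of any real API):

```python
# Minimal sketch: parsing the flat goal string from the snippet above.
# Every goal comes back as an equal peer -- nothing carries priority.
def parse_goals(goals: str) -> list[str]:
    return [g.strip() for g in goals.split(",")]

goals = parse_goals("clarity, safety")
print(goals)  # ['clarity', 'safety'] -- two equal strings, no rank attached
```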
Data Structures
```python
from enum import Enum
from typing import List, Optional

from pydantic import BaseModel

class PriorityLabel(str, Enum):
    NON_NEGOTIABLE = "NON_NEGOTIABLE"  # Forces the smartest routing tier
    HIGH = "HIGH"                      # Forces at least a hybrid tier
    MEDIUM = "MEDIUM"                  # Prompt-level guidance only
    LOW = "LOW"                        # Prompt-level guidance only

class HierarchyEntry(BaseModel):
    goal: str
    label: PriorityLabel
    description: Optional[str] = None

class ValueHierarchy(BaseModel):
    name: Optional[str] = None
    entries: List[HierarchyEntry]
    conflict_rule: Optional[str] = None
```
By structuring the data this way, we can inject these rules into the AI’s behavior at two critical levels.
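As a quick illustration of what the labels buy you (sketched here with a stdlib dataclass instead of pydantic so it runs dependency-free; the pydantic models behave the same for this purpose), the enum's definition order gives you a total order you can sort conflicting goals by:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class PriorityLabel(str, Enum):
    NON_NEGOTIABLE = "NON_NEGOTIABLE"
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"

# Rank order used when resolving conflicts (lower index wins).
RANK = list(PriorityLabel)

@dataclass
class HierarchyEntry:
    goal: str
    label: PriorityLabel
    description: Optional[str] = None

entries = [
    HierarchyEntry("conciseness", PriorityLabel.MEDIUM),
    HierarchyEntry("safety", PriorityLabel.NON_NEGOTIABLE),
    HierarchyEntry("clarity", PriorityLabel.HIGH),
]

# Sorting by rank yields the resolution order the prompt will enforce.
ordered = sorted(entries, key=lambda e: RANK.index(e.label))
print([e.goal for e in ordered])  # ['safety', 'clarity', 'conciseness']
```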
Level 1: Changing the AI’s “Brain” (Prompt Injection)
```text
...existing system prompt...

INTENT ENGINEERING DIRECTIVES (user‑defined — enforce strictly):

When optimization goals conflict, resolve in this order:
1. [NON_NEGOTIABLE] safety: Always prioritise safety
2. [HIGH] clarity
3. [MEDIUM] conciseness

Conflict resolution: Safety first, always.
```
Technical note: we emit entry.label.value when building these directives because Python 3.11 changed how enums that subclass str are formatted: f"{entry.label}" now interpolates as "PriorityLabel.NON_NEGOTIABLE" rather than the bare value. Using .value guarantees the prompt contains the exact string "NON_NEGOTIABLE" on every Python version.
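A sketch of how a hierarchy might be rendered into those directives (the `render_directives` name is my own; the PriorityLabel enum is the one defined earlier):

```python
from enum import Enum

class PriorityLabel(str, Enum):
    NON_NEGOTIABLE = "NON_NEGOTIABLE"
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"

def render_directives(entries: list[tuple[str, PriorityLabel]]) -> str:
    # .value is version-stable; f"{label}" changed meaning in Python 3.11
    # for (str, Enum) subclasses, so we never interpolate the member itself.
    lines = ["When optimization goals conflict, resolve in this order:"]
    for i, (goal, label) in enumerate(entries, start=1):
        lines.append(f"{i}. [{label.value}] {goal}")
    return "\n".join(lines)

text = render_directives([
    ("safety", PriorityLabel.NON_NEGOTIABLE),
    ("clarity", PriorityLabel.HIGH),
    ("conciseness", PriorityLabel.MEDIUM),
])
print(text)
```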
Level 2: The “Bouncer” (Routing Tiers)
We built a Router Tier Floor. If you tag a goal as NON_NEGOTIABLE, the routing score is clamped upward so the request can never fall through to a cheaper, lower‑tier model.
```python
# Calculate the base score for the prompt
score = await self._calculate_routing_score(prompt, context, ...)

# The Floor: only fires when a hierarchy is active
if value_hierarchy and value_hierarchy.entries:
    has_non_negotiable = any(
        e.label == PriorityLabel.NON_NEGOTIABLE for e in value_hierarchy.entries
    )
    has_high = any(
        e.label == PriorityLabel.HIGH for e in value_hierarchy.entries
    )

    # Force the request to a smarter model tier based on priority
    if has_non_negotiable:
        score["final_score"] = max(score.get("final_score", 0.0), 0.72)  # Guaranteed LLM
    elif has_high:
        score["final_score"] = max(score.get("final_score", 0.0), 0.45)  # Guaranteed Hybrid
```
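Stripped of the router class, the floor logic reduces to a max() against the base score; here is a self-contained sketch (the 0.72 and 0.45 thresholds are the ones from the snippet above, `apply_tier_floor` is an illustrative name):

```python
def apply_tier_floor(final_score: float, labels: set[str]) -> float:
    # Raise the routing score to the tier floor; never lower it.
    if "NON_NEGOTIABLE" in labels:
        return max(final_score, 0.72)  # smartest tier
    if "HIGH" in labels:
        return max(final_score, 0.45)  # hybrid tier
    return final_score

# A cheap prompt that would normally route to a small model gets lifted...
print(apply_tier_floor(0.10, {"NON_NEGOTIABLE", "MEDIUM"}))  # 0.72
# ...while a score already above the floor is left untouched.
print(apply_tier_floor(0.90, {"HIGH"}))  # 0.9
```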
Because an active hierarchy changes both the prompt and the routing score, it also has to become part of the response‑cache key:

```python
import hashlib
import json

def _hierarchy_fingerprint(value_hierarchy) -> str:
    if not value_hierarchy or not value_hierarchy.entries:
        return ""  # empty string → same cache key as usual
    return hashlib.md5(
        json.dumps(
            [{"goal": e.goal, "label": e.label.value} for e in value_hierarchy.entries],
            sort_keys=True,
        ).encode()
    ).hexdigest()[:8]
```
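The fingerprint is deterministic, so two requests carrying the same hierarchy share a cache entry while any change to a goal or label busts it; a quick check, re-implementing the hash inline on plain dicts:

```python
import hashlib
import json

def fingerprint(entries: list[dict]) -> str:
    # Same recipe as _hierarchy_fingerprint: canonical JSON -> md5 -> 8 chars.
    return hashlib.md5(
        json.dumps(entries, sort_keys=True).encode()
    ).hexdigest()[:8]

a = fingerprint([{"goal": "safety", "label": "NON_NEGOTIABLE"}])
b = fingerprint([{"goal": "safety", "label": "NON_NEGOTIABLE"}])
c = fingerprint([{"goal": "safety", "label": "HIGH"}])

print(a == b)  # True: identical hierarchies hit the same cache key
print(a == c)  # False: changing a label changes the key
```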
Putting It into Practice (MCP Integration)
```json
{
  "tool": "define_value_hierarchy",
  "arguments": {
    "name": "Medical Safety Stack",
    "entries": [
      { "goal": "safety", "label": "NON_NEGOTIABLE", "description": "Always prioritise patient safety" },
      { "goal": "clarity", "label": "HIGH" },
      { "goal": "conciseness", "label": "MEDIUM" }
    ],
    "conflict_rule": "Safety first, always."
  }
}
```
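On the server side, that tool payload maps straight onto the models from earlier. A sketch of the validation step using plain json plus the label enum (the handler name is illustrative, not part of the MCP spec):

```python
import json
from enum import Enum

class PriorityLabel(str, Enum):
    NON_NEGOTIABLE = "NON_NEGOTIABLE"
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"

payload = json.loads("""
{
  "name": "Medical Safety Stack",
  "entries": [
    {"goal": "safety", "label": "NON_NEGOTIABLE"},
    {"goal": "clarity", "label": "HIGH"},
    {"goal": "conciseness", "label": "MEDIUM"}
  ],
  "conflict_rule": "Safety first, always."
}
""")

def handle_define_value_hierarchy(args: dict) -> list[tuple[str, PriorityLabel]]:
    # PriorityLabel(...) raises ValueError on unknown labels, so bad
    # payloads are rejected before they can reach the router.
    return [(e["goal"], PriorityLabel(e["label"])) for e in args["entries"]]

entries = handle_define_value_hierarchy(payload)
print(entries)
```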
TL;DR

Stop passing goals as a flat, comma‑separated list. Rank them: NON_NEGOTIABLE and HIGH labels are injected as ordered directives into the system prompt and set a routing‑score floor, so safety‑critical requests always reach a capable model.
If you want to experiment with this approach, install the Prompt Optimizer:
```shell
npm install -g mcp-prompt-optimizer
```
Feel free to share how you’re handling conflicting constraints in your own pipelines!