Why your AI keeps ignoring your safety constraints (and how we fixed it by engineering 'Intent')

Published: February 25, 2026 at 03:31 AM EST
3 min read
Source: Dev.to

If you’ve spent any time prompting LLMs, you’ve probably run into this frustrating scenario: you tell the AI to prioritize “safety, clarity, and conciseness.”
When the model has to choose between making a sentence clearer or making it safer, a standard prompt treats the goals as equal priorities—effectively flipping a coin.

Right now we pass goals to LLMs as flat, comma‑separated lists. The AI hears “safety” and “conciseness” as equal priorities, with no built‑in mechanism to tell the model that a medical safety constraint vastly outranks a request for snappy prose. This gap between what you mean and what the model hears is a massive problem for reliable AI.

We recently solved this by building a system called Intent Engineering, built around Value Hierarchies. Below is a breakdown of how it works, why it matters, and how you can give your AI a machine‑readable “conscience.”

The Problem: AI Goals Are Unordered

Goals have no rank. For example:

optimize(goals="clarity, safety")

treats both goals equally.
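
To see why the flat form is lossy, split it yourself — the result is a plain list with no priority semantics attached:

```python
# A flat, comma-separated goal string carries order only by position —
# nothing marks "safety" as outranking "clarity" (or vice versa).
goals = [g.strip() for g in "clarity, safety".split(",")]
print(goals)  # ['clarity', 'safety']
```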

Data Structures

from enum import Enum
from typing import List, Optional
from pydantic import BaseModel

class PriorityLabel(str, Enum):
    NON_NEGOTIABLE = "NON_NEGOTIABLE"  # Forces the smartest routing tier
    HIGH           = "HIGH"            # Forces at least a hybrid tier
    MEDIUM         = "MEDIUM"          # Prompt‑level guidance only
    LOW            = "LOW"             # Prompt‑level guidance only

class HierarchyEntry(BaseModel):
    goal: str
    label: PriorityLabel
    description: Optional[str] = None

class ValueHierarchy(BaseModel):
    name: Optional[str] = None
    entries: List[HierarchyEntry]
    conflict_rule: Optional[str] = None

By structuring the data this way, we can inject these rules into the AI’s behavior at two critical levels.
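
For a concrete feel, here is a dependency-free sketch of the same shape — dataclasses are swapped in for pydantic purely so the snippet runs on the stdlib, and the field order is adjusted because dataclasses require defaulted fields last:

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional

class PriorityLabel(str, Enum):
    NON_NEGOTIABLE = "NON_NEGOTIABLE"
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"

@dataclass
class HierarchyEntry:
    goal: str
    label: PriorityLabel
    description: Optional[str] = None

@dataclass
class ValueHierarchy:
    entries: List[HierarchyEntry]       # required field first (dataclass rule)
    name: Optional[str] = None
    conflict_rule: Optional[str] = None

hierarchy = ValueHierarchy(
    name="Medical Safety Stack",
    entries=[
        HierarchyEntry("safety", PriorityLabel.NON_NEGOTIABLE,
                       "Always prioritise patient safety"),
        HierarchyEntry("clarity", PriorityLabel.HIGH),
        HierarchyEntry("conciseness", PriorityLabel.MEDIUM),
    ],
    conflict_rule="Safety first, always.",
)
print(hierarchy.entries[0].label.value)  # NON_NEGOTIABLE
```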

Level 1: Changing the AI’s “Brain” (Prompt Injection)

...existing system prompt...

INTENT ENGINEERING DIRECTIVES (user‑defined — enforce strictly):
When optimization goals conflict, resolve in this order:
  1. [NON_NEGOTIABLE] safety: Always prioritise safety
  2. [HIGH] clarity
  3. [MEDIUM] conciseness

Conflict resolution: Safety first, always.
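
The directive block above can be rendered mechanically from the hierarchy. A minimal sketch — the helper name `render_directives` and the dict-based entries are illustrative assumptions, not the project's actual code:

```python
from enum import Enum

class PriorityLabel(str, Enum):
    NON_NEGOTIABLE = "NON_NEGOTIABLE"
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"

# Definition order doubles as priority rank: lower index = higher priority.
RANK = {label: i for i, label in enumerate(PriorityLabel)}

def render_directives(entries, conflict_rule=None) -> str:
    lines = [
        "INTENT ENGINEERING DIRECTIVES (user-defined — enforce strictly):",
        "When optimization goals conflict, resolve in this order:",
    ]
    for i, e in enumerate(sorted(entries, key=lambda e: RANK[e["label"]]), 1):
        suffix = f": {e['description']}" if e.get("description") else ""
        lines.append(f"  {i}. [{e['label'].value}] {e['goal']}{suffix}")
    if conflict_rule:
        lines.append(f"\nConflict resolution: {conflict_rule}")
    return "\n".join(lines)

entries = [
    {"goal": "conciseness", "label": PriorityLabel.MEDIUM},
    {"goal": "safety", "label": PriorityLabel.NON_NEGOTIABLE,
     "description": "Always prioritise safety"},
    {"goal": "clarity", "label": PriorityLabel.HIGH},
]
print(render_directives(entries, "Safety first, always."))
```

Note that the entries arrive unordered and the sort restores the ranking, so callers never have to pre-sort.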

Technical note: We use entry.label.value rather than string interpolation because Python 3.11 changed how enums with a mixed‑in str type are formatted — an f‑string can now yield "PriorityLabel.NON_NEGOTIABLE" instead of the bare value. Using .value ensures the prompt gets the exact string "NON_NEGOTIABLE".
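
Concretely — the f-string result varies by interpreter version, so only `.value` is stable everywhere:

```python
from enum import Enum

class PriorityLabel(str, Enum):
    NON_NEGOTIABLE = "NON_NEGOTIABLE"

label = PriorityLabel.NON_NEGOTIABLE

# f"{label}" yields "NON_NEGOTIABLE" on Python 3.10 but
# "PriorityLabel.NON_NEGOTIABLE" on 3.11+, where formatting of
# mixed-in enums changed. The .value accessor never varies:
print(label.value)  # NON_NEGOTIABLE
```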

Level 2: The “Bouncer” (Routing Tiers)

We built a Router Tier Floor. If you tag a goal as NON_NEGOTIABLE, the system enforces a minimum routing score that prevents the request from ever being sent to a lower‑tier model.

# Calculate the base score for the prompt
score = await self._calculate_routing_score(prompt, context, ...)

# The Floor: only fires when a hierarchy is active
if value_hierarchy and value_hierarchy.entries:
    has_non_negotiable = any(
        e.label == PriorityLabel.NON_NEGOTIABLE for e in value_hierarchy.entries
    )
    has_high = any(
        e.label == PriorityLabel.HIGH for e in value_hierarchy.entries
    )

    # Force the request to a smarter model tier based on priority
    if has_non_negotiable:
        score["final_score"] = max(score.get("final_score", 0.0), 0.72)  # Guaranteed LLM
    elif has_high:
        score["final_score"] = max(score.get("final_score", 0.0), 0.45)  # Guaranteed Hybrid

The hierarchy also becomes part of the routing cache key, so a score cached under one hierarchy is never reused under another:

import hashlib
import json

def _hierarchy_fingerprint(value_hierarchy) -> str:
    if not value_hierarchy or not value_hierarchy.entries:
        return ""   # empty string → same cache key as usual
    return hashlib.md5(
        json.dumps(
            # .value, not str(): see the enum formatting note above
            [{"goal": e.goal, "label": e.label.value} for e in value_hierarchy.entries],
            sort_keys=True
        ).encode()
    ).hexdigest()[:8]
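
The floor logic can be exercised standalone. In this sketch, plain dicts stand in for the pydantic entries and `apply_tier_floor` is an assumed name mirroring the snippet above:

```python
import hashlib
import json
from enum import Enum

class PriorityLabel(str, Enum):
    NON_NEGOTIABLE = "NON_NEGOTIABLE"
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"

def apply_tier_floor(score: dict, entries: list) -> dict:
    """Raise final_score so high-stakes hierarchies reach a capable tier."""
    labels = {e["label"] for e in entries}
    if PriorityLabel.NON_NEGOTIABLE in labels:
        score["final_score"] = max(score.get("final_score", 0.0), 0.72)
    elif PriorityLabel.HIGH in labels:
        score["final_score"] = max(score.get("final_score", 0.0), 0.45)
    return score

entries = [
    {"goal": "safety", "label": PriorityLabel.NON_NEGOTIABLE},
    {"goal": "clarity", "label": PriorityLabel.HIGH},
]
print(apply_tier_floor({"final_score": 0.30}, entries))  # {'final_score': 0.72}

# The cache fingerprint: stable for the same goals/labels, 8 hex chars.
fingerprint = hashlib.md5(
    json.dumps(
        [{"goal": e["goal"], "label": e["label"].value} for e in entries],
        sort_keys=True,
    ).encode()
).hexdigest()[:8]
print(len(fingerprint))  # 8
```

Because `max()` only ever raises the score, a request that already rates a high tier is left untouched — the floor never downgrades.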

Putting It into Practice (MCP Integration)

{
  "tool": "define_value_hierarchy",
  "arguments": {
    "name": "Medical Safety Stack",
    "entries": [
      { "goal": "safety", "label": "NON_NEGOTIABLE", "description": "Always prioritise patient safety" },
      { "goal": "clarity", "label": "HIGH" },
      { "goal": "conciseness", "label": "MEDIUM" }
    ],
    "conflict_rule": "Safety first, always."
  }
}
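
On the receiving side, the tool's arguments validate straight into the structures from the Data Structures section. A stdlib-only sketch of that coercion (the real server presumably relies on pydantic's validation instead):

```python
import json
from enum import Enum

class PriorityLabel(str, Enum):
    NON_NEGOTIABLE = "NON_NEGOTIABLE"
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"

payload = json.loads("""{
  "name": "Medical Safety Stack",
  "entries": [
    {"goal": "safety", "label": "NON_NEGOTIABLE"},
    {"goal": "clarity", "label": "HIGH"},
    {"goal": "conciseness", "label": "MEDIUM"}
  ],
  "conflict_rule": "Safety first, always."
}""")

# Coerce raw label strings into the enum; an unknown label raises ValueError.
entries = [
    {"goal": e["goal"], "label": PriorityLabel(e["label"])}
    for e in payload["entries"]
]
print([e["label"].value for e in entries])  # ['NON_NEGOTIABLE', 'HIGH', 'MEDIUM']
```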

TL;DR

Label each goal with an explicit priority, inject the ranked hierarchy into the system prompt, and enforce a routing floor so NON_NEGOTIABLE goals always reach a capable model.

If you want to experiment with this approach, install the Prompt Optimizer:

npm install -g mcp-prompt-optimizer

Feel free to share how you’re handling conflicting constraints in your own pipelines!
