If your agent can delete user data, your prompt isn’t a prompt, it’s a contract

Published: December 26, 2025 at 05:46 PM EST
3 min read
Source: Dev.to

Where good prompt design still breaks

Most AI engineers I know already do the “obvious” things:

  • Separate system vs. user instructions
  • Constrain output (JSON / schema)
  • Add validators
  • Use tools instead of free‑text guessing
  • Include safety language

And yet the agent still fails in ways that feel unfair:

  • Scope creeps (“everything you have on me” expands over time)
  • Tool outputs leak fields you didn’t intend to expose
  • “Helpful” behavior overrides policy on edge cases
  • Deletion gets planned too early or too confidently
  • Auditability is missing when things go wrong

It’s not because the prompt is bad. It’s because the instructions aren’t written like an executable contract. DSAR (Data Subject Access Request) handling is where that gap shows up immediately.

The design move that changes everything

Write an Instruction Contract (like an API spec for behavior). Below is the practical design process I use, with examples.

1. Non‑negotiables (hard constraints, not vibes)

This isn’t “be safe.” It’s a short list of rules the agent cannot reinterpret.

Example constraints

  • If the user requests data about someone else → refuse.
  • Never export raw logs; only return redacted summaries.
  • Never execute deletion unless: identity verified and explicit confirmation token provided.

These remove the interpretation layer where drift happens.
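A minimal sketch of those constraints as code, under assumed names (`Request`, `check_constraints`, and its fields are illustrative, not from any real API):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Request:
    subject_id: str           # whose data is being requested
    requester_id: str         # who is asking
    identity_verified: bool
    confirm_token: Optional[str] = None

def check_constraints(req: Request, action: str) -> str:
    # Rule: requests about someone else are refused outright.
    if req.subject_id != req.requester_id:
        return "refuse"
    # Rule: deletion requires verified identity AND an explicit confirmation token.
    if action == "delete" and not (req.identity_verified and req.confirm_token):
        return "stop"
    return "proceed"
```

The point is that each rule becomes a branch that returns a decision, not a sentence the model gets to interpret.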

2. Scope definition (prevent scope creep, not just scope clarity)

Most teams write scope once, but scope still creeps as the agent becomes “more capable.”
Define scope as:

  • What counts
  • What doesn’t
  • How to behave when ambiguous

Example scope

Include

  • Profile fields (name, email, phone)
  • Orders & invoices
  • Support tickets / chat transcripts
  • Marketing preferences
  • Device identifiers (if collected)

Exclude

  • Internal employee notes
  • Aggregated analytics dashboards
  • Records containing other users’ identifiers
  • Internal security logs that expose system internals

Ambiguity rule example: If the system returns mixed‑user records → stop and escalate.

This keeps the agent from “expanding the mission.”
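One hypothetical encoding of the scope rules above: explicit include/exclude sets plus an ambiguity rule that escalates instead of guessing (category names here are shortened stand-ins for the lists above):

```python
INCLUDE = {"profile", "orders", "support_tickets", "marketing_prefs", "device_ids"}
EXCLUDE = {"employee_notes", "analytics_dashboards", "security_logs"}

def scope_decision(category: str, user_ids_in_records: set) -> str:
    # Ambiguity rule: mixed-user records -> stop and escalate.
    if len(user_ids_in_records) > 1:
        return "escalate"
    if category in EXCLUDE:
        return "exclude"
    if category in INCLUDE:
        return "include"
    # Unknown categories are ambiguous by definition: escalate, don't expand scope.
    return "escalate"
```

Note the default: anything not explicitly included is treated as ambiguous, which is what stops the mission from quietly expanding.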

3. Tool boundaries (assume tools will return more than you asked for)

Even if you ask for allow‑listed fields, tools often return extra fields.
Your contract must explicitly state:

  • Allowed fields
  • Disallowed fields
  • What to do if disallowed fields appear

Example: CRM policy

  • Allowed: name, email, phone, created_at, last_login
  • Disallowed: notes, internal_tags, fraud_flags

Required behavior: If disallowed fields appear → discard + log the violation.

Example: Logs policy

  • Only return categories + date ranges + redacted snippets
  • Never return raw logs

Tool boundaries prevent accidental leakage and “quiet policy breakage.”
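One way to enforce the CRM policy above in code: keep only allow-listed fields, discard the rest, and record each violation for the audit trail. (`filter_record` and `violation_log` are assumed names; the field names mirror the example policy.)

```python
ALLOWED = {"name", "email", "phone", "created_at", "last_login"}

def filter_record(record: dict, violation_log: list) -> dict:
    clean = {}
    for field, value in record.items():
        if field in ALLOWED:
            clean[field] = value
        else:
            violation_log.append(field)  # discard + log, per the contract
    return clean
```

Running this at the tool boundary means the agent never even sees `fraud_flags`, rather than being asked not to mention it.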

4. Output shape (make auditability default, not optional)

Most engineers constrain output, but the real win is making it audit‑friendly.

Example output skeleton (JSON)

{
  "identity_verification": {
    "method": "string",
    "confidence": 0.0,
    "matched_fields": ["field1", "field2"]
  },
  "data_found": [
    {
      "system": "string",
      "records_found": 0,
      "date_range": "YYYY‑MM‑DD to YYYY‑MM‑DD"
    }
  ],
  "redactions_applied": [
    {
      "field": "string",
      "reason": "string"
    }
  ],
  "deletion_plan": [
    {
      "item": "string",
      "dependencies": ["string"]
    }
  ],
  "user_summary": "Plain‑language summary of actions taken."
}

Now the agent produces an artifact that can be reviewed and diffed.
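A sketch of checking the output against that skeleton before accepting it. A production system would likely use a schema validator like `jsonschema` or `pydantic`; this hand-rolled check only shows the idea:

```python
REQUIRED_KEYS = {
    "identity_verification", "data_found",
    "redactions_applied", "deletion_plan", "user_summary",
}

def is_auditable(output: dict) -> bool:
    # Every top-level key from the skeleton must be present.
    if not REQUIRED_KEYS.issubset(output):
        return False
    # And identity confidence must be an actual number, not a string.
    confidence = output["identity_verification"].get("confidence")
    return isinstance(confidence, (int, float))
```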

5. Stop rules (where reliability is earned)

Knowing when to halt is a senior‑level prompt design skill.

Examples

  • Identity confidence < 0.9 → stop; request one more proof.
  • Multiple matching user accounts → stop; escalate.
  • Deletion requested but no explicit confirm token → stop; ask user to confirm.
  • Any request expanding scope beyond requester identity → refuse.

Stop rules keep your system from “getting creative.”
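The stop rules above, sketched as ordered checks that run before any action (function and parameter names are illustrative):

```python
def stop_check(identity_confidence: float, matching_accounts: int,
               action: str, has_confirm_token: bool) -> str:
    if identity_confidence < 0.9:
        return "stop: request one more proof"
    if matching_accounts > 1:
        return "stop: escalate"
    if action == "delete" and not has_confirm_token:
        return "stop: ask user to confirm"
    return "proceed"
```

Ordering matters: identity is checked before anything else, so a low-confidence deletion never even reaches the token check.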

Why this is hard (and why it’s worth showing)

This isn’t prompt‑writing as copywriting. It’s prompt‑writing as system design:

  1. Policies → constraints
  2. Constraints → predictable behavior
  3. Predictable behavior → auditable output

That’s the difference between an “agent that looks smart” and an “agent you can trust.”

The part that should be automated (without removing judgment)

The thinking stays human, but the repetitive scaffolding shouldn’t.

What’s worth automating

  • Generating instruction‑contract templates per workflow
  • Generating step‑specific prompts + output schemas
  • Enforcing consistent boundary rules across tools
  • Versioning + diffs (“what changed?”)
  • Running “golden request” regression checks

You still decide the rules. Automation just keeps the system consistent as you iterate.
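A “golden request” regression check could be as small as this sketch: replay canonical requests through the current decision function and diff against pinned outcomes. (The requests and the `decide` callable are assumptions for illustration.)

```python
GOLDEN = [
    ({"subject": "u1", "requester": "u2", "action": "read"}, "refuse"),
    ({"subject": "u1", "requester": "u1", "action": "read"}, "proceed"),
]

def run_golden(decide) -> list:
    failures = []
    for request, expected in GOLDEN:
        got = decide(request)
        if got != expected:
            failures.append({"request": request, "expected": expected, "got": got})
    return failures
```

An empty failure list means the contract still holds after your latest prompt edit; a non-empty one is exactly the “what changed?” diff you want in review.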
