Prompt Unit Tests: 3 Bash Scripts That Catch Regressions Before Deploy

Published: (March 31, 2026 at 07:36 AM EDT)
2 min read
Source: Dev.to

Script 1: The Golden Output Test

This script sends a fixed input to your prompt and diffs the output against a known‑good response.

#!/bin/bash
# test-golden.sh — Compare prompt output against golden file

PROMPT_FILE="$1"
INPUT_FILE="$2"
GOLDEN_FILE="$3"

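# Build the request body with jq so the prompt and input are safely JSON-escaped.
# Assumes the prompt is the system message, the input is the user message,
# and the same model as the keyword test (gpt-4o-mini).
jq -n \
  --rawfile prompt "$PROMPT_FILE" \
  --rawfile input "$INPUT_FILE" \
  '{model: "gpt-4o-mini", temperature: 0,
    messages: [{role: "system", content: $prompt},
               {role: "user", content: $input}]}' | \
  curl -s https://api.openai.com/v1/chat/completions \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d @- | \
  jq -r '.choices[0].message.content' > /tmp/prompt-test-actual.txt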

if diff -q "$GOLDEN_FILE" /tmp/prompt-test-actual.txt > /dev/null 2>&1; then
  echo "✅ PASS: Output matches golden file"
else
  echo "❌ FAIL: Output diverged"
  diff --color "$GOLDEN_FILE" /tmp/prompt-test-actual.txt
  exit 1
fi

Usage

./test-golden.sh prompts/summarize.txt fixtures/input-1.txt fixtures/expected-1.txt

When to use

  • Run after any prompt edit.
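  • Set temperature: 0 to make the output as repeatable as possible. Even then, byte-identical responses are not guaranteed, so expect an occasional spurious diff.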
  • Update the golden file intentionally when you want the output to change.
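One way to make the intentional golden-file update explicit is an opt-in switch. The sketch below is an assumption, not part of the original script: the function name compare_or_update and the UPDATE_GOLDEN variable are hypothetical. It compares two files as the golden test does, unless UPDATE_GOLDEN=1 is set, in which case it overwrites the golden file instead.

```shell
#!/bin/bash
# compare_or_update ACTUAL_FILE GOLDEN_FILE
# Hypothetical helper: UPDATE_GOLDEN is an assumed variable name,
# not something the original test-golden.sh defines.
compare_or_update() {
  local actual="$1" golden="$2"

  # Opt-in update mode: accept the new output as the golden file.
  if [ "${UPDATE_GOLDEN:-0}" = "1" ]; then
    cp "$actual" "$golden"
    echo "📝 UPDATED: $golden"
    return 0
  fi

  # Default mode: same comparison as the golden test.
  if diff -q "$golden" "$actual" > /dev/null 2>&1; then
    echo "✅ PASS: Output matches golden file"
  else
    echo "❌ FAIL: Output diverged"
    diff "$golden" "$actual"
    return 1
  fi
}
```

Because updating requires setting the variable by hand, a golden file never changes as a side effect of a normal test run.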

Script 2: The Keyword Gate

Sometimes you don’t need an exact match — you just need the output to contain (or not contain) specific terms.

#!/bin/bash
# test-keywords.sh — Assert required/forbidden keywords in output

PROMPT_FILE="$1"
INPUT_FILE="$2"
REQUIRED="$3"   # comma-separated: "function,return,async"
FORBIDDEN="$4"  # comma-separated: "TODO,FIXME,undefined"

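ACTUAL=$(curl -s https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"gpt-4o-mini\",
    \"messages\": [
      {\"role\": \"system\", \"content\": $(jq -Rs . < "$PROMPT_FILE")},
      {\"role\": \"user\", \"content\": $(jq -Rs . < "$INPUT_FILE")}
    ]
  }" | jq -r '.choices[0].message.content')

FAIL=0

# Every required keyword must appear in the output (case-insensitive)
IFS=',' read -ra REQ <<< "$REQUIRED"
for kw in "${REQ[@]}"; do
  grep -qiF -- "$kw" <<< "$ACTUAL" || { echo "❌ FAIL: Missing required keyword: $kw"; FAIL=1; }
done

# No forbidden keyword may appear in the output
IFS=',' read -ra FORB <<< "$FORBIDDEN"
for kw in "${FORB[@]}"; do
  if grep -qiF -- "$kw" <<< "$ACTUAL"; then echo "❌ FAIL: Found forbidden keyword: $kw"; FAIL=1; fi
done

[ "$FAIL" -eq 0 ] || exit 1
echo "✅ PASS: Keyword checks passed"

Script 3: The Format Check

This script checks the shape of the output rather than its content: valid JSON, markdown headers, or a maximum line count.

#!/bin/bash
# test-format.sh: assert structural properties of the output

PROMPT_FILE="$1"
INPUT_FILE="$2"
FORMAT="$3"  # json | has-headers | max-lines:N

ACTUAL=$(curl -s https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"gpt-4o-mini\",
    \"messages\": [
      {\"role\": \"system\", \"content\": $(jq -Rs . < "$PROMPT_FILE")},
      {\"role\": \"user\", \"content\": $(jq -Rs . < "$INPUT_FILE")}
    ]
  }" | jq -r '.choices[0].message.content')

case "$FORMAT" in
  json)
    if echo "$ACTUAL" | jq . > /dev/null 2>&1; then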
      echo "✅ PASS: Valid JSON"
    else
      echo "❌ FAIL: Invalid JSON"
      echo "$ACTUAL"
      exit 1
    fi
    ;;
  has-headers)
    if echo "$ACTUAL" | grep -q "^#"; then
      echo "✅ PASS: Contains markdown headers"
    else
      echo "❌ FAIL: No markdown headers found"
      exit 1
    fi
    ;;
  max-lines:*)
    MAX="${FORMAT#max-lines:}"
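    # Named LINE_COUNT because bash reserves LINES for the terminal height;
    # tr strips the padding some wc implementations add.
    LINE_COUNT=$(echo "$ACTUAL" | wc -l | tr -d ' ')
    if [ "$LINE_COUNT" -le "$MAX" ]; then
      echo "✅ PASS: $LINE_COUNT lines (max: $MAX)"
    else
      echo "❌ FAIL: $LINE_COUNT lines exceeds max $MAX"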
      exit 1
    fi
    ;;
esac
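The format checks above can also be tried against a saved response, without spending an API call. The sketch below is an illustration under assumptions: check_format is a hypothetical function name, it reads the model output from stdin, and it adds a fallback branch for unknown format names that the original case statement does not have.

```shell
#!/bin/bash
# check_format FORMAT
# Hypothetical offline variant of the format checks: reads model output
# from stdin and returns 0 if it satisfies FORMAT, 1 if not, 2 if FORMAT
# is not recognized.
check_format() {
  local format="$1" text
  text=$(cat)

  case "$format" in
    json)
      # 'jq -e type' fails on invalid JSON and also on empty input.
      jq -e type > /dev/null 2>&1 <<< "$text"
      ;;
    has-headers)
      # At least one line must start with a markdown header marker.
      grep -q '^#' <<< "$text"
      ;;
    max-lines:*)
      local max="${format#max-lines:}"
      [ "$(wc -l <<< "$text" | tr -d ' ')" -le "$max" ]
      ;;
    *)
      echo "unknown format: $format" >&2
      return 2
      ;;
  esac
}

# Example (fixtures/response-1.md is a placeholder path):
#   check_format has-headers < fixtures/response-1.md && echo "✅ PASS"
```

Returning a distinct status for an unrecognized format keeps a typo such as "max-line:10" from silently passing.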

Putting It Together

I run all three in a Makefile:

test-prompts:
	./test-golden.sh prompts/summarize.txt fixtures/doc-1.txt fixtures/expected-summary-1.txt
	./test-keywords.sh prompts/review.txt fixtures/pr-1.txt "security,performance" "LGTM"
	./test-format.sh prompts/extract.txt fixtures/email-1.txt json
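Note that make stops a recipe at the first command that fails, so a golden-file failure hides the results of the keyword and format tests. If you would rather see every failure in a single run, a small runner can take the place of the recipe lines. This is a sketch under assumptions: run_all is a hypothetical helper, not part of the original setup.

```shell
#!/bin/bash
# run_all CMD...
# Hypothetical runner: executes each argument as a shell command line,
# keeps going after failures, and returns non-zero if any command failed.
run_all() {
  local failed=0 cmd

  for cmd in "$@"; do
    if ! bash -c "$cmd"; then
      failed=$((failed + 1))
    fi
  done

  echo "$failed of $# prompt tests failed"
  [ "$failed" -eq 0 ]
}

# Example, using the same commands as the Makefile target:
#   run_all \
#     './test-golden.sh prompts/summarize.txt fixtures/doc-1.txt fixtures/expected-summary-1.txt' \
#     './test-keywords.sh prompts/review.txt fixtures/pr-1.txt "security,performance" "LGTM"' \
#     './test-format.sh prompts/extract.txt fixtures/email-1.txt json'
```

The function's exit status still fails the CI job when any test fails, so it can be called from the same make target.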

Hook it into CI:

# .github/workflows/prompt-tests.yml
on:
  push:
    paths: ['prompts/**']
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make test-prompts
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}

Now every prompt change gets tested automatically. Total setup time: ~20 minutes. Regressions caught since I started: seven.

Your prompts are code. Test them like it.
