Prompt Unit Tests: 3 Bash Scripts That Catch Regressions Before Deploy
Source: Dev.to
Script 1: The Golden Output Test
This script sends a fixed input to your prompt and diffs the output against a known‑good response.
#!/bin/bash
# test-golden.sh — Compare prompt output against golden file
PROMPT_FILE="$1"
INPUT_FILE="$2"
GOLDEN_FILE="$3"
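# Build the JSON request with jq so the prompt and input are escaped correctly.
# Assumption: the prompt is the system message, the input is the user message,
# and the model matches the one used in test-keywords.sh.
ACTUAL=$(jq -n --rawfile system "$PROMPT_FILE" --rawfile user "$INPUT_FILE" \
'{model: "gpt-4o-mini", temperature: 0, messages: [{role: "system", content: $system}, {role: "user", content: $user}]}' | \
curl -s https://api.openai.com/v1/chat/completions \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-d @- | jq -r '.choices[0].message.content')
echo "$ACTUAL" > /tmp/prompt-test-actual.txt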
if diff -q "$GOLDEN_FILE" /tmp/prompt-test-actual.txt > /dev/null 2>&1; then
echo "✅ PASS: Output matches golden file"
else
echo "❌ FAIL: Output diverged"
diff --color "$GOLDEN_FILE" /tmp/prompt-test-actual.txt
exit 1
fi
Usage
./test-golden.sh prompts/summarize.txt fixtures/input-1.txt fixtures/expected-1.txt
When to use
- Run after any prompt edit.
- Set temperature: 0 for deterministic output.
- Update the golden file intentionally when you want the output to change.
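The last point, updating the golden file on purpose, can be sketched as a small helper. This is a minimal sketch and not part of the original scripts: the function name `compare_or_update` and the `UPDATE_GOLDEN` environment variable are hypothetical. It takes a file holding the actual output and the golden file. With `UPDATE_GOLDEN=1` it overwrites the golden file; otherwise it diffs the two the way test-golden.sh does.

```shell
#!/bin/bash
# Sketch only: compare_or_update and UPDATE_GOLDEN are hypothetical names,
# not part of the original test-golden.sh.
compare_or_update() {
  local actual_file="$1" golden_file="$2"
  if [ "${UPDATE_GOLDEN:-0}" = "1" ]; then
    # Intentional change: accept the new output as the golden file.
    cp "$actual_file" "$golden_file"
    echo "UPDATED: $golden_file"
    return 0
  fi
  if diff -q "$golden_file" "$actual_file" > /dev/null 2>&1; then
    echo "✅ PASS: Output matches golden file"
  else
    echo "❌ FAIL: Output diverged"
    diff "$golden_file" "$actual_file"
    return 1
  fi
}
```

Called as `UPDATE_GOLDEN=1 compare_or_update /tmp/prompt-test-actual.txt fixtures/expected-1.txt`, it refreshes the golden file. Reviewing the resulting change in version control before committing keeps the update intentional.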
Script 2: The Keyword Gate
Sometimes you don’t need an exact match — you just need the output to contain (or not contain) specific terms.
#!/bin/bash
# test-keywords.sh — Assert required/forbidden keywords in output
PROMPT_FILE="$1"
INPUT_FILE="$2"
REQUIRED="$3" # comma-separated: "function,return,async"
FORBIDDEN="$4" # comma-separated: "TODO,FIXME,undefined"
ACTUAL=$(curl -s https://api.openai.com/v1/chat/completions \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-d "{
\"model\": \"gpt-4o-mini\",
\"messages\": [
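{\"role\": \"system\", \"content\": $(jq -Rs . < "$PROMPT_FILE")},
{\"role\": \"user\", \"content\": $(jq -Rs . < "$INPUT_FILE")}
]
}" | jq -r '.choices[0].message.content')
# Assumption: each keyword is checked with a plain fixed-string grep.
FAILED=0
IFS=',' read -ra REQUIRED_LIST <<< "$REQUIRED"
for KEYWORD in "${REQUIRED_LIST[@]}"; do
echo "$ACTUAL" | grep -qF -- "$KEYWORD" || { echo "❌ FAIL: Missing required keyword: $KEYWORD"; FAILED=1; }
done
IFS=',' read -ra FORBIDDEN_LIST <<< "$FORBIDDEN"
for KEYWORD in "${FORBIDDEN_LIST[@]}"; do
echo "$ACTUAL" | grep -qF -- "$KEYWORD" && { echo "❌ FAIL: Found forbidden keyword: $KEYWORD"; FAILED=1; }
done
[ "$FAILED" -eq 0 ] || exit 1
echo "✅ PASS: All keyword checks passed"
Script 3: The Format Check
This script asserts the shape of the output: valid JSON, markdown headers, or a line limit.
#!/bin/bash
# test-format.sh: Assert the output format
PROMPT_FILE="$1"
INPUT_FILE="$2"
FORMAT="$3" # "json", "has-headers", or "max-lines:N"
# Assumption: same request as in test-keywords.sh.
ACTUAL=$(curl -s https://api.openai.com/v1/chat/completions \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-d "{
\"model\": \"gpt-4o-mini\",
\"messages\": [
{\"role\": \"system\", \"content\": $(jq -Rs . < "$PROMPT_FILE")},
{\"role\": \"user\", \"content\": $(jq -Rs . < "$INPUT_FILE")}
]
}" | jq -r '.choices[0].message.content')
case "$FORMAT" in
json)
if echo "$ACTUAL" | jq . > /dev/null 2>&1; then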
echo "✅ PASS: Valid JSON"
else
echo "❌ FAIL: Invalid JSON"
echo "$ACTUAL"
exit 1
fi
;;
has-headers)
if echo "$ACTUAL" | grep -q "^#"; then
echo "✅ PASS: Contains markdown headers"
else
echo "❌ FAIL: No markdown headers found"
exit 1
fi
;;
max-lines:*)
MAX="${FORMAT#max-lines:}"
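# LINES is reserved by bash for the terminal height, so use another name.
LINE_COUNT=$(echo "$ACTUAL" | wc -l | tr -d ' ')
if [ "$LINE_COUNT" -le "$MAX" ]; then
echo "✅ PASS: $LINE_COUNT lines (max: $MAX)"
else
echo "❌ FAIL: $LINE_COUNT lines exceed max $MAX"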
exit 1
fi
;;
esac
Putting It Together
I run all three in a Makefile:
test-prompts:
	./test-golden.sh prompts/summarize.txt fixtures/doc-1.txt fixtures/expected-summary-1.txt
	./test-keywords.sh prompts/review.txt fixtures/pr-1.txt "security,performance" "LGTM"
	./test-format.sh prompts/extract.txt fixtures/email-1.txt json
Hook it into CI:
# .github/workflows/prompt-tests.yml
on:
  push:
    paths: ['prompts/**']
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make test-prompts
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
Now every prompt change gets tested automatically. Total setup time: ~20 minutes. Regressions caught since I started: seven.
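One limitation of the Makefile target: make stops at the first recipe line that fails, so a single run reports at most one regression. A minimal sketch of a runner that keeps going and tallies the results is below. The function name `run_all` is hypothetical and not part of the original setup; each argument is assumed to be one complete, quoted test command line.

```shell
#!/bin/bash
# Sketch only: run_all is a hypothetical helper, not part of the original article.
# Each argument is one complete test command line, executed via `bash -c`.
run_all() {
  local failed=0 total=0 cmd
  for cmd in "$@"; do
    total=$((total + 1))
    # Keep going after a failure so every regression is reported in one run.
    if ! bash -c "$cmd"; then
      failed=$((failed + 1))
    fi
  done
  echo "$((total - failed))/$total prompt tests passed"
  # Return non-zero if any test failed, so CI still marks the job as failed.
  [ "$failed" -eq 0 ]
}
```

Passing the three command lines from the Makefile as three quoted arguments runs every script, and the non-zero return value still fails the CI job when anything regresses.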
Your prompts are code. Test them like it.