Prompt Unit Tests: 3 Bash Scripts That Catch Regressions Before Deploy
Published: March 31, 2026, 19:36 GMT+8
4 min read
Source: Dev.to
Script 1: Golden Output Test
This script sends a fixed input to your prompt and compares the output against a known-good response.
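#!/bin/bash
# test-golden.sh: compare prompt output against a golden file
PROMPT_FILE="$1"
INPUT_FILE="$2"
GOLDEN_FILE="$3"

# Build the request body with jq so both files are JSON-escaped.
# The prompt is sent as the system message and the fixture as the user message;
# the model name matches the one used in test-keywords.sh.
PAYLOAD=$(jq -n --rawfile system "$PROMPT_FILE" --rawfile user "$INPUT_FILE" \
  '{model: "gpt-4o-mini", temperature: 0,
    messages: [{role: "system", content: $system},
               {role: "user", content: $user}]}')

# Call the API and keep only the assistant's reply text.
curl -s https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD" | jq -r '.choices[0].message.content' > /tmp/prompt-test-actual.txt

if diff -q "$GOLDEN_FILE" /tmp/prompt-test-actual.txt > /dev/null 2>&1; then
  echo "✅ PASS: Output matches golden file"
else
  echo "❌ FAIL: Output diverged"
  diff --color "$GOLDEN_FILE" /tmp/prompt-test-actual.txt
  exit 1
fi

Usage
./test-golden.sh prompts/summarize.txt fixtures/input-1.txt fixtures/expected-1.txt

When to use
- Run it after any prompt edit.
- Set temperature: 0 to make the output as deterministic as possible.
- Update the golden file manually when you want the output to change.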
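The manual golden-file update can be scripted so that it stays a deliberate action. The sketch below is not from the original article: the function name compare_or_update and the UPDATE_GOLDEN environment variable are hypothetical, and it assumes the model output has already been written to a file, as test-golden.sh does with /tmp/prompt-test-actual.txt.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original scripts):
# compare an output file to a golden file, and overwrite the golden file
# only when UPDATE_GOLDEN=1 is set explicitly.
compare_or_update() {
  local golden="$1" actual="$2"

  # Explicit opt-in: accept the new output as the golden reference.
  if [ "${UPDATE_GOLDEN:-0}" = "1" ]; then
    cp "$actual" "$golden"
    echo "UPDATED: $golden"
    return 0
  fi

  if diff -q "$golden" "$actual" > /dev/null 2>&1; then
    echo "✅ PASS: Output matches golden file"
  else
    echo "❌ FAIL: Output diverged"
    diff "$golden" "$actual"
    return 1
  fi
}
```

With a helper like this, a normal run only compares, while a run prefixed with UPDATE_GOLDEN=1 replaces the golden file. Reviewing the resulting change in version control keeps the update intentional.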
Script 2: Keyword Gate
Sometimes you don't need an exact match; you only need the output to contain (or not contain) specific words.
#!/bin/bash
# test-keywords.sh: assert required/forbidden keywords in output
PROMPT_FILE="$1"
INPUT_FILE="$2"
REQUIRED="$3"   # comma-separated: "function,return,async"
FORBIDDEN="$4"  # comma-separated: "TODO,FIXME,undefined"

# jq -Rs reads each file as one raw string and emits it as a JSON string literal.
ACTUAL=$(curl -s https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"gpt-4o-mini\",
    \"messages\": [
      {\"role\": \"system\", \"content\": $(jq -Rs . < "$PROMPT_FILE")},
      {\"role\": \"user\", \"content\": $(jq -Rs . < "$INPUT_FILE")}
    ]
  }" | jq -r '.choices[0].message.content')

# The remainder of the script (the checks of $ACTUAL against $REQUIRED and $FORBIDDEN) is missing from the source.
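The keyword assertions themselves can be sketched as a function that works on any string, which also makes them testable without an API call. This is a reconstruction from the header comments of test-keywords.sh (comma-separated required and forbidden lists), not the author's original code. The function name check_keywords is hypothetical, and matching is assumed to be a plain case-sensitive substring search.

```shell
#!/bin/bash
# Hypothetical helper: check_keywords TEXT REQUIRED_CSV FORBIDDEN_CSV
# Returns 0 only if every required keyword appears in TEXT
# and no forbidden keyword does.
check_keywords() {
  local text="$1" required="$2" forbidden="$3"
  local status=0 kw
  local -a req=() forb=()

  # Split the comma-separated lists into arrays.
  IFS=',' read -ra req <<< "$required"
  IFS=',' read -ra forb <<< "$forbidden"

  for kw in "${req[@]}"; do
    # -F: literal match, -q: quiet, --: keyword may start with a dash
    if ! grep -qF -- "$kw" <<< "$text"; then
      echo "❌ FAIL: Missing required keyword: $kw"
      status=1
    fi
  done

  for kw in "${forb[@]}"; do
    if grep -qF -- "$kw" <<< "$text"; then
      echo "❌ FAIL: Found forbidden keyword: $kw"
      status=1
    fi
  done

  [ "$status" -eq 0 ] && echo "✅ PASS: All keyword checks passed"
  return "$status"
}
```

Inside test-keywords.sh this would be called as check_keywords "$ACTUAL" "$REQUIRED" "$FORBIDDEN" || exit 1. Because the match is a substring search, a required keyword such as "security" is also satisfied by "insecurity"; grep -w would be one way to require whole words.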
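Script 3: Format Check
This script verifies that the output conforms to a specific format (JSON, YAML, Markdown, and so on) and can also check properties such as headers and line count.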
#!/bin/bash
# test-format.sh: verify output format and optional constraints
PROMPT_FILE="$1"
INPUT_FILE="$2"
FORMAT="$3"  # json|yaml|markdown|has-headers|max-lines:NN

# Build a JSON request body; jq JSON-escapes the file contents.
# The model name matches the one used in test-keywords.sh.
PAYLOAD=$(jq -n --rawfile system "$PROMPT_FILE" --rawfile user "$INPUT_FILE" \
  '{model: "gpt-4o-mini",
    messages: [{role: "system", content: $system},
               {role: "user", content: $user}]}')

# Validate the assistant's reply, not the API envelope (which is always JSON).
ACTUAL=$(curl -s https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD" | jq -r '.choices[0].message.content')

case "$FORMAT" in
  json)
    if echo "$ACTUAL" | jq . > /dev/null 2>&1; then
      echo "✅ PASS: Valid JSON"
    else
      echo "❌ FAIL: Invalid JSON"
      echo "$ACTUAL"
      exit 1
    fi
    ;;
  yaml)
    # Requires PyYAML (pip install pyyaml).
    if echo "$ACTUAL" | python3 -c "import sys, yaml; yaml.safe_load(sys.stdin)" > /dev/null 2>&1; then
      echo "✅ PASS: Valid YAML"
    else
      echo "❌ FAIL: Invalid YAML"
      echo "$ACTUAL"
      exit 1
    fi
    ;;
  markdown|has-headers)
    # Both are checked the same way: at least one line starting with "#".
    if echo "$ACTUAL" | grep -q "^#"; then
      echo "✅ PASS: Contains markdown headers"
    else
      echo "❌ FAIL: No markdown headers found"
      exit 1
    fi
    ;;
  max-lines:*)
    MAX="${FORMAT#max-lines:}"
    # Named LINE_COUNT because LINES is a special shell variable.
    LINE_COUNT=$(echo "$ACTUAL" | wc -l)
    if [ "$LINE_COUNT" -le "$MAX" ]; then
      echo "✅ PASS: $LINE_COUNT lines (max: $MAX)"
    else
      echo "❌ FAIL: $LINE_COUNT lines exceeds max $MAX"
      exit 1
    fi
    ;;
  *)
    # An unrecognized format should fail loudly instead of passing silently.
    echo "❌ FAIL: Unknown format: $FORMAT"
    exit 1
    ;;
esac

Putting It Together
I run all three scripts at once from a Makefile:
.PHONY: test-prompts
test-prompts:
	./test-golden.sh prompts/summarize.txt fixtures/doc-1.txt fixtures/expected-summary-1.txt
	./test-keywords.sh prompts/review.txt fixtures/pr-1.txt "security,performance" "LGTM"
	./test-format.sh prompts/extract.txt fixtures/email-1.txt json

Hook it into CI:
# .github/workflows/prompt-tests.yml
on:
  push:
    paths: ['prompts/**']
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make test-prompts
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}

Now every prompt change is tested automatically. The whole setup took about 20 minutes, and it has caught seven regressions since I started using it.
Your prompts are code. Test them like code.