Prompt Unit Tests: 3 Bash Scripts That Catch Regressions Before Deploy

Published: March 31, 2026, 8:36 PM GMT+9

Source: Dev.to

Script 1: The Golden Output Test

This script sends a fixed input to the prompt and diffs the output against a known-good response.

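#!/bin/bash
# test-golden.sh: compare prompt output against a golden file
# Requires curl, jq 1.6+ (for --rawfile), and OPENAI_API_KEY in the environment.

PROMPT_FILE="$1"
INPUT_FILE="$2"
GOLDEN_FILE="$3"
ACTUAL_FILE=/tmp/prompt-test-actual.txt

# Build the request body with jq so both files are JSON-escaped.
# The prompt goes in as the system message and the input as the user message.
# gpt-4o-mini matches the model used in test-keywords.sh.
jq -n --rawfile system "$PROMPT_FILE" --rawfile user "$INPUT_FILE" \
  '{model: "gpt-4o-mini", temperature: 0,
    messages: [{role: "system", content: $system},
               {role: "user", content: $user}]}' | \
  curl -s https://api.openai.com/v1/chat/completions \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d @- | \
  jq -r '.choices[0].message.content' > "$ACTUAL_FILE"

if diff -q "$GOLDEN_FILE" "$ACTUAL_FILE" > /dev/null 2>&1; then
  echo "✅ PASS: Output matches golden file"
else
  echo "❌ FAIL: Output diverged"
  diff --color "$GOLDEN_FILE" "$ACTUAL_FILE"
  exit 1
fi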

Usage

./test-golden.sh prompts/summarize.txt fixtures/input-1.txt fixtures/expected-1.txt
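
The numbered file names in the usage line suggest there may be more than one fixture. A minimal sketch for pairing them up, assuming every input follows the input-N.txt / expected-N.txt naming shown above; `list_fixture_pairs` is a hypothetical helper, not part of the original scripts.

```shell
#!/bin/bash
# Hypothetical helper: print one "input expected" pair per line for a fixture directory.
# Assumes the input-N.txt / expected-N.txt naming convention and paths without spaces.
list_fixture_pairs() {
  local dir="$1" input expected
  for input in "$dir"/input-*.txt; do
    [ -e "$input" ] || continue                  # no fixtures: the glob stays literal
    expected="$dir/expected-${input##*/input-}"  # input-1.txt -> expected-1.txt
    if [ -f "$expected" ]; then
      printf '%s %s\n' "$input" "$expected"
    else
      echo "warning: no golden file for $input" >&2
    fi
  done
}

# Possible use with the golden test (not executed here):
#   list_fixture_pairs fixtures | while read -r input expected; do
#     ./test-golden.sh prompts/summarize.txt "$input" "$expected" || exit 1
#   done
```

Inputs without a matching golden file are reported on stderr and skipped, so a forgotten fixture shows up as a warning instead of a confusing diff failure.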

When to use it

Run it after every prompt change.
Set temperature: 0 to make the output as deterministic as possible.
When the output is supposed to change, update the golden file deliberately.
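
Updating the golden file deliberately is easier when the update path is explicit. A minimal sketch, assuming an opt-in environment variable; `UPDATE_GOLDEN` and `check_or_update_golden` are hypothetical names that do not appear in the original script.

```shell
#!/bin/bash
# Hypothetical helper: compare an actual output file with a golden file,
# or overwrite the golden file only when UPDATE_GOLDEN=1 is set explicitly.
check_or_update_golden() {
  local golden="$1" actual="$2"
  if [ "${UPDATE_GOLDEN:-0}" = "1" ]; then
    cp "$actual" "$golden"
    echo "UPDATED: $golden"
    return 0
  fi
  if diff -q "$golden" "$actual" > /dev/null 2>&1; then
    echo "PASS: $golden"
  else
    echo "FAIL: $golden"
    return 1
  fi
}
```

Because the golden file is only rewritten when the variable is set, an accidental run never hides a regression, and the updated file still appears in version control for review.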

Script 2: The Keyword Gate

Use this when you don't need an exact match and only want to check that the output contains (or excludes) certain words.

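#!/bin/bash
# test-keywords.sh: assert required/forbidden keywords in output

PROMPT_FILE="$1"
INPUT_FILE="$2"
REQUIRED="$3"   # comma-separated: "function,return,async"
FORBIDDEN="$4"  # comma-separated: "TODO,FIXME,undefined"

# jq -Rs reads each file as one raw string and prints it as a JSON string literal.
ACTUAL=$(curl -s https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"gpt-4o-mini\",
    \"messages\": [
      {\"role\": \"system\", \"content\": $(jq -Rs . < "$PROMPT_FILE")},
      {\"role\": \"user\", \"content\": $(jq -Rs . < "$INPUT_FILE")}
    ]
  }" | jq -r '.choices[0].message.content')

# Keyword matching is a case-insensitive fixed-string search (an assumption; tighten it if needed).
FAILED=0
IFS=',' read -ra REQUIRED_LIST <<< "$REQUIRED"
for KEYWORD in "${REQUIRED_LIST[@]}"; do
  if ! grep -qiF -- "$KEYWORD" <<< "$ACTUAL"; then
    echo "❌ FAIL: Missing required keyword: $KEYWORD"
    FAILED=1
  fi
done
IFS=',' read -ra FORBIDDEN_LIST <<< "$FORBIDDEN"
for KEYWORD in "${FORBIDDEN_LIST[@]}"; do
  if grep -qiF -- "$KEYWORD" <<< "$ACTUAL"; then
    echo "❌ FAIL: Found forbidden keyword: $KEYWORD"
    FAILED=1
  fi
done

[ "$FAILED" -eq 0 ] || exit 1
echo "✅ PASS: All keyword checks passed"

Script 3: The Format Checker

Validates that the output is in a specific format (JSON, XML, CSV) or meets a structural constraint such as containing Markdown headers or staying under a maximum line count.

#!/bin/bash
# test-format.sh: validate output format (json|xml|csv|has-headers|max-lines:N)

PROMPT_FILE="$1"
INPUT_FILE="$2"
FORMAT="$3"   # e.g., json, xml, csv, has-headers, max-lines:20

# Build a JSON request body from the two files, then keep only the model's reply text.
# Requires jq 1.6+ for --rawfile.
ACTUAL=$(jq -n --rawfile system "$PROMPT_FILE" --rawfile user "$INPUT_FILE" \
  '{model: "gpt-4o-mini",
    messages: [{role: "system", content: $system},
               {role: "user", content: $user}]}' | \
  curl -s https://api.openai.com/v1/chat/completions \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d @- | \
  jq -r '.choices[0].message.content')

case "$FORMAT" in
  json)
    if echo "$ACTUAL" | jq . > /dev/null 2>&1; then
      echo "✅ PASS: Valid JSON"
    else
      echo "❌ FAIL: Invalid JSON"
      echo "$ACTUAL"
      exit 1
    fi
    ;;
  xml)
    # xmllint ships with libxml2 and may need to be installed separately.
    if echo "$ACTUAL" | xmllint --noout - > /dev/null 2>&1; then
      echo "✅ PASS: Valid XML"
    else
      echo "❌ FAIL: Invalid XML"
      echo "$ACTUAL"
      exit 1
    fi
    ;;
  csv)
    # Parsed with python3's csv module; every row must have the same number of fields.
    if echo "$ACTUAL" | python3 -c 'import csv, sys; rows = list(csv.reader(sys.stdin)); sys.exit(0 if rows and len({len(r) for r in rows}) == 1 else 1)' > /dev/null 2>&1; then
      echo "✅ PASS: Valid CSV"
    else
      echo "❌ FAIL: Invalid CSV"
      echo "$ACTUAL"
      exit 1
    fi
    ;;
  has-headers)
    if echo "$ACTUAL" | grep -q "^#"; then
      echo "✅ PASS: Contains markdown headers"
    else
      echo "❌ FAIL: No markdown headers found"
      exit 1
    fi
    ;;
  max-lines:*)
    MAX="${FORMAT#max-lines:}"
    LINES=$(echo "$ACTUAL" | wc -l)
    if [ "$LINES" -le "$MAX" ]; then
      echo "✅ PASS: $LINES lines (max: $MAX)"
    else
      echo "❌ FAIL: $LINES lines exceeds max $MAX"
      exit 1
    fi
    ;;
  *)
    echo "❌ FAIL: Unknown format $FORMAT"
    exit 1
    ;;
esac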
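
All three scripts call the API before checking anything, which makes the check logic itself awkward to try out. A minimal sketch, assuming the model's reply has already been saved to a file, that runs keyword and line-count checks with no API key or network access; `check_keywords` and `check_max_lines` are hypothetical helpers, not part of the original scripts.

```shell
#!/bin/bash
# Hypothetical helpers: run the checks on text that is already on disk.

# check_keywords FILE REQUIRED FORBIDDEN
# REQUIRED and FORBIDDEN are comma-separated lists and may be empty.
# Matching is a case-insensitive fixed-string search.
check_keywords() {
  local file="$1" kw rc=0
  local -a required forbidden
  IFS=',' read -ra required <<< "$2"
  IFS=',' read -ra forbidden <<< "$3"
  for kw in "${required[@]}"; do
    if ! grep -qiF -- "$kw" "$file"; then
      echo "FAIL: missing required keyword: $kw"
      rc=1
    fi
  done
  for kw in "${forbidden[@]}"; do
    if grep -qiF -- "$kw" "$file"; then
      echo "FAIL: found forbidden keyword: $kw"
      rc=1
    fi
  done
  return "$rc"
}

# check_max_lines FILE MAX
# Succeeds when FILE has at most MAX lines.
check_max_lines() {
  local lines
  lines=$(( $(wc -l < "$1") ))   # arithmetic expansion drops any padding from wc
  [ "$lines" -le "$2" ]
}
```

Keeping the checks in functions like these means a saved response can be replayed through them while the assertions are being tuned, without paying for an API call on every attempt.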

Putting It Together

Put the three scripts in a Makefile so they run with a single command.

test-prompts:
	./test-golden.sh prompts/summarize.txt fixtures/doc-1.txt fixtures/expected-summary-1.txt
	./test-keywords.sh prompts/review.txt fixtures/pr-1.txt "security,performance" "LGTM"
	./test-format.sh prompts/extract.txt fixtures/email-1.txt json
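
One caveat with a single Makefile recipe: make stops at the first line that fails, so a golden-file failure hides the results of the keyword and format tests. A minimal sketch of a runner that executes every command and prints a summary; `run_suite` is a hypothetical name, and the script invocations in the comment are the ones from the Makefile above.

```shell
#!/bin/bash
# Hypothetical runner: execute every test command, keep going after failures,
# and return non-zero if any of them failed.
run_suite() {
  local failed=0 total=0 cmd
  for cmd in "$@"; do
    total=$((total + 1))
    if ! bash -c "$cmd"; then
      failed=$((failed + 1))
    fi
  done
  echo "$((total - failed))/$total prompt tests passed"
  [ "$failed" -eq 0 ]
}

# Possible use (not executed here):
#   run_suite \
#     './test-golden.sh prompts/summarize.txt fixtures/doc-1.txt fixtures/expected-summary-1.txt' \
#     './test-keywords.sh prompts/review.txt fixtures/pr-1.txt "security,performance" "LGTM"' \
#     './test-format.sh prompts/extract.txt fixtures/email-1.txt json'
```

The trade-off is that every test always costs an API call, even when an early one has already failed.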

Hook it into CI:

# .github/workflows/prompt-tests.yml
on:
  push:
    paths: ['prompts/**']
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make test-prompts
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
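
In CI, a missing secret or tool tends to surface as a confusing test failure rather than a clear error. A minimal sketch of a preflight check that could run before the tests; `preflight` is a hypothetical helper, and the tool list passed to it is an assumption based on what the scripts call.

```shell
#!/bin/bash
# Hypothetical preflight check: fail early with a clear message when a
# required tool or the API key is missing.
preflight() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" > /dev/null 2>&1; then
      echo "missing tool: $tool" >&2
      missing=1
    fi
  done
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set" >&2
    missing=1
  fi
  return "$missing"
}

# Possible use at the top of each test script (not executed here):
#   preflight curl jq || exit 1
```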

Now every prompt change is tested automatically. Total setup time: about 20 minutes. Regressions caught since starting: 7.

Prompts are code. Test them like code.
