Stop 'Vibe Checking' Your AI. Use Snapshot Testing Instead.
Snapshot testing is a solved problem for UI components. Why aren’t we doing the same for AI?
Most of us are still “vibe checking”: manually running the prompt, reading the output, and saying, “Yeah, seems okay.”
I built a tool to fix this.
Introducing SafeStar
SafeStar is a zero‑dependency CLI tool that brings the “Snapshot & Diff” workflow to AI engineering. It works with Python, Node, curl, or anything else you can run from a shell, treating your AI as a black box and answering one question:
“Did the behavior change compared to last time?”
How it works
SafeStar follows a Git‑like workflow:
- Snapshot a baseline of “good” behavior.
- Run your current code.
- Diff the results to detect drift.
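Conceptually, the whole loop is small. Here is a rough Python sketch of the idea (my own illustration, not SafeStar's actual implementation): run the command several times, freeze the outputs as a baseline file, and flag large deviations on later runs.

import json
import pathlib
import statistics
import subprocess

def run(cmd, times=5):
    # One output string per run, to expose nondeterminism
    return [subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout
            for _ in range(times)]

def baseline(cmd, path="baseline.json"):
    # "Freeze" the current behavior to disk
    pathlib.Path(path).write_text(json.dumps(run(cmd)))

def diff(cmd, path="baseline.json", tolerance=0.5):
    old = json.loads(pathlib.Path(path).read_text())
    new = run(cmd)
    old_avg = statistics.mean(len(o) for o in old)
    new_avg = statistics.mean(len(o) for o in new)
    drift = (new_avg - old_avg) / old_avg
    print(f"Avg length: {old_avg:.0f} -> {new_avg:.0f} ({drift:+.0%})")
    return abs(drift) <= tolerance  # False means the behavior drifted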
Quick Start
You can try SafeStar right now without changing your code.
1. Install
npm install --save-dev safestar
2. Define a Scenario
Create a file scenarios/refund.yaml. Tell SafeStar how to run your script using the exec key.
name: refund_bot
prompt: "I want a refund immediately."
# Your actual code command
exec: "python3 my_agent.py"
# Run it 5 times to catch randomness/instability
runs: 5
# Simple guardrails
checks:
  max_length: 200
  must_not_contain:
    - "I am just an AI"
3. Create a Baseline
Run it until you get an output you like, then “freeze” it:
npx safestar baseline refund_bot
4. Check for Drift in CI
Whenever you change your prompt or model, run:
npx safestar diff scenarios/refund.yaml
If your model drifts, SafeStar alerts you:
--- SAFESTAR REPORT ---
Status: FAIL
Metrics:
  Avg Length: 45 chars -> 120 chars
  Drift: +166% vs baseline (WARNING)
  Variance: 0.2 -> 9.8 (High instability)
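For context on those numbers: assuming the length stats are taken over the 5 runs of the scenario, they reduce to plain descriptive statistics, roughly:

import statistics

# Hypothetical per-run output lengths, chosen to mirror the report above
baseline_lengths = [44, 46, 45, 45, 45]
current_lengths = [118, 125, 116, 122, 119]

old_avg = statistics.mean(baseline_lengths)    # 45
new_avg = statistics.mean(current_lengths)     # 120
drift = (new_avg - old_avg) / old_avg          # reported as a percentage
spread = statistics.variance(current_lengths)  # high = unstable outputs
print(f"Drift: {drift:+.0%}, Variance: {spread:.1f}")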
Why I built this
I was tired of complex evaluation dashboards that give a “correctness score” of 87/100. I don’t care about the score; I care about regressions. If my bot was working yesterday, I just want to know if it is different today.
SafeStar is open source, local‑first, and fits right into GitHub Actions.
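For example, a minimal workflow could look like this (the file name, action versions, and trigger are my choices, not part of SafeStar):

# .github/workflows/safestar.yml (hypothetical setup)
name: safestar-drift-check
on: [pull_request]
jobs:
  drift:
    runs-on: ubuntu-latest   # ubuntu runners ship with python3 for the exec step
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm install --save-dev safestar
      - run: npx safestar diff scenarios/refund.yaml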
Links
- NPM:
- GitHub:
- Full blog post:
Let me know if you find it useful!