PromptLedger v0.3 — Turning prompt history into a practical review workflow.
Source: Dev.to
Devlog — Part 3
In Part 1, I introduced PromptLedger as a deliberately small, local‑first tool for treating prompts like code.
In Part 2, I added release semantics: labels, label history, and status views that made it easier to answer questions like “what is in production right now?”
With v0.3, the next question became harder:
Even if I can diff two prompt versions, can I review them in a way that feels closer to a real release workflow?
That is the focus of this release.
What’s new in v0.3?
PromptLedger v0.3 adds a small but practical Prompt Review layer on top of the existing history model—while still staying local‑first, SQLite‑backed, and intentionally limited in scope.
After the release‑semantics work in v0.2, the project could already answer questions like:
- Which prompt does prod currently point to?
- When was that label changed?
- How does prod differ from staging?
But another gap became obvious. A raw diff is useful, yet in practice people often want a slightly higher‑level review:
- Did the prompt become stricter?
- Did the tone change?
- Was the output format changed from bullets to JSON?
- Did safety or refusal wording get stronger or weaker?
- Is this a release change or a likely regression risk?
These are review questions, not execution or observability questions.
So instead of adding prompt execution, external APIs, or any hosted layer, I kept the project focused and added a review workflow built entirely on top of the existing local data.
New CLI Commands
promptledger review
promptledger review --id onboarding --from prod --to staging
Compares two refs (versions or labels) and produces a structured review output that includes:
- Resolved refs and versions
- A semantic summary
- Metadata changes
- Label context
- Warning flags
- A few conservative notes
This is deliberately not an evaluation system. It does not score prompts, call a model, or guess too much. It simply makes a prompt diff easier to interpret.
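To make "refs (versions or labels)" concrete, here is a minimal sketch of how a ref might resolve to a version number against a SQLite store. The table names (`prompt_versions`, `labels`), their columns, and the `resolve_ref` helper are assumptions made up for illustration; they are not PromptLedger's actual schema or API.

```python
import sqlite3


def resolve_ref(conn: sqlite3.Connection, prompt_id: str, ref: str) -> int:
    """Resolve a ref to a version number.

    Hypothetical rule: an all-digit ref is a version number,
    anything else is looked up as a label.
    """
    if ref.isdecimal():
        row = conn.execute(
            "SELECT version FROM prompt_versions WHERE prompt_id = ? AND version = ?",
            (prompt_id, int(ref)),
        ).fetchone()
    else:
        row = conn.execute(
            "SELECT version FROM labels WHERE prompt_id = ? AND label = ?",
            (prompt_id, ref),
        ).fetchone()
    if row is None:
        raise LookupError(f"unknown ref {ref!r} for prompt {prompt_id!r}")
    return row[0]


# Illustrative in-memory database using the assumed schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE prompt_versions (prompt_id TEXT, version INTEGER, content TEXT);
    CREATE TABLE labels (prompt_id TEXT, label TEXT, version INTEGER);
    INSERT INTO prompt_versions VALUES ('onboarding', 7, 'Be friendly.');
    INSERT INTO prompt_versions VALUES ('onboarding', 9, 'Be friendly. Reply in JSON.');
    INSERT INTO labels VALUES ('onboarding', 'prod', 7);
    INSERT INTO labels VALUES ('onboarding', 'staging', 9);
    """
)

print(resolve_ref(conn, "onboarding", "prod"))  # label lookup
print(resolve_ref(conn, "onboarding", "9"))     # direct version lookup
```

Resolving both sides to plain version numbers first means everything after that step can ignore whether the user typed a label or a number.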
Traditional diffs are still useful, and PromptLedger keeps all previous diff modes.
v0.3 adds a new summary‑oriented mode:
promptledger diff --id onboarding --from 7 --to 9 --mode summary
This produces a heuristic, rule‑based semantic summary instead of a raw line diff.
Design Goals for the Summary
- Local – no network calls
- Deterministic – same input → same output
- Transparent – rules are visible in the source
- Intentionally conservative – only says something when the change is clear enough
Current summary categories include:
- Tone changes
- Tighter or looser constraints
- Output format changes
- Broader vs. more specific prompts
- Safety wording changes
- Length requirement changes
- Refusal or policy wording changes
The summary is not meant to replace reading the actual prompt. It also stays rule‑based on purpose: using an external model for review would introduce network dependence, nondeterministic behavior, more configuration, harder testing, and less trust in the output, which is exactly the opposite of PromptLedger’s philosophy.
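As a rough illustration of what "local, deterministic, transparent, conservative" can look like in code, here is a minimal sketch of a rule-based summary. The rule list, the category names, and the `summarize` function are hypothetical; the real rules live in the PromptLedger source and may look quite different.

```python
import re

# Hypothetical rules: (category, pattern). Each pattern has exactly one group,
# so findall() returns the matched keyword itself.
RULES = [
    ("constraints", re.compile(r"\b(must|never|always|only)\b", re.IGNORECASE)),
    ("output format", re.compile(r"\b(json|bullets?|markdown|table)\b", re.IGNORECASE)),
    ("refusal or policy wording", re.compile(r"\b(refuse|decline|policy)\b", re.IGNORECASE)),
]


def summarize(old: str, new: str) -> list[str]:
    """Return notes only for categories whose matched keywords changed."""
    notes = []
    for category, pattern in RULES:
        before = {m.lower() for m in pattern.findall(old)}
        after = {m.lower() for m in pattern.findall(new)}
        # Sorting keeps the output identical across runs.
        added, removed = sorted(after - before), sorted(before - after)
        if added or removed:
            notes.append(f"{category}: added {added}, removed {removed}")
    return notes


old_prompt = "Summarize the ticket in bullets."
new_prompt = "Summarize the ticket. You must reply in JSON only."

for note in summarize(old_prompt, new_prompt):
    print(note)
# constraints: added ['must', 'only'], removed []
# output format: added ['json'], removed ['bullets']
```

The sketch says nothing when no rule fires, which is the conservative behavior described above: silence is preferable to a guess.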
Exporting Reviews
Another practical gap was sharing review output. Reading a diff in the terminal is fine, but you often need a portable document.
promptledger export review \
--id onboarding \
--from prod \
--to staging \
--format md \
--out review.md
The exported Markdown is deterministic and structured, containing:
- Title
- Compared refs
- Semantic summary
- Text‑diff note
- Metadata changes
- Warnings
- Label information
- A reviewer‑notes placeholder
This makes PromptLedger more useful in real workflows without adding any collaboration backend—the file is still just a file.
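A deterministic export mostly comes down to a fixed section order and stable handling of empty sections. The sketch below shows that idea with a plain dictionary; the `render_review_markdown` function, the dictionary keys, and the exact headings are assumptions for illustration, not the format PromptLedger actually writes.

```python
def render_review_markdown(review: dict) -> str:
    """Render a review as Markdown with a fixed section order."""
    lines = [
        f"# Prompt review: {review['prompt_id']}",
        "",
        f"Compared: {review['from_ref']} -> {review['to_ref']}",
        "",
    ]
    # Sections are emitted in a fixed order, and empty ones print "None",
    # so the same review always produces the same text.
    for heading, key in [
        ("Semantic summary", "summary"),
        ("Metadata changes", "metadata_changes"),
        ("Warnings", "warnings"),
    ]:
        lines.append(f"## {heading}")
        lines.append("")
        items = review.get(key) or ["None"]
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    lines += ["## Reviewer notes", "", "_Add notes here._", ""]
    return "\n".join(lines)


review = {
    "prompt_id": "onboarding",
    "from_ref": "prod",
    "to_ref": "staging",
    "summary": ["output format: bullets replaced by JSON"],
    "metadata_changes": [],
    "warnings": ["env_change"],
}

markdown = render_review_markdown(review)
print(markdown)
```

Because the result is a plain string, writing it to `review.md` is a single file write, and two exports of the same review can be compared byte for byte.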
Metadata‑Aware Reviews
Prompt text is only part of the story. A release change may also involve metadata updates such as:
- reason
- author
- tags
- env
- metrics
Earlier versions could already diff metadata, but v0.3 makes metadata changes part of the review object itself. This matters because some changes are metadata‑only.
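A metadata-only change is easy to picture with a small sketch. The `diff_metadata` helper and the sample values below are hypothetical; they only show the general shape of comparing two metadata dictionaries key by key.

```python
def diff_metadata(old: dict, new: dict) -> list[tuple[str, object, object]]:
    """Return (key, old_value, new_value) for every key whose value differs.

    A key missing on one side is reported with None on that side, so this
    sketch cannot tell a missing key apart from an explicit None value.
    """
    changes = []
    for key in sorted(old.keys() | new.keys()):  # sorted for stable output
        before, after = old.get(key), new.get(key)
        if before != after:
            changes.append((key, before, after))
    return changes


old_meta = {"author": "ana", "env": "staging", "tags": ["onboarding"]}
new_meta = {
    "author": "ana",
    "env": "prod",
    "tags": ["onboarding"],
    "reason": "promote after review",
}

changes = diff_metadata(old_meta, new_meta)
for key, before, after in changes:
    print(f"{key}: {before!r} -> {after!r}")
# env: 'staging' -> 'prod'
# reason: None -> 'promote after review'
```

If the prompt text is identical and this list is non-empty, the change is metadata-only, which is exactly the case a text diff alone would miss.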
Warning Flags
v0.3 adds simple warning flags for cases such as:
- Comparing the same version to itself
- Environment changes
- Metadata‑only changes
- Policy or refusal wording changes that may affect behavior drift
These warnings are deliberately undramatic. For example, a wording change around refusal or safety does not automatically mean the prompt got worse, but it probably means a reviewer should read it more carefully.
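The four cases above can be expressed as a handful of simple checks. This is a minimal sketch under assumptions: the flag names, the keyword list, and the dictionary shape (`version`, `text`, `metadata`) are invented for illustration and are not PromptLedger's real identifiers.

```python
import re

# Hypothetical keyword list for refusal or policy wording.
REFUSAL_WORDS = re.compile(r"\b(refuse|decline|policy|unsafe)\b", re.IGNORECASE)


def warning_flags(old: dict, new: dict) -> list[str]:
    """Return warning flags for a comparison, in a fixed order."""
    flags = []
    if old["version"] == new["version"]:
        flags.append("same_version")
    if old["metadata"].get("env") != new["metadata"].get("env"):
        flags.append("env_change")
    if old["text"] == new["text"] and old["metadata"] != new["metadata"]:
        flags.append("metadata_only_change")
    before = {m.lower() for m in REFUSAL_WORDS.findall(old["text"])}
    after = {m.lower() for m in REFUSAL_WORDS.findall(new["text"])}
    if before != after:
        flags.append("policy_wording_change")
    return flags


v7 = {"version": 7, "text": "Answer politely.", "metadata": {"env": "staging"}}
v9 = {
    "version": 9,
    "text": "Answer politely. Decline unsafe requests.",
    "metadata": {"env": "prod"},
}

print(warning_flags(v7, v9))  # ['env_change', 'policy_wording_change']
print(warning_flags(v7, v7))  # ['same_version']
```

Each flag only points at something worth a second look; none of them says whether the change is good or bad.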
API Improvements
The review workflow is not just a CLI feature. The Python API now exposes review results as structured domain objects rather than just formatted strings. Callers can programmatically access:
- Resolved refs
- Semantic summary items
- Metadata changes
- Warnings
- Notes
- Label context
This keeps the CLI and the API aligned while also making formatting a separate concern. The separation turned out to be one of the cleaner changes in this version:
- Review logic lives in one place
- Rendering logic lives elsewhere
- Markdown export and terminal rendering both use the same review result
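The split between review logic and rendering can be sketched with a small immutable result object and two independent renderers. The `ReviewResult` class, its fields, and both render functions are hypothetical stand-ins, not the actual PromptLedger API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewResult:
    """Hypothetical structured review result; holds data only, no formatting."""
    prompt_id: str
    from_version: int
    to_version: int
    summary: tuple[str, ...] = ()
    warnings: tuple[str, ...] = ()


def render_terminal(result: ReviewResult) -> str:
    """Compact plain-text rendering for a terminal."""
    lines = [f"{result.prompt_id}: v{result.from_version} -> v{result.to_version}"]
    lines += [f"  * {item}" for item in result.summary]
    lines += [f"  ! {warning}" for warning in result.warnings]
    return "\n".join(lines)


def render_markdown(result: ReviewResult) -> str:
    """Markdown rendering built from the same result object."""
    lines = [f"# Review: {result.prompt_id}", ""]
    lines += [f"- {item}" for item in result.summary]
    lines += [f"- Warning: {warning}" for warning in result.warnings]
    return "\n".join(lines)


result = ReviewResult(
    prompt_id="onboarding",
    from_version=7,
    to_version=9,
    summary=("output format changed",),
    warnings=("env_change",),
)

print(render_terminal(result))
print(render_markdown(result))
```

Because callers get fields rather than a formatted string, a script can check `result.warnings` directly instead of parsing terminal output.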
UI Updates
The Streamlit UI remains read‑only, but the comparison view now surfaces review information more clearly:
- Semantic summary
- Warnings
- Metadata diff
- Side‑by‑side prompt comparison
- Line diff
This keeps the UI aligned with the CLI review flow without turning it into an editor—the constraint still matters.
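For readers curious how a line diff like the one in the comparison view can be produced, Python's standard `difflib` module is one option. This is a sketch of the general technique, not the UI's actual code; the prompt texts and the `prod`/`staging` names used as file headers are illustrative.

```python
import difflib

old_lines = ["You are an onboarding assistant.", "Reply in bullets."]
new_lines = ["You are an onboarding assistant.", "Reply in JSON."]

# lineterm="" because the input lines carry no trailing newlines.
diff = list(
    difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile="prod",
        tofile="staging",
        lineterm="",
    )
)

print("\n".join(diff))
```

A read-only view only needs to display these lines, which fits the constraint of not turning the UI into an editor.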
What didn’t change
Just as important as the new features is what was left out. v0.3 does not add:
- A hosted registry
- Prompt execution APIs
- Agent tooling
- Telemetry pipelines
- Tracing dashboards
- Cloud sync
- Automatic scoring
- Evaluation harnesses
There are already plenty of tools going in those directions. PromptLedger is still trying to do one narrower thing well:
Store, version, and review prompts locally—nothing more, nothing less.
Release Highlights
- The review workflow did not need to turn the database into something more complicated.
- SQLite remains the single source of truth, keeping the implementation smaller and the migration story simpler.
- Not every useful feature requires a bigger schema.
v0.3 Overview
The release did not try to make PromptLedger smarter in a flashy way; it tried to make it more reviewable.
The result is still a local tool, but the question it helps answer has become a more realistic one:
“What changed?” → “How should I review this change before I move it forward?”
This is a better place for the project to be.