I built a “deterministic” LLM text rephraser with a validation pipeline - looking for architectural feedback
Source: Dev.to

Most LLM apps that “rewrite text” are thin wrappers around an API call: you send text → you get text back. That works for demos, but it breaks down quickly when you need predictable behavior, quota enforcement, abuse resistance, and quality guarantees without storing user data.
I built a prototype called AI Text Rephrase to explore a question:
Can you make an LLM text‑transformation service behave like a deterministic backend system instead of a probabilistic chatbot?
The app is available at . This post focuses on the architecture and trade‑offs, not the product itself.
The core problem
LLM rewriting is non‑deterministic and unbounded by default:
- Output style drifts
- Sometimes it summarizes instead of rephrasing
- Sometimes it changes meaning
- Sometimes it ignores the requested tone
- Sometimes it returns explanations, lists, or commentary
- Sometimes it fails silently
If you expose this directly as an API, you get:
- Inconsistent UX
- Hard‑to‑debug failures
- Quota abuse
- Unpredictable cost
- No way to enforce “this is a rephrase, not a rewrite”
Design principle
The LLM is not trusted. It is treated like an unreliable subsystem that must pass validation before its output is accepted.
Every request follows this flow:
- Rate limit
- Tier identification (anonymous vs. authenticated)
- Quota check
- Input validation (length bounds)
- Text preprocessing
- LLM inference (temperature = 0, single output)
- Semantic validation
- Tone adherence validation
- Response assembly
If validation fails, inference is retried once; if the retry also fails validation, the request fails with an explicit error. No heuristics, no “looks good” checks—pure thresholds.
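The retry-once-then-fail flow can be sketched as a small driver function. Everything here is illustrative: the names (`handle_rephrase`, `infer`, `validate`) and the length bounds are assumptions, not the actual implementation.

```python
from typing import Callable

# Hypothetical input bounds; the real service's limits are not published.
MIN_LEN, MAX_LEN = 1, 5000

def handle_rephrase(text: str, tone: str,
                    infer: Callable[[str, str], str],
                    validate: Callable[[str, str, str], bool]) -> str:
    """Input validation -> inference -> output validation, with exactly
    one retry; any further failure is surfaced as a hard error."""
    if not (MIN_LEN <= len(text) <= MAX_LEN):
        raise ValueError("input length out of bounds")
    for _attempt in range(2):  # first try + max 1 retry
        candidate = infer(text, tone)
        if validate(text, candidate, tone):
            return candidate
    raise RuntimeError("rephrase failed validation after retry")
```

The key property is that the caller either gets a validated rephrase or an explicit exception—never an unvalidated LLM output.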
Validation layer
After inference, three checks happen:
1. Semantic similarity check
Using sentence embeddings:
cosine_similarity(original, rephrased) >= THRESHOLD
If meaning drifts → reject.
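A minimal sketch of the threshold check, assuming embeddings come from some sentence-embedding model (the post does not name one, and the 0.85 threshold is a placeholder, not the real value):

```python
import math

SIM_THRESHOLD = 0.85  # hypothetical; tune against labeled drift examples

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def passes_semantic_check(emb_original, emb_rephrased,
                          threshold=SIM_THRESHOLD):
    """Reject the rephrase when its embedding drifts too far from the original's."""
    return cosine_similarity(emb_original, emb_rephrased) >= threshold
```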
2. Tone adherence check
Simple linguistic heuristics such as:
- average word length
- formality markers
- structure patterns
If tone is wrong → reject.
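One way such heuristics could look, assuming two illustrative marker sets and an arbitrary word-length cutoff (the actual markers and thresholds are not specified in the post):

```python
# Illustrative marker sets; a real check would use curated lists.
FORMAL_MARKERS = {"therefore", "moreover", "furthermore", "regarding"}
CASUAL_MARKERS = {"gonna", "kinda", "stuff", "hey"}

def avg_word_length(text: str) -> float:
    words = text.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0

def passes_tone_check(text: str, tone: str) -> bool:
    words = {w.strip(".,!?").lower() for w in text.split()}
    if tone == "formal":
        # formal prose trends toward longer words and avoids casual markers
        return avg_word_length(text) >= 4.5 and not (words & CASUAL_MARKERS)
    if tone == "casual":
        return not (words & FORMAL_MARKERS)
    return True  # unknown tone: no heuristic to apply
```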
3. Output format check
Length ratio must be within bounds. If the model summarizes or expands too much → reject.
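The length-ratio bound reduces to a one-liner; the 0.6–1.6 window here is a made-up example, not the service's actual bounds:

```python
# Hypothetical bounds: a rephrase should stay near the original's length.
RATIO_MIN, RATIO_MAX = 0.6, 1.6

def passes_length_check(original: str, rephrased: str) -> bool:
    """Reject outputs that summarize (too short) or expand (too long)."""
    ratio = len(rephrased) / max(len(original), 1)
    return RATIO_MIN <= ratio <= RATIO_MAX
```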
These checks proved more effective than extensive prompt engineering.
Deterministic constraints (hard rules)
- Very low temperature
- Single output only
- Fixed set of tones
- Validation always enabled
- No dynamic prompt mutation
- Max 1 retry on failure
The goal is predictable behavior across requests.
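These hard rules lend themselves to a frozen configuration object, so no code path can mutate them at runtime. The field names and tone list below are guesses for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: constraints cannot be mutated per-request
class RephraseConfig:
    temperature: float = 0.0
    num_outputs: int = 1
    allowed_tones: tuple = ("neutral", "formal", "casual")  # hypothetical set
    validation_enabled: bool = True
    max_retries: int = 1
```

Freezing the dataclass is one cheap way to enforce “no dynamic prompt mutation” at the type level.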
Why SQLite?
I intentionally used SQLite because:
- Single‑file persistence
- No external DB required
- Zero infrastructure overhead
- Prototype constraint: single instance, single writer
The database stores only:
- Users
- Sessions
- Quota counters
- OTPs
It does not store input text, output text, or history, which keeps the system stateless with respect to content and greatly simplifies the privacy story.
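A schema along these lines would cover the four stored entities—note the deliberate absence of any table for request content. Column names and types are my guesses, not the actual schema:

```python
import sqlite3

# Hypothetical minimal schema; there is intentionally no table for text.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users    (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL);
CREATE TABLE IF NOT EXISTS sessions (token TEXT PRIMARY KEY, user_id INTEGER, expires_at REAL);
CREATE TABLE IF NOT EXISTS quotas   (subject TEXT PRIMARY KEY, used INTEGER DEFAULT 0, window_start REAL);
CREATE TABLE IF NOT EXISTS otps     (email TEXT NOT NULL, code TEXT NOT NULL, expires_at REAL);
"""

def init_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```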
API gateway before business logic
All cross‑cutting concerns live before the pipeline:
- OTP authentication
- Quota manager
- Sliding‑window rate limiter
- Request routing
The rephrase pipeline never knows who the user is; it only receives validated input. This separation made debugging and reasoning about failures much easier.
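Of the gateway pieces, the sliding-window rate limiter is the most self-contained to sketch. This in-memory version (class name and interface are assumptions) tracks exact timestamps per key:

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Per-key sliding-window rate limiter over exact request timestamps."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)

    def allow(self, key: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # evict timestamps that have fallen out of the window
        while hits and now - hits[0] > self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

An in-memory limiter is consistent with the single-instance constraint; moving multi-instance would mean relocating this state (e.g., into a shared store).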
Minimal frontend
No framework, no build step—because the frontend is not the problem. The goal was to reduce moving parts and make Docker deployment trivial.
What this design prevents
- Prompt injection via user text
- Quota exhaustion by bots
- Style drift
- Meaning drift
- Random output shapes
- Cost spikes from multi‑output retries
- Storing user content for debugging
The system behaves more like a compiler pipeline than a typical AI app.
Known limitations (by design)
- SQLite single‑writer model → no horizontal scaling
- In‑memory embedding model load at startup
- No streaming responses
- No rephrase history
All intentional for this stage.
Feedback sought
I’m not looking for UI or feature feedback. I’d love input from people who’ve built LLM systems on:
- Is semantic + tone validation a reasonable guardrail, or would you do this differently?
- Is “retry once then fail” the right trade‑off?
- Would you move any validation before inference?
- Is SQLite acceptable here given the constraints?
- Any architectural smell in the pipeline separation?
- How would you evolve this toward multi‑instance without breaking the design?
You can try the prototype here: .
Would really appreciate thoughts from folks working on LLM infra and backend systems.