I built a “deterministic” LLM text rephraser with a validation pipeline - looking for architectural feedback
Source: Dev.to

Most LLM apps that “rewrite text” are thin wrappers around an API call: you send text → you get text back. That works for demos, but it breaks down quickly when you need predictable behavior, quota enforcement, abuse resistance, and quality guarantees without storing user data.
I built a prototype called AI Text Rephrase to explore a question:
Can you make an LLM text‑transformation service behave like a deterministic backend system instead of a probabilistic chatbot?
The app is available at . This post focuses on the architecture and trade‑offs, not the product itself.
The core problem
LLM rewriting is non‑deterministic and unbounded by default:
- Output style drifts
- Sometimes it summarizes instead of rephrasing
- Sometimes it changes meaning
- Sometimes it ignores the requested tone
- Sometimes it returns explanations, lists, or commentary
- Sometimes it fails silently
If you expose this directly as an API, you get:
- Inconsistent UX
- Hard‑to‑debug failures
- Quota abuse
- Unpredictable cost
- No way to enforce “this is a rephrase, not a rewrite”
Design principle
The LLM is not trusted. It is treated like an unreliable subsystem that must pass validation before its output is accepted.
Every request follows this flow:
- Rate limit
- Tier identification (anonymous vs. authenticated)
- Quota check
- Input validation (length bounds)
- Text preprocessing
- LLM inference (temperature = 0, single output)
- Semantic validation
- Tone adherence validation
- Response assembly
If validation fails, inference is retried once; if the retry also fails validation, the request fails with an explicit error. No heuristics, no “looks good” checks—pure thresholds.
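The retry-once-then-fail flow can be sketched as a small driver function. Everything here is illustrative: the names (`handle_rephrase`, `infer`, `validate`) and the length bounds are assumptions, not the actual implementation.

```python
from typing import Callable

# Hypothetical input bounds; the real service's limits are not published.
MIN_LEN, MAX_LEN = 1, 5000

def handle_rephrase(text: str, tone: str,
                    infer: Callable[[str, str], str],
                    validate: Callable[[str, str, str], bool]) -> str:
    """Input validation -> inference -> output validation, with exactly
    one retry; any further failure is surfaced as a hard error."""
    if not (MIN_LEN <= len(text) <= MAX_LEN):
        raise ValueError("input length out of bounds")
    for _attempt in range(2):  # first try + max 1 retry
        candidate = infer(text, tone)
        if validate(text, candidate, tone):
            return candidate
    raise RuntimeError("rephrase failed validation after retry")
```

The key property is that the caller either gets a validated rephrase or an explicit exception—never an unvalidated LLM output.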
Validation layer
After inference, three checks happen:
1. Semantic similarity check
Using sentence embeddings:
cosine_similarity(original, rephrased) >= THRESHOLD
If meaning drifts → reject.
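A minimal sketch of the threshold check, assuming embeddings come from some sentence-embedding model (the post does not name one, and the 0.85 threshold is a placeholder, not the real value):

```python
import math

SIM_THRESHOLD = 0.85  # hypothetical; tune against labeled drift examples

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def passes_semantic_check(emb_original, emb_rephrased,
                          threshold=SIM_THRESHOLD):
    """Reject the rephrase when its embedding drifts too far from the original's."""
    return cosine_similarity(emb_original, emb_rephrased) >= threshold
```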
2. Tone adherence check
Simple linguistic heuristics such as:
- average word length
- formality markers
- structure patterns
If tone is wrong → reject.
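One way such heuristics could look, assuming two illustrative marker sets and an arbitrary word-length cutoff (the actual markers and thresholds are not specified in the post):

```python
# Illustrative marker sets; a real check would use curated lists.
FORMAL_MARKERS = {"therefore", "moreover", "furthermore", "regarding"}
CASUAL_MARKERS = {"gonna", "kinda", "stuff", "hey"}

def avg_word_length(text: str) -> float:
    words = text.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0

def passes_tone_check(text: str, tone: str) -> bool:
    words = {w.strip(".,!?").lower() for w in text.split()}
    if tone == "formal":
        # formal prose trends toward longer words and avoids casual markers
        return avg_word_length(text) >= 4.5 and not (words & CASUAL_MARKERS)
    if tone == "casual":
        return not (words & FORMAL_MARKERS)
    return True  # unknown tone: no heuristic to apply
```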
3. Output format check
Length ratio must be within bounds. If the model summarizes or expands too much → reject.
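The length-ratio bound reduces to a one-liner; the 0.6–1.6 window here is a made-up example, not the service's actual bounds:

```python
# Hypothetical bounds: a rephrase should stay near the original's length.
RATIO_MIN, RATIO_MAX = 0.6, 1.6

def passes_length_check(original: str, rephrased: str) -> bool:
    """Reject outputs that summarize (too short) or expand (too long)."""
    ratio = len(rephrased) / max(len(original), 1)
    return RATIO_MIN <= ratio <= RATIO_MAX
```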
These checks proved more effective than extensive prompt engineering.
Deterministic constraints (hard rules)
- Very low temperature
- Single output only
- Fixed set of tones
- Validation always enabled
- No dynamic prompt mutation
- Max 1 retry on failure
The goal is predictable behavior across requests.
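These hard rules lend themselves to a frozen configuration object, so no code path can mutate them at runtime. The field names and tone list below are guesses for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: constraints cannot be mutated per-request
class RephraseConfig:
    temperature: float = 0.0
    num_outputs: int = 1
    allowed_tones: tuple = ("neutral", "formal", "casual")  # hypothetical set
    validation_enabled: bool = True
    max_retries: int = 1
```

Freezing the dataclass is one cheap way to enforce “no dynamic prompt mutation” at the type level.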
Why SQLite?
I intentionally used SQLite because:
- Single‑file persistence
- No external DB required
- Zero infrastructure overhead
- Prototype constraint: single instance, single writer
The database stores only:
- Users
- Sessions
- Quota counters
- OTPs
It does not store input text, output text, or history, which keeps the system stateless with respect to content and greatly simplifies the privacy story.
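A schema along these lines would cover the four stored entities—note the deliberate absence of any table for request content. Column names and types are my guesses, not the actual schema:

```python
import sqlite3

# Hypothetical minimal schema; there is intentionally no table for text.
SCHEMA = """
CREATE TABLE IF NOT EXISTS users    (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL);
CREATE TABLE IF NOT EXISTS sessions (token TEXT PRIMARY KEY, user_id INTEGER, expires_at REAL);
CREATE TABLE IF NOT EXISTS quotas   (subject TEXT PRIMARY KEY, used INTEGER DEFAULT 0, window_start REAL);
CREATE TABLE IF NOT EXISTS otps     (email TEXT NOT NULL, code TEXT NOT NULL, expires_at REAL);
"""

def init_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```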
API gateway before business logic
All cross‑cutting concerns live before the pipeline:
- OTP authentication
- Quota manager
- Sliding‑window rate limiter
- Request routing
The rephrase pipeline never knows who the user is; it only receives validated input. This separation made debugging and reasoning about failures much easier.
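Of the gateway pieces, the sliding-window rate limiter is the most self-contained to sketch. This in-memory version (class name and interface are assumptions) tracks exact timestamps per key:

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Per-key sliding-window rate limiter over exact request timestamps."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)

    def allow(self, key: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # evict timestamps that have fallen out of the window
        while hits and now - hits[0] > self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

An in-memory limiter is consistent with the single-instance constraint; moving multi-instance would mean relocating this state (e.g., into a shared store).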
Minimal frontend
No framework, no build step—because the frontend is not the problem. The goal was to reduce moving parts and make Docker deployment trivial.
What this design prevents
- Prompt injection via user text
- Quota exhaustion by bots
- Style drift
- Meaning drift
- Random output shapes
- Cost spikes from multi‑output retries
- Storing user content for debugging
The system behaves more like a compiler pipeline than a typical AI app.
Known limitations (by design)
- SQLite single‑writer model → no horizontal scaling
- In‑memory embedding model load at startup
- No streaming responses
- No rephrase history
All intentional for this stage.
Feedback sought
I’m not looking for UI or feature feedback. I’d love input from people who’ve built LLM systems on:
- Is semantic + tone validation a reasonable guardrail, or would you do this differently?
- Is “retry once then fail” the right trade‑off?
- Would you move any validation before inference?
- Is SQLite acceptable here given the constraints?
- Any architectural smell in the pipeline separation?
- How would you evolve this toward multi‑instance without breaking the design?
You can try the prototype here: .
Would really appreciate thoughts from folks working on LLM infra and backend systems.