I built a “deterministic” LLM text rephraser with a validation pipeline - looking for architectural feedback

Published: February 9, 2026 at 01:16 AM EST
4 min read
Source: Dev.to

Most LLM apps that “rewrite text” are thin wrappers around an API call: you send text → you get text back. That works for demos, but it breaks down quickly when you need predictable behavior, quota enforcement, abuse resistance, and quality guarantees without storing user data.

I built a prototype called AI Text Rephrase to explore a question:

Can you make an LLM text‑transformation service behave like a deterministic backend system instead of a probabilistic chatbot?

The app is available at . This post focuses on the architecture and trade‑offs, not the product itself.

The core problem

LLM rewriting is non‑deterministic and unbounded by default:

  • Output style drifts
  • Sometimes it summarizes instead of rephrasing
  • Sometimes it changes meaning
  • Sometimes it ignores the requested tone
  • Sometimes it returns explanations, lists, or commentary
  • Sometimes it fails silently

If you expose this directly as an API, you get:

  • Inconsistent UX
  • Hard‑to‑debug failures
  • Quota abuse
  • Unpredictable cost
  • No way to enforce “this is a rephrase, not a rewrite”

Design principle

The LLM is not trusted. It is treated like an unreliable subsystem that must pass validation before its output is accepted.

Every request follows this flow:

  1. Rate limit
  2. Tier identification (anonymous vs. authenticated)
  3. Quota check
  4. Input validation (length bounds)
  5. Text preprocessing
  6. LLM inference (temperature = 0, single output)
  7. Semantic validation
  8. Tone adherence validation
  9. Response assembly

If validation fails, inference is retried once; if the retry also fails validation, the request fails. No heuristics, no "looks good" checks: pure thresholds.
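
The steps above boil down to a loop that runs inference, applies every validator, and retries at most once. A minimal sketch (function and parameter names are my own, not from the post):

```python
def run_pipeline(text, tone, infer, validators, max_retries=1):
    """Run LLM inference, then all validators; retry inference at most once.

    `infer` is a callable (text, tone) -> output; each validator is a
    callable (original, output, tone) -> bool. Hard failure, no heuristics.
    """
    for _attempt in range(max_retries + 1):
        output = infer(text, tone)
        if all(check(text, output, tone) for check in validators):
            return output
    raise ValueError("validation failed after retry")
```

The caller above the pipeline (the gateway) decides what a `ValueError` maps to in the API response; the pipeline itself stays ignorant of users and quotas.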

Validation layer

After inference, three checks happen:

1. Semantic similarity check

Using sentence embeddings:

cosine_similarity(original, rephrased) >= THRESHOLD

If meaning drifts → reject.
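
In code, the check is a plain cosine over the two embedding vectors. A sketch with stdlib only (the threshold value is illustrative, and the embeddings themselves would come from a sentence-embedding model not shown here):

```python
import math

SIMILARITY_THRESHOLD = 0.85  # illustrative; the post doesn't give a value

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def passes_semantic_check(emb_original, emb_rephrased,
                          threshold=SIMILARITY_THRESHOLD):
    """Accept the rephrase only if meaning is preserved above the threshold."""
    return cosine_similarity(emb_original, emb_rephrased) >= threshold
```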

2. Tone adherence check

Simple linguistic heuristics such as:

  • average word length
  • formality markers
  • structure patterns

If tone is wrong → reject.
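
One way such heuristics might look in code. This is my own illustration of the idea (the marker words, the 4.0 average-length cutoff, and the tone names are all assumptions, not the post's actual rules):

```python
# Illustrative marker sets; a real implementation would use larger lists.
FORMAL_MARKERS = {"therefore", "furthermore", "regarding", "consequently"}
INFORMAL_MARKERS = {"gonna", "kinda", "stuff", "hey"}

def tone_score(text):
    """Return (average word length, formal-minus-informal marker count)."""
    words = text.lower().split()
    avg_len = sum(len(w) for w in words) / len(words) if words else 0.0
    formal = sum(w.strip(".,!?") in FORMAL_MARKERS for w in words)
    informal = sum(w.strip(".,!?") in INFORMAL_MARKERS for w in words)
    return avg_len, formal - informal

def passes_tone_check(text, requested_tone):
    """Reject output whose surface features contradict the requested tone."""
    avg_len, marker_delta = tone_score(text)
    if requested_tone == "formal":
        return marker_delta >= 0 and avg_len >= 4.0
    return True  # other tones would carry their own rules
```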

3. Output format check

Length ratio must be within bounds. If the model summarizes or expands too much → reject.
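
A minimal version of the bounds check, comparing word counts (the 0.6 and 1.5 bounds are illustrative, not the post's actual values):

```python
def passes_length_check(original, rephrased, min_ratio=0.6, max_ratio=1.5):
    """Reject output that is much shorter (summary) or longer (expansion)."""
    original_words = original.split()
    if not original_words:
        return False
    ratio = len(rephrased.split()) / len(original_words)
    return min_ratio <= ratio <= max_ratio
```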

These checks proved more effective than extensive prompt engineering.

Deterministic constraints (hard rules)

  • Very low temperature
  • Single output only
  • Fixed set of tones
  • Validation always enabled
  • No dynamic prompt mutation
  • Max 1 retry on failure

The goal is predictable behavior across requests.
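
These hard rules can be pinned down as an immutable config object so nothing mutates them at runtime. A sketch under my own naming (the tone set is illustrative):

```python
import dataclasses
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: any attempt to mutate raises at runtime
class InferenceConfig:
    temperature: float = 0.0          # deterministic sampling
    n_outputs: int = 1                # single output only
    max_retries: int = 1              # retry once, then fail
    validation_enabled: bool = True   # never skippable
    allowed_tones: tuple = ("formal", "casual", "neutral")  # fixed set
```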

Why SQLite?

I intentionally used SQLite because:

  • Single‑file persistence
  • No external DB required
  • Zero infrastructure overhead
  • Prototype constraint: single instance, single writer

The database stores only:

  • Users
  • Sessions
  • Quota counters
  • OTPs

It does not store input text, output text, or history, keeping the system stateless regarding content and simplifying privacy concerns.
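
A hypothetical schema matching that list (column names are my own guess); the point is that no table anywhere has a column for request or response text:

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS users    (id INTEGER PRIMARY KEY, email TEXT UNIQUE);
CREATE TABLE IF NOT EXISTS sessions (token TEXT PRIMARY KEY, user_id INTEGER,
                                     expires_at REAL);
CREATE TABLE IF NOT EXISTS quotas   (user_id INTEGER PRIMARY KEY,
                                     used INTEGER DEFAULT 0, period_start REAL);
CREATE TABLE IF NOT EXISTS otps     (email TEXT, code TEXT, expires_at REAL);
"""

conn = sqlite3.connect(":memory:")  # single file (or memory) database
conn.executescript(SCHEMA)
```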

API gateway before business logic

All cross‑cutting concerns live before the pipeline:

  • OTP authentication
  • Quota manager
  • Sliding‑window rate limiter
  • Request routing

The rephrase pipeline never knows who the user is; it only receives validated input. This separation made debugging and reasoning about failures much easier.
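
Of those gateway concerns, the sliding-window rate limiter is the most self-contained. A minimal in-memory sketch (my own implementation, not the post's code; `now` is injectable to keep it testable):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `max_requests` per key within any `window_seconds` span."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits = {}  # key -> deque of request timestamps

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits.setdefault(key, deque())
        # Drop timestamps that have fallen out of the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) < self.max_requests:
            q.append(now)
            return True
        return False
```

Because state lives in a process-local dict, this only works for the single-instance deployment the post assumes; going multi-instance would mean moving these counters into shared storage.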

Minimal frontend

No framework, no build step—because the frontend is not the problem. The goal was to reduce moving parts and make Docker deployment trivial.

What this design prevents

  • Prompt injection via user text
  • Quota exhaustion by bots
  • Style drift
  • Meaning drift
  • Random output shapes
  • Cost spikes from multi‑output retries
  • Storing user content for debugging

The system behaves more like a compiler pipeline than a typical AI app.

Known limitations (by design)

  • SQLite single‑writer model → no horizontal scaling
  • In‑memory embedding model load at startup
  • No streaming responses
  • No rephrase history

All intentional for this stage.

Feedback sought

I’m not looking for UI or feature feedback. I’d love input from people who’ve built LLM systems on:

  • Is semantic + tone validation a reasonable guardrail, or would you do this differently?
  • Is “retry once then fail” the right trade‑off?
  • Would you move any validation before inference?
  • Is SQLite acceptable here given the constraints?
  • Any architectural smell in the pipeline separation?
  • How would you evolve this toward multi‑instance without breaking the design?

You can try the prototype here: .

Would really appreciate thoughts from folks working on LLM infra and backend systems.
