# Why We Banned LLMs from Runtime — And What We Do Instead
Source: Dev.to
## Problems with LLMs at Runtime
Most AI backend tools use LLMs at runtime: every API call triggers model inference, and every response is probabilistic. When an LLM sits in your request path, you get:
| Issue | Description |
|---|---|
| Non‑determinism | Same request → different response each time. |
| Latency | 800 ms – 3 s per request, depending on model load. |
| Cost | Per‑request inference cost that scales linearly with traffic. |
| Security | Prompt‑injection surface on every endpoint. |
| Auditability | “Why did the API return this?” → “The model decided.” |
For prototypes this is acceptable, but for production backends handling payments, reservations, and user data it is a structural risk.
## Our Approach: Design‑time AI
Fascia uses AI exclusively at design time. You describe your business in natural language, and AI generates structured specifications (not code). These specs define:
- Entities – business objects with fields, relationships, state machines, and invariants.
- Tools – API endpoints with typed input/output, trigger types, and flow graphs.
- Policies – design‑time rules that block unsafe patterns before deployment.
At runtime a deterministic executor (written in Go, ~50 ms cold start on Cloud Run) reads the spec and follows it. No LLM inference, no variability.
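The spec can be pictured as plain data for the three sections above. A minimal sketch in Go; every type and field name here is an illustrative assumption, not Fascia's actual spec schema:

```go
package main

import "fmt"

// Entity, Tool, and Policy mirror the three spec sections described in
// the article. All field names are hypothetical.
type Entity struct {
	Name       string
	Fields     map[string]string // field name -> type
	Invariants []string          // business rules checked before commit
}

type Tool struct {
	Name         string
	InputSchema  string   // JSON Schema reference for typed input
	OutputSchema string   // JSON Schema reference for typed output
	FlowGraph    []string // ordered node IDs of a DAG
}

type Policy struct {
	Name   string
	Blocks string // unsafe pattern this design-time rule rejects
}

type Spec struct {
	Entities []Entity
	Tools    []Tool
	Policies []Policy
}

func main() {
	spec := Spec{
		Entities: []Entity{{
			Name:       "Reservation",
			Fields:     map[string]string{"status": "string", "seats": "int"},
			Invariants: []string{"seats > 0"},
		}},
		Tools: []Tool{{
			Name:        "createReservation",
			InputSchema: "reservation.input.json",
			FlowGraph:   []string{"validate", "write", "audit"},
		}},
		Policies: []Policy{{Name: "no-hard-delete", Blocks: "hard delete"}},
	}
	fmt.Println(len(spec.Entities), len(spec.Tools), len(spec.Policies))
}
```

The point of the shape is that everything the executor needs is static data: nothing at runtime has to be inferred.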
## Runtime Execution Flow
Every API endpoint follows the same deterministic sequence:
1. Validate input – against JSON Schema from the spec.
2. Authorize – JWT verification, RBAC role check, row‑level ownership.
3. Check policies – design‑time rules enforced deterministically.
4. Start transaction – explicit boundary, no auto‑commit.
5. Execute flow graph – a DAG of typed nodes (Read, Write, Transform, If/Switch).
6. Enforce invariants – business rules checked before commit.
7. Commit or rollback – all‑or‑nothing, no partial state.
8. Write audit log – append‑only, unconditional, every execution.
9. Return typed response – matches the output schema from the spec.
No shortcuts. No “this endpoint is special.” The rigidity is the feature.
## Safety Agent (Design‑time)
The Safety Agent runs during the design phase and performs:
- Multi‑model cross‑check – Claude + GPT‑4 (different model families).
- Static analysis of flow graphs for unsafe patterns.
- Risk classification – Green (safe), Yellow (warning), Red (blocked).
- Test case generation from spec invariants.
A Red risk blocks deployment with no override; the design must be fixed.
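The deployment gate on risk classes is simple enough to state in code. A minimal sketch in Go; the `Risk` type and `canDeploy` function are hypothetical names, but the rule they encode (any Red blocks, no override) is the one described above:

```go
package main

import "fmt"

// Risk mirrors the three classes: Green (safe), Yellow (warning),
// Red (blocked). Names are illustrative.
type Risk int

const (
	Green Risk = iota // safe to deploy
	Yellow            // deploy, but surface a warning
	Red               // deployment blocked, no override
)

// canDeploy enforces the rule that a single Red finding blocks
// deployment unconditionally; Yellows alone do not.
func canDeploy(findings []Risk) bool {
	for _, r := range findings {
		if r == Red {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(canDeploy([]Risk{Green, Yellow})) // warnings alone don't block
	fmt.Println(canDeploy([]Risk{Green, Red}))    // any Red blocks
}
```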
## Red Pattern Examples
| Pattern | Why it’s blocked |
|---|---|
| Payment call inside a transaction boundary | If the transaction rolls back, the payment can’t be undone. |
| UPDATE without WHERE clause | May affect unintended rows. |
| Write without transaction boundary | Leads to partial state on failure. |
| Hard delete instead of soft delete | Irreversible data loss. |
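One of these checks is easy to illustrate. A toy detector for "UPDATE without WHERE" in Go; a real analyzer would inspect the flow graph's typed Write nodes rather than raw SQL strings, and this function is an assumption for illustration only:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// updateRe matches statements that start with UPDATE (case-insensitive).
var updateRe = regexp.MustCompile(`(?i)^\s*update\s`)

// updateWithoutWhere flags the Red pattern from the table above:
// an UPDATE statement with no WHERE clause, which may affect every row.
func updateWithoutWhere(sql string) bool {
	if !updateRe.MatchString(sql) {
		return false // not an UPDATE at all
	}
	return !strings.Contains(strings.ToUpper(sql), "WHERE")
}

func main() {
	fmt.Println(updateWithoutWhere("UPDATE users SET active = false"))              // flagged
	fmt.Println(updateWithoutWhere("UPDATE users SET active = false WHERE id = 1")) // allowed
}
```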
## Design‑time Constraints
All intelligence must be captured in the spec at design time. The runtime cannot “think.” This means:
- Complex conditional logic → modeled as flow‑graph branches.
- Custom business rules → expressed in a restricted Value DSL (no arbitrary code).
- External API calls → explicit nodes with retry/timeout configuration.
These constraints are intentional; they make production backends provable, not probabilistic.
## Metrics Comparison
| Metric | LLM‑at‑runtime | Spec‑driven (Fascia) |
|---|---|---|
| Response time | 800 ms – 3 s (variable) | 12 – 50 ms (consistent) |
| Determinism | No | Yes |
| Prompt injection risk | Every endpoint | Zero (no prompts at runtime) |
| Per‑request LLM cost | Yes | Zero |
| Audit trail | “Model decided” | Spec version + execution log |
## Roadmap
We’re building Fascia in public. Pre‑launch, solo founder, 150+ PRs deep.
Next in this series: The Risk Engine – How We Classify Green, Yellow, and Red.