# Why We Banned LLMs from Runtime — And What We Do Instead
Source: Dev.to
## Problems with LLMs at Runtime
Most AI backend tools use LLMs at runtime: every API call triggers model inference, and every response is probabilistic. When an LLM sits in your request path, you get:
| Issue | Description |
|---|---|
| Non‑determinism | Same request → different response each time. |
| Latency | 800 ms – 3 s per request, depending on model load. |
| Cost | Per‑request inference cost that scales linearly with traffic. |
| Security | Prompt‑injection surface on every endpoint. |
| Auditability | “Why did the API return this?” → “The model decided.” |
For prototypes this is acceptable, but for production backends handling payments, reservations, and user data it is a structural risk.
## Our Approach: Design‑time AI
Fascia uses AI exclusively at design time. You describe your business in natural language, and AI generates structured specifications (not code). These specs define:
- Entities – business objects with fields, relationships, state machines, and invariants.
- Tools – API endpoints with typed input/output, trigger types, and flow graphs.
- Policies – design‑time rules that block unsafe patterns before deployment.
At runtime a deterministic executor (written in Go, ~50 ms cold start on Cloud Run) reads the spec and follows it. No LLM inference, no variability.
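The spec can be pictured as plain data for the three sections above. A minimal sketch in Go; every type and field name here is an illustrative assumption, not Fascia's actual spec schema:

```go
package main

import "fmt"

// Entity, Tool, and Policy mirror the three spec sections described in
// the article. All field names are hypothetical.
type Entity struct {
	Name       string
	Fields     map[string]string // field name -> type
	Invariants []string          // business rules checked before commit
}

type Tool struct {
	Name         string
	InputSchema  string   // JSON Schema reference for typed input
	OutputSchema string   // JSON Schema reference for typed output
	FlowGraph    []string // ordered node IDs of a DAG
}

type Policy struct {
	Name   string
	Blocks string // unsafe pattern this design-time rule rejects
}

type Spec struct {
	Entities []Entity
	Tools    []Tool
	Policies []Policy
}

func main() {
	spec := Spec{
		Entities: []Entity{{
			Name:       "Reservation",
			Fields:     map[string]string{"status": "string", "seats": "int"},
			Invariants: []string{"seats > 0"},
		}},
		Tools: []Tool{{
			Name:        "createReservation",
			InputSchema: "reservation.input.json",
			FlowGraph:   []string{"validate", "write", "audit"},
		}},
		Policies: []Policy{{Name: "no-hard-delete", Blocks: "hard delete"}},
	}
	fmt.Println(len(spec.Entities), len(spec.Tools), len(spec.Policies))
}
```

The point of the shape is that everything the executor needs is static data: nothing at runtime has to be inferred.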
## Runtime Execution Flow
Every API endpoint follows the same deterministic sequence:
1. Validate input – against JSON Schema from the spec.
2. Authorize – JWT verification, RBAC role check, row‑level ownership.
3. Check policies – design‑time rules enforced deterministically.
4. Start transaction – explicit boundary, no auto‑commit.
5. Execute flow graph – a DAG of typed nodes (Read, Write, Transform, If/Switch).
6. Enforce invariants – business rules checked before commit.
7. Commit or rollback – all‑or‑nothing, no partial state.
8. Write audit log – append‑only, unconditional, every execution.
9. Return typed response – matches the output schema from the spec.
No shortcuts. No “this endpoint is special.” The rigidity is the feature.
## Safety Agent (Design‑time)
The Safety Agent runs during the design phase and performs:
- Multi‑model cross‑check – Claude + GPT‑4 (different model families).
- Static analysis of flow graphs for unsafe patterns.
- Risk classification – Green (safe), Yellow (warning), Red (blocked).
- Test case generation from spec invariants.
A Red risk blocks deployment with no override; the design must be fixed.
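The deployment gate on risk classes is simple enough to state in code. A minimal sketch in Go; the `Risk` type and `canDeploy` function are hypothetical names, but the rule they encode (any Red blocks, no override) is the one described above:

```go
package main

import "fmt"

// Risk mirrors the three classes: Green (safe), Yellow (warning),
// Red (blocked). Names are illustrative.
type Risk int

const (
	Green Risk = iota // safe to deploy
	Yellow            // deploy, but surface a warning
	Red               // deployment blocked, no override
)

// canDeploy enforces the rule that a single Red finding blocks
// deployment unconditionally; Yellows alone do not.
func canDeploy(findings []Risk) bool {
	for _, r := range findings {
		if r == Red {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(canDeploy([]Risk{Green, Yellow})) // warnings alone don't block
	fmt.Println(canDeploy([]Risk{Green, Red}))    // any Red blocks
}
```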
## Red Pattern Examples
| Pattern | Why it’s blocked |
|---|---|
| Payment call inside a transaction boundary | If the transaction rolls back, the payment can’t be undone. |
| UPDATE without WHERE clause | May affect unintended rows. |
| Write without transaction boundary | Leads to partial state on failure. |
| Hard delete instead of soft delete | Irreversible data loss. |
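One of these checks is easy to illustrate. A toy detector for "UPDATE without WHERE" in Go; a real analyzer would inspect the flow graph's typed Write nodes rather than raw SQL strings, and this function is an assumption for illustration only:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// updateRe matches statements that start with UPDATE (case-insensitive).
var updateRe = regexp.MustCompile(`(?i)^\s*update\s`)

// updateWithoutWhere flags the Red pattern from the table above:
// an UPDATE statement with no WHERE clause, which may affect every row.
func updateWithoutWhere(sql string) bool {
	if !updateRe.MatchString(sql) {
		return false // not an UPDATE at all
	}
	return !strings.Contains(strings.ToUpper(sql), "WHERE")
}

func main() {
	fmt.Println(updateWithoutWhere("UPDATE users SET active = false"))              // flagged
	fmt.Println(updateWithoutWhere("UPDATE users SET active = false WHERE id = 1")) // allowed
}
```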
## Design‑time Constraints
All intelligence must be captured in the spec at design time. The runtime cannot “think.” This means:
- Complex conditional logic → modeled as flow‑graph branches.
- Custom business rules → expressed in a restricted Value DSL (no arbitrary code).
- External API calls → explicit nodes with retry/timeout configuration.
These constraints are intentional; they make production backends provable, not probabilistic.
## Metrics Comparison
| Metric | LLM‑at‑runtime | Spec‑driven (Fascia) |
|---|---|---|
| Response time | 800 ms – 3 s (variable) | 12 – 50 ms (consistent) |
| Determinism | No | Yes |
| Prompt injection risk | Every endpoint | Zero (no prompts at runtime) |
| Per‑request LLM cost | Yes | Zero |
| Audit trail | “Model decided” | Spec version + execution log |
## Roadmap
We’re building Fascia in public. Pre‑launch, solo founder, 150+ PRs deep.
Next in this series: The Risk Engine – How We Classify Green, Yellow, and Red.