Why We Banned LLMs from Runtime — And What We Do Instead

Published: February 25, 2026 at 01:51 PM EST
4 min read
Source: Dev.to

Problems with LLMs at Runtime

Most AI backend tools use LLMs at runtime: every API call triggers model inference, and every response is probabilistic. When an LLM processes your API request at runtime, you get:

| Issue | Description |
| --- | --- |
| Non‑determinism | Same request → different response each time. |
| Latency | 800 ms – 3 s per request, depending on model load. |
| Cost | Per‑request inference cost that scales linearly with traffic. |
| Security | Prompt‑injection surface on every endpoint. |
| Auditability | “Why did the API return this?” → “The model decided.” |

For prototypes this is acceptable, but for production backends handling payments, reservations, and user data it is a structural risk.

Our Approach: Design‑time AI

Fascia uses AI exclusively at design time. You describe your business in natural language, and AI generates structured specifications (not code). These specs define:

  • Entities – business objects with fields, relationships, state machines, and invariants.
  • Tools – API endpoints with typed input/output, trigger types, and flow graphs.
  • Policies – design‑time rules that block unsafe patterns before deployment.
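To make the three spec layers concrete, here is a minimal sketch of how entities, tools, and policies might be represented as Go types. The field names are illustrative assumptions, not Fascia’s actual schema:

```go
package main

import "fmt"

// Spec is an illustrative shape for a design-time specification:
// entities, tools, and policies, all data, no executable code.
type Spec struct {
	Entities []Entity
	Tools    []Tool
	Policies []Policy
}

// Entity is a business object: typed fields plus invariants that
// the runtime enforces before every commit.
type Entity struct {
	Name       string
	Fields     map[string]string // field name -> type
	Invariants []string          // e.g. "balance >= 0"
}

// Tool is an API endpoint: typed input/output and a trigger type.
type Tool struct {
	Name    string
	Trigger string            // e.g. "http", "schedule"
	Input   map[string]string // typed input schema
	Output  map[string]string // typed output schema
}

// Policy is a design-time rule checked before deployment.
type Policy struct {
	Rule string // e.g. "no hard deletes"
}

func main() {
	spec := Spec{
		Entities: []Entity{{
			Name:       "Account",
			Fields:     map[string]string{"balance": "int"},
			Invariants: []string{"balance >= 0"},
		}},
	}
	fmt.Println(spec.Entities[0].Name)
}
```

Because the spec is pure data, it can be versioned, diffed, and audited like any other artifact.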

At runtime a deterministic executor (written in Go, ~50 ms cold start on Cloud Run) reads the spec and follows it. No LLM inference, no variability.

Runtime Execution Flow

Every API endpoint follows the same deterministic sequence:

  1. Validate input – against JSON Schema from the spec.
  2. Authorize – JWT verification, RBAC role check, row‑level ownership.
  3. Check policies – design‑time rules enforced deterministically.
  4. Start transaction – explicit boundary, no auto‑commit.
  5. Execute flow graph – a DAG of typed nodes (Read, Write, Transform, If/Switch).
  6. Enforce invariants – business rules checked before commit.
  7. Commit or rollback – all‑or‑nothing, no partial state.
  8. Write audit log – append‑only, unconditional, every execution.
  9. Return typed response – matches the output schema from the spec.

No shortcuts. No “this endpoint is special.” The rigidity is the feature.

Safety Agent (Design‑time)

The Safety Agent runs during the design phase and performs:

  • Multi‑model cross‑check – Claude + GPT‑4 (different model families).
  • Static analysis of flow graphs for unsafe patterns.
  • Risk classification – Green (safe), Yellow (warning), Red (blocked).
  • Test case generation from spec invariants.

A Red risk blocks deployment with no override; the design must be fixed.

Red Pattern Examples

| Pattern | Why it’s blocked |
| --- | --- |
| Payment call inside a transaction boundary | If the transaction rolls back, the payment can’t be undone. |
| UPDATE without WHERE clause | May affect unintended rows. |
| Write without transaction boundary | Leads to partial state on failure. |
| Hard delete instead of soft delete | Irreversible data loss. |
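A check like “UPDATE without WHERE clause” can be caught with a trivial static rule. The sketch below is a toy string-level check of my own construction; the real Safety Agent analyzes typed flow-graph nodes, not raw SQL text:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Risk mirrors the Green/Yellow/Red classification.
type Risk int

const (
	Green Risk = iota
	Yellow
	Red
)

// updateStmt matches an UPDATE ... SET statement (case-insensitive).
var updateStmt = regexp.MustCompile(`(?i)^\s*UPDATE\s+\S+\s+SET\b`)

// classifySQL flags the "UPDATE without WHERE" red pattern.
// An UPDATE with no WHERE clause may touch every row, so it is
// blocked outright rather than warned about.
func classifySQL(sql string) Risk {
	if updateStmt.MatchString(sql) &&
		!strings.Contains(strings.ToUpper(sql), "WHERE") {
		return Red
	}
	return Green
}

func main() {
	fmt.Println(classifySQL("UPDATE users SET active = false") == Red)            // true
	fmt.Println(classifySQL("UPDATE users SET active = false WHERE id = 1") == Red) // false
}
```

Because the check runs at design time, a dangerous pattern never reaches a deployed endpoint in the first place.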

Design‑time Constraints

All intelligence must be captured in the spec at design time. The runtime cannot “think.” This means:

  • Complex conditional logic → modeled as flow‑graph branches.
  • Custom business rules → expressed in a restricted Value DSL (no arbitrary code).
  • External API calls → explicit nodes with retry/timeout configuration.
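As a sketch of the first constraint: a conditional that an LLM backend would “reason about” at runtime becomes an explicit branch node in the flow graph. The node and field names below are hypothetical, chosen only to illustrate the idea:

```go
package main

import "fmt"

// IfNode is an illustrative branch vertex in a flow graph.
// It evaluates a restricted, pre-built predicate (no arbitrary
// user code) and routes execution to one of two named nodes.
type IfNode struct {
	Cond func(ctx map[string]int) bool // restricted predicate
	Then string                        // next node id when true
	Else string                        // next node id when false
}

// Next returns the id of the node to execute next.
func (n IfNode) Next(ctx map[string]int) string {
	if n.Cond(ctx) {
		return n.Then
	}
	return n.Else
}

func main() {
	// "amount > 1000 → manual review" is modeled as a branch
	// in the spec, not decided by a model at request time.
	n := IfNode{
		Cond: func(ctx map[string]int) bool { return ctx["amount"] > 1000 },
		Then: "manual_review",
		Else: "auto_approve",
	}
	fmt.Println(n.Next(map[string]int{"amount": 1500})) // manual_review
}
```

Every possible path exists in the graph before deployment, which is what makes execution enumerable and auditable.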

These constraints are intentional; they make production backends provable, not probabilistic.

Metrics Comparison

| Metric | LLM‑at‑runtime | Spec‑driven (Fascia) |
| --- | --- | --- |
| Response time | 800 ms – 3 s (variable) | 12 – 50 ms (consistent) |
| Determinism | No | Yes |
| Prompt injection risk | Every endpoint | Zero (no prompts at runtime) |
| Per‑request LLM cost | Yes | Zero |
| Audit trail | “Model decided” | Spec version + execution log |

Roadmap

We’re building Fascia in public. Pre‑launch, solo founder, 150+ PRs deep.

Next in this series: The Risk Engine – How We Classify Green, Yellow, and Red.

fascia.run
