[Paper] FASTRIC: Prompt Specification Language for Verifiable LLM Interactions
Source: arXiv - 2512.18940v1
Overview
The paper introduces FASTRIC, a Prompt Specification Language that makes the hidden finite‑state‑machine (FSM) logic of multi‑turn LLM interactions explicit in natural‑language prompts. By turning prompts into verifiable specifications, designers can check whether an LLM’s behavior actually follows the intended protocol, moving prompt engineering from a trial‑and‑error art toward a disciplined engineering practice.
Key Contributions
- FASTRIC language – a human‑readable syntax that captures the seven core FSM elements (final states, agents, states, triggers, roles, initial state, constraints).
- Unified LLM‑driven toolchain – the same LLM parses, interprets, and executes the specification, eliminating the need for separate parsers or runtimes.
- Procedural conformance metric – a quantitative measure of how closely an execution trace matches the declared FSM.
- Formality spectrum – FASTRIC supports specifications ranging from loosely described (implicit) to fully explicit step‑by‑step instructions, letting designers tune “prompt formalism”.
- Empirical “Goldilocks zone” study – experiments across three model sizes (14.7 B, 685 B, 1 T+) and four formality levels reveal model‑specific sweet spots where specifications improve conformance without over‑constraining the model.
- Foundations for Prompt Specification Engineering – establishes a repeatable workflow for building verifiable, multi‑turn interaction protocols.
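The seven FSM elements that a FASTRIC specification enumerates can be sketched as a simple data structure. This is a hypothetical Python rendering for illustration only; the paper expresses specifications in natural-language prompts, and the field names and example protocol below are assumptions, not the paper's syntax.

```python
from dataclasses import dataclass

@dataclass
class FastricSpec:
    """The seven FSM elements a FASTRIC prompt enumerates (F-A-S-T-R-I-C)."""
    final_states: set[str]                # F: states that end the interaction
    agents: set[str]                      # A: participants in the dialogue
    states: set[str]                      # S: all dialogue states
    triggers: dict[tuple[str, str], str]  # T: (state, trigger) -> next state
    roles: dict[str, str]                 # R: agent -> behavioral role
    initial_state: str                    # I: where the interaction starts
    constraints: list[str]                # C: invariants never to be violated

# A hypothetical 3-state tutoring protocol in the spirit of the paper's example
tutoring = FastricSpec(
    final_states={"quiz"},
    agents={"tutor", "student"},
    states={"greet", "teach", "quiz"},
    triggers={("greet", "student_ready"): "teach",
              ("teach", "lesson_done"): "quiz"},
    roles={"tutor": "friendly kindergarten teacher", "student": "young learner"},
    initial_state="greet",
    constraints=["use age-appropriate vocabulary", "never skip the greeting"],
)
```

In a formal (L4-style) specification every entry in `triggers` and `constraints` would be written out explicitly; an informal (L1-style) one would leave most of them for the model to infer.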
Methodology
- Specification Design – The authors defined a template that requires designers to enumerate the seven FSM components. The template can be filled out informally (letting the model infer missing pieces) or formally (listing every transition and constraint).
- LLM as Execution Agent – The same LLM receives the FASTRIC prompt, parses the FSM description internally, and then carries out the multi‑turn dialogue as the designated “agent”. No external parser or state engine is used.
- Trace Collection – For each run, the full conversation (prompt, model replies, any tool calls) is recorded as an execution trace.
- Conformance Evaluation – A post‑hoc script (also powered by an LLM) checks the trace against the original FSM: it verifies that every trigger leads to the correct next state, that final states are reached appropriately, and that constraints are never violated. The result is a procedural conformance score between 0 and 1.
- Experimental Grid – The authors tested a simple 3‑state “kindergarten tutoring” FSM at four specification formality levels (L1–L4) on three model families: Phi‑4 (14.7 B), DeepSeek‑V3.2 (685 B), and ChatGPT‑5 (~1 T). Each configuration was run multiple times to capture variance.
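The conformance evaluation step can be approximated deterministically. The sketch below assumes a simplified scoring rule (fraction of observed transitions that are legal under the declared FSM); the paper's actual checker is LLM-powered and its exact scoring formula may differ.

```python
def conformance_score(trace, triggers, initial_state):
    """Score a trace as the fraction of observed transitions that match
    the declared FSM.  `trace` is a list of (trigger, observed_next_state);
    `triggers` maps (state, trigger) -> expected next state."""
    if not trace:
        return 1.0
    state, correct = initial_state, 0
    for trigger, observed in trace:
        expected = triggers.get((state, trigger))
        if expected is not None and observed == expected:
            correct += 1
        state = observed  # follow the model's actual behavior, right or wrong
    return correct / len(trace)

# Hypothetical 3-state tutoring FSM: greet -> teach -> quiz
triggers = {("greet", "student_ready"): "teach",
            ("teach", "lesson_done"): "quiz"}

# Fully conformant run scores 1.0
print(conformance_score([("student_ready", "teach"),
                         ("lesson_done", "quiz")], triggers, "greet"))  # 1.0

# A run that jumps straight to the quiz scores 0.0
print(conformance_score([("student_ready", "quiz")], triggers, "greet"))  # 0.0
```

A score strictly between 0 and 1 would indicate a run that follows some declared transitions but violates others.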
Results & Findings
| Model | Best Formality Level | Peak Conformance | Notable Trend |
|---|---|---|---|
| Phi‑4 (14.7 B) | None stable (high variance) | ≈0.55 ± 0.30 | Conformance fluctuates; no clear “Goldilocks” zone. |
| DeepSeek‑V3.2 (685 B) | L2–L4 (more explicit) | 1.00 | Perfect adherence when given enough structure. |
| ChatGPT‑5 (~1 T) | L3 (moderately explicit) | 0.90 | Peaks at medium formality; over‑specifying (L4) drops to 0.39. |
Key takeaways
- Model capacity matters: Larger models tolerate more explicit specifications, but past a model‑specific threshold the added constraints reduce conformance (ChatGPT‑5 falls from 0.90 at L3 to 0.39 at L4).
- Goldilocks zones: Each model has a narrow band of specification formalism that maximizes conformance.
- Variance in small models: Low‑capacity models show unstable behavior, suggesting they need either very minimal prompts or additional external tooling.
Practical Implications
- Design‑time verification: Developers can write FASTRIC prompts for chatbots, tutoring agents, or workflow assistants and automatically obtain a conformance score before deployment.
- Safety & compliance: In regulated domains (e.g., finance, healthcare), FASTRIC can serve as a lightweight contract that the LLM must obey, providing audit trails for compliance officers.
- Prompt engineering tooling: IDE extensions could auto‑generate FASTRIC skeletons, highlight missing FSM elements, and suggest the optimal formality level based on the target model.
- Model selection guidance: When building a product that relies on multi‑turn protocols, teams can use the Goldilocks findings to pick a model whose capacity matches the desired specification granularity.
- Reduced debugging time: Instead of manually inspecting dialogue failures, developers can run the conformance checker to pinpoint exactly which transition violated the FSM.
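For the debugging use case, a checker can report the first violating step rather than only an aggregate score. This is a deterministic sketch under the same assumed trace format as above, not the paper's LLM-based tooling.

```python
def first_violation(trace, triggers, initial_state):
    """Return (step_index, state, trigger, observed, expected) for the first
    transition that breaks the FSM, or None if the whole trace conforms."""
    state = initial_state
    for i, (trigger, observed) in enumerate(trace):
        expected = triggers.get((state, trigger))
        if observed != expected:
            return (i, state, trigger, observed, expected)
        state = observed
    return None

triggers = {("greet", "student_ready"): "teach",
            ("teach", "lesson_done"): "quiz"}

# Second step loops back to "greet" instead of advancing to "quiz"
trace = [("student_ready", "teach"), ("lesson_done", "greet")]
print(first_violation(trace, triggers, "greet"))
# -> (1, 'teach', 'lesson_done', 'greet', 'quiz')
```

Surfacing the offending state and trigger directly is what replaces manual inspection of dialogue failures.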
Limitations & Future Work
- Scope of FSM complexity: Experiments only covered a tiny 3‑state tutoring scenario; scaling to larger, branching protocols may expose parsing or memory limits.
- Model‑specific tuning: The “optimal formality” is empirically derived per model; a universal method for predicting the Goldilocks zone is still missing.
- External tool integration: FASTRIC currently relies on the LLM to act as its own runtime; integrating with external state machines or tool‑calling APIs could improve robustness for low‑capacity models.
- User study: The paper does not evaluate how easily non‑expert designers can author FASTRIC specifications; future work should assess usability and learning curves.
- Security considerations: Over‑specifying may unintentionally expose internal workflow logic; mechanisms for obfuscation or selective disclosure need exploration.
FASTRIC opens the door to treating LLM prompt design as a verifiable engineering discipline, giving developers the ability to specify, execute, and audit multi‑turn interactions with measurable guarantees.
Authors
- Wen-Long Jin
Paper Information
- arXiv ID: 2512.18940v1
- Categories: cs.CL, cs.SE
- Published: December 22, 2025