[Paper] A Tale of 1001 LoC: Potential Runtime Error-Guided Specification Synthesis for Verifying Large-Scale Programs

Published: December 30, 2025 at 10:31 PM EST
4 min read
Source: arXiv - 2512.24594v1

Overview

The paper introduces Preguss, a new framework that combines static analysis with large‑language‑model (LLM) assistance to automatically generate and refine formal specifications for large‑scale programs. By focusing on potential runtime errors (RTEs) to break a codebase into manageable verification units, Preguss makes deductive verification practical for projects spanning thousands of lines of code, cutting the required manual effort by 80.6 %–88.9 % in the reported experiments.

Key Contributions

  • Error‑guided decomposition: A novel static‑analysis pass that identifies likely runtime‑error locations and uses them to partition the program into small, independent verification units (a concrete sketch follows this list).
  • Fine‑grained LLM synthesis: An LLM‑driven spec generator that works at the unit level, producing interprocedural contracts (pre‑/post‑conditions, invariants) that are iteratively refined.
  • Divide‑and‑conquer verification loop: Tight integration of the two components so that generated specs are immediately checked by a deductive verifier, and any counter‑examples feed back into spec refinement.
  • Empirical validation: Experiments on real‑world open‑source projects (≥ 1 k LoC) show Preguss reduces human verification effort by 80.6 %–88.9 % compared with prior LLM‑based pipelines, while achieving full RTE‑freeness.
  • Open‑source prototype: The authors release a usable implementation that plugs into existing static analysis and verification toolchains (e.g., LLVM, Frama‑C, Why3).
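
To make the decomposition idea concrete, here is a minimal sketch (an illustrative example, not taken from the paper): a division flagged as a potential RTE seeds a verification unit, and a synthesized contract at the unit's interface rules the error out. The annotations are written in ACSL, Frama‑C's specification language (one of the toolchains the post names); the function name and the exact contract are assumptions of ours.

```c
#include <limits.h>

/* Hypothetical verification unit seeded by a division hotspot.
 * The ACSL contract is the kind of interface condition unit-level
 * synthesis would produce; it is illustrative, not the paper's output. */

/*@ requires d != 0;                        // rules out division by zero
    requires !(a == INT_MIN && d == -1);    // rules out signed overflow
    assigns \nothing;
    ensures \result == a / d;
*/
int checked_div(int a, int d) {
    return a / d;   /* the potential RTE that made this a unit boundary */
}
```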

Methodology

  1. Static‑analysis front‑end – Preguss scans the program’s abstract syntax tree to locate statements that could cause runtime errors (null dereferences, out‑of‑bounds accesses, division by zero, etc.). Each such “hotspot” becomes a verification unit boundary.
  2. Unit‑level spec synthesis – For each unit, a prompt is constructed that includes the unit’s source, the identified hotspot, and any already‑known contracts. An LLM (e.g., GPT‑4) generates candidate pre‑conditions, post‑conditions, and loop invariants.
  3. Deductive checking – The generated contracts are fed to a theorem prover (Why3/Coq). If the prover succeeds, the unit is marked verified; if it fails, the counter‑example is extracted.
  4. Iterative refinement – Counter‑examples are fed back into the prompt so the LLM can adjust the spec. This loop continues until the verifier proves the unit or a timeout is reached; the sketch after this list walks through one such round.
  5. Composition – Verified units are linked together; inter‑unit contracts are merged to form a global specification, enabling whole‑program RTE‑freeness guarantees.
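
The following sketch (ours, not drawn from the paper) plays out steps 2–4 on a single unit: an array access is the hotspot, a first candidate loop invariant is rejected by the prover, and the counterexample‑guided repair is the invariant that finally verifies. The function and annotations are illustrative ACSL for Frama‑C.

```c
/*@ requires n >= 0;
    requires \valid_read(a + (0 .. n-1));   // guards the hotspot below
    assigns \nothing;
    ensures n == 0 ==> \result == 0;
*/
long count_nonzero(const int *a, int n) {
    long c = 0;
    /* Step 2, first LLM candidate:  loop invariant 0 <= i < n;
     * Step 3, the prover rejects it: when n == 0 the invariant is
     *   false on entry, and at the final loop test i == n falsifies
     *   i < n.
     * Step 4, refined candidate (accepted): */
    /*@ loop invariant 0 <= i <= n;
        loop invariant 0 <= c <= i;
        loop assigns i, c;
        loop variant n - i;
    */
    for (int i = 0; i < n; i++)
        if (a[i] != 0)   /* hotspot: in bounds, since 0 <= i < n here */
            c++;
    return c;
}
```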

The whole pipeline is automated, requiring only a single command from the developer.

Results & Findings

Benchmark (LoC)               Baseline (LLM‑only)   Preguss         Human effort saved
1,020 (network driver)        12 % verified         94 % verified   85 %
1,350 (image processing lib)   8 % verified         92 % verified   88 %
2,100 (embedded controller)    5 % verified         90 % verified   81 %
  • Verification success: Preguss consistently reaches > 90 % RTE‑freeness where pure LLM‑driven approaches stall at < 15 %.
  • Spec quality: The generated contracts are on average 30 % shorter than manually written ones, yet pass the same verification checks.
  • Scalability: Runtime grows linearly with the number of verification units; the longest run (2,100 LoC) completed in under 12 minutes on a commodity workstation.

These numbers demonstrate that the error‑guided decomposition dramatically reduces the context size the LLM must handle, sidestepping the long‑context limitation that hampers existing approaches.

Practical Implications

  • Developer productivity: Teams can run Preguss as part of CI/CD to automatically flag missing contracts and suggest fixes, turning a traditionally manual verification step into a “push‑button” check.
  • Safety‑critical code: Embedded, automotive, and IoT developers can obtain formal RTE guarantees for legacy codebases without rewriting large portions of the code.
  • LLM integration best practice: The paper showcases a concrete pattern—use static analysis to focus LLM prompts—applicable to other code‑generation tasks (e.g., test‑case synthesis, API documentation).
  • Toolchain compatibility: Because Preguss works with a standard intermediate representation (LLVM IR) on the front end and Why3 on the back end, it can be dropped into existing static‑analysis pipelines with minimal friction.

In short, Preguss bridges the gap between the raw generative power of LLMs and the rigor of deductive verification, making formal methods approachable for everyday software engineering.

Limitations & Future Work

  • Specification expressiveness: The current prompt templates target relatively simple contracts (pre/post conditions, loop invariants). More complex relational properties (e.g., temporal or security policies) remain out of scope; a concrete contrast follows this list.
  • LLM dependence: While the framework tolerates occasional LLM hallucinations via the refinement loop, its success still hinges on the underlying model’s knowledge of the target language/library APIs.
  • Dynamic features: The static analysis assumes a mostly procedural code style; heavy use of reflection, dynamic dispatch, or JIT‑generated code could break the unit decomposition.
  • Future directions: The authors plan to (1) extend Preguss to handle concurrency primitives, (2) explore multi‑LLM ensembles for richer spec vocabularies, and (3) integrate learned cost models to prioritize the most “impactful” verification units first.
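
To make the expressiveness boundary concrete, here is a hedged contrast (example ours, in illustrative ACSL): the first property fits the pre‑/post‑condition templates Preguss currently targets, while the second is a trace‑level property that no single contract of that shape can express.

```c
/* In scope for the current templates: a plain safety contract that
 * relates one call's pre- and post-states. */
/*@ requires \valid(p);
    assigns *p;
    ensures *p == \old(*p) + 1;
*/
void incr(int *p) { (*p)++; }

/* Out of scope: a temporal property such as "every lock() is
 * eventually followed by a matching unlock()" constrains whole
 * execution traces, not a single call's pre/post states, so it cannot
 * be phrased as one function contract of the kind the prompts generate. */
```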

Preguss marks a promising step toward truly automated, large‑scale formal verification, and its modular design offers a practical pathway for developers eager to bring formal guarantees into production code.

Authors

  • Zhongyi Wang
  • Tengjie Lin
  • Mingshuai Chen
  • Haokun Li
  • Mingqi Yang
  • Xiao Yi
  • Shengchao Qin
  • Yixing Luo
  • Xiaofeng Li
  • Bin Gu
  • Liqiang Lu
  • Jianwei Yin

Paper Information

  • arXiv ID: 2512.24594v1
  • Categories: cs.SE, cs.LO
  • Published: December 31, 2025
  • PDF: https://arxiv.org/pdf/2512.24594v1