[Paper] SPARC: Scenario Planning and Reasoning for Automated C Unit Test Generation

Published: February 18, 2026 at 01:09 PM EST

Source: arXiv - 2602.16671v1

Overview

The paper presents SPARC, a novel framework that combines large‑language‑model (LLM) generation with classic program analysis to automatically create high‑quality unit tests for C code. By grounding the model’s reasoning in the program’s control‑flow graph and a curated set of helper operations, SPARC dramatically reduces the “leap‑to‑code” failures that typically plague pure LLM‑based test synthesis.

Key Contributions

  • Neuro‑symbolic pipeline that couples LLM‑driven synthesis with static analysis (CFG) and a validated Operation Map of reusable helper functions.
  • Scenario‑based test generation: each test targets a specific execution path, ensuring systematic coverage of branches and edge cases.
  • Iterative self‑correction loop that feeds compiler errors and runtime feedback back to the LLM, automatically repairing non‑compilable or flaky tests.
  • Empirical evaluation on 59 real‑world and algorithmic C subjects, showing up to 31 % higher line coverage and 26 % higher branch coverage than a vanilla prompt baseline, and competitive mutation scores against the symbolic executor KLEE.
  • Human‑centered metrics: SPARC’s tests receive higher readability and maintainability scores from developers, and 94.3 % of generated tests survive the repair loop without manual intervention.

Methodology

  1. Control‑Flow Graph (CFG) Extraction – The target C function is parsed to build a CFG, exposing all possible execution paths.
  2. Operation Map Construction – A curated library of “utility helpers” (e.g., safe pointer wrappers, memory‑allocation patterns) is created. Each helper is type‑checked and unit‑tested beforehand, guaranteeing that the LLM can only invoke proven building blocks.
  3. Path‑Targeted Prompting – For each CFG path, a prompt is assembled that (a) describes the path’s conditions, (b) lists the relevant helpers from the Operation Map, and (c) asks the LLM to emit a test case that drives execution along that path.
  4. Iterative Validation & Repair – The generated test is compiled; any syntax or type errors are captured and fed back to the LLM as corrective hints. The test is then executed; runtime failures (e.g., segmentation faults, assertion violations) trigger another round of LLM‑guided repair. This loop repeats until the test compiles, runs cleanly, and satisfies the path’s constraints.
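To make path-targeted generation concrete, here is a minimal illustrative example (hypothetical; the function and test names are not from the paper). A function with one `if` has two CFG paths, so the pipeline would emit one test per path:

```c
#include <assert.h>

/* Hypothetical target function with two CFG paths:
 * path A (x > 0) and path B (x <= 0). */
static int clamp_positive(int x) {
    if (x > 0) {
        return x;   /* path A: positive input preserved */
    }
    return 0;       /* path B: non-positive input clamped to zero */
}

/* Path A test: the prompt's condition "x > 0" selects the input. */
static void test_clamp_positive_path_a(void) {
    assert(clamp_positive(5) == 5);
}

/* Path B test: drives execution through the x <= 0 branch,
 * including the boundary value. */
static void test_clamp_positive_path_b(void) {
    assert(clamp_positive(-3) == 0);
    assert(clamp_positive(0) == 0);
}
```

Because each test is tied to a named path condition, a reviewer can see at a glance which branch it exercises, which is part of why the readability scores improve.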

The overall design keeps the LLM’s creativity bounded by concrete, verifiable program artifacts, turning “free‑form” generation into a disciplined reasoning process.
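As a sketch of what an Operation Map entry might look like (the helper names and behavior here are illustrative assumptions, not the paper's actual library), consider pre-verified wrappers around allocation and string copying:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical Operation Map helper: zero-initialized allocation that
 * never silently returns NULL -- on failure it aborts, so a generated
 * test can never dereference a failed allocation. */
static void *op_alloc_zeroed(size_t nbytes) {
    void *p = calloc(1, nbytes);
    if (p == NULL) {
        abort();    /* fail fast rather than propagate NULL */
    }
    return p;
}

/* Hypothetical helper: bounded string copy that always NUL-terminates,
 * avoiding the classic strncpy truncation pitfall. */
static void op_strcpy_bounded(char *dst, size_t dst_size, const char *src) {
    if (dst_size == 0) {
        return;
    }
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';   /* guarantee termination */
}
```

Because each helper is type-checked and unit-tested before entering the map, a generated test that composes only these calls inherits their memory-safety guarantees instead of reinventing raw `malloc`/`strcpy` idioms.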

Results & Findings

| Metric | Vanilla LLM prompt | SPARC | KLEE (symbolic exec.) |
|---|---|---|---|
| Line coverage ↑ | baseline | +31.36 % | ≈ comparable |
| Branch coverage ↑ | baseline | +26.01 % | ≈ comparable |
| Mutation score ↑ | baseline | +20.78 % | similar on simple subjects |
| Test survivability after repair | — | 94.3 % | N/A |
| Developer readability (1–5) | 2.8 | 4.1 | N/A |

(Coverage and mutation figures are SPARC's improvement over the vanilla prompt baseline.)

Key takeaways

  • The scenario‑driven approach forces the LLM to consider each branch explicitly, which translates directly into higher coverage.
  • The self‑correction loop eliminates the majority of compilation‑time failures that usually render LLM‑generated tests unusable.
  • On complex, pointer‑heavy codebases, SPARC matches or exceeds KLEE’s coverage while producing human‑readable tests—something symbolic execution tools often struggle with.
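The self-correction loop behind these results can be sketched as a simple driver (illustrative only; `repair_loop`, `validate_fn`, and the round budget are assumptions, not the authors' implementation). Each failed validation produces a diagnostic that would, in the real system, be sent back to the LLM:

```c
#include <stddef.h>

#define MAX_REPAIR_ROUNDS 5

/* Stand-in for invoking the compiler/executor on a candidate test.
 * Returns NULL on success, or a diagnostic string on failure. */
typedef const char *(*validate_fn)(const char *test_source, int round);

/* Illustrative repair loop: retry until the test validates cleanly
 * or the repair budget is exhausted. Returns the number of repair
 * rounds needed, or -1 if the test is discarded. */
static int repair_loop(const char *test_source, validate_fn validate) {
    for (int round = 0; round < MAX_REPAIR_ROUNDS; round++) {
        const char *diag = validate(test_source, round);
        if (diag == NULL) {
            return round;   /* compiles and runs cleanly */
        }
        /* real system: feed `diag` to the LLM, receive repaired source */
    }
    return -1;              /* budget exhausted; test discarded */
}

/* Toy validator: "fails" for two rounds, then succeeds -- mimicking
 * a test whose compile errors are fixed incrementally. */
static const char *flaky_validate(const char *src, int round) {
    (void)src;
    return round < 2 ? "error: implicit declaration of 'foo'" : NULL;
}
```

The reported 94.3 % survivability suggests that, under such a bounded retry scheme, very few tests ever exhaust their repair budget.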

Practical Implications

  • Legacy C codebases: Companies maintaining decades‑old C systems can bootstrap a regression suite without hand‑crafting tests for every module.
  • Continuous Integration (CI): SPARC can be integrated as a nightly job that expands test coverage automatically as new functions are added.
  • Developer productivity: By delivering readable, maintainable tests, SPARC reduces the overhead of reviewing auto‑generated code, letting engineers focus on higher‑level design work.
  • Security testing: Higher branch and mutation coverage translates into better detection of edge‑case bugs, including memory‑safety issues that are common in low‑level code.
  • Tooling ecosystem: The Operation Map concept can be shared across projects, creating a reusable “LLM‑friendly” API surface for C developers.

Limitations & Future Work

  • Dependency on accurate CFG: Complex macro usage or conditional compilation can confuse the static analysis phase, limiting path extraction.
  • Operation Map maintenance: The helper library must be kept in sync with project‑specific coding conventions and custom allocators, which adds a modest upkeep cost.
  • Scalability to massive codebases: While the authors tested 59 subjects, scaling the path enumeration and iterative repair to millions of lines may require smarter path prioritization heuristics.
  • Generalization beyond C: The current design leverages C‑specific constructs (pointers, manual memory). Extending SPARC to C++ or Rust will need additional language‑specific operation maps and safety checks.

Future research directions include automated discovery of new helpers from existing test suites, adaptive path selection based on coverage feedback, and tighter integration with CI pipelines to close the loop between test generation, execution, and defect triage.

Authors

  • Jaid Monwar Chowdhury
  • Chi-An Fu
  • Reyhaneh Jabbarvand

Paper Information

  • arXiv ID: 2602.16671v1
  • Categories: cs.SE, cs.AI
  • Published: February 18, 2026