[Paper] SPARC: Scenario Planning and Reasoning for Automated C Unit Test Generation

Published: February 18, 2026 at 01:09 PM EST

Source: arXiv - 2602.16671v1

Overview

The paper presents SPARC, a novel framework that combines large‑language‑model (LLM) generation with classic program analysis to automatically create high‑quality unit tests for C code. By grounding the model’s reasoning in the program’s control‑flow graph and a curated set of helper operations, SPARC dramatically reduces the “leap‑to‑code” failures that typically plague pure LLM‑based test synthesis.

Key Contributions

  • Neuro‑symbolic pipeline that couples LLM‑driven synthesis with static analysis (CFG) and a validated Operation Map of reusable helper functions.
  • Scenario‑based test generation: each test targets a specific execution path, ensuring systematic coverage of branches and edge cases.
  • Iterative self‑correction loop that feeds compiler errors and runtime feedback back to the LLM, automatically repairing non‑compilable or flaky tests.
  • Empirical evaluation on 59 real‑world and algorithmic C subjects, showing up to 31 % higher line coverage and 26 % higher branch coverage than a vanilla prompt baseline, and competitive mutation scores against the symbolic executor KLEE.
  • Human‑centered metrics: SPARC’s tests receive higher readability and maintainability scores from developers, and 94.3 % of generated tests survive the repair loop without manual intervention.

Methodology

  1. Control‑Flow Graph (CFG) Extraction – The target C function is parsed to build a CFG, exposing all possible execution paths.
  2. Operation Map Construction – A curated library of “utility helpers” (e.g., safe pointer wrappers, memory‑allocation patterns) is created. Each helper is type‑checked and unit‑tested beforehand, guaranteeing that the LLM can only invoke proven building blocks.
  3. Path‑Targeted Prompting – For each CFG path, a prompt is assembled that (a) describes the path’s conditions, (b) lists the relevant helpers from the Operation Map, and (c) asks the LLM to emit a test case that drives execution along that path.
  4. Iterative Validation & Repair – The generated test is compiled; any syntax or type errors are captured and fed back to the LLM as corrective hints. The test is then executed; runtime failures (e.g., segmentation faults, assertion violations) trigger another round of LLM‑guided repair. This loop repeats until the test compiles, runs cleanly, and satisfies the path’s constraints.
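To make path-targeted generation concrete, here is a minimal illustrative example (hypothetical; the function and test names are not from the paper). A function with one `if` has two CFG paths, so the pipeline would emit one test per path:

```c
#include <assert.h>

/* Hypothetical target function with two CFG paths:
 * path A (x > 0) and path B (x <= 0). */
static int clamp_positive(int x) {
    if (x > 0) {
        return x;   /* path A: positive input preserved */
    }
    return 0;       /* path B: non-positive input clamped to zero */
}

/* Path A test: the prompt's condition "x > 0" selects the input. */
static void test_clamp_positive_path_a(void) {
    assert(clamp_positive(5) == 5);
}

/* Path B test: drives execution through the x <= 0 branch,
 * including the boundary value. */
static void test_clamp_positive_path_b(void) {
    assert(clamp_positive(-3) == 0);
    assert(clamp_positive(0) == 0);
}
```

Because each test is tied to a named path condition, a reviewer can see at a glance which branch it exercises, which is part of why the readability scores improve.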

The overall design keeps the LLM’s creativity bounded by concrete, verifiable program artifacts, turning “free‑form” generation into a disciplined reasoning process.
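As a sketch of what an Operation Map entry might look like (the helper names and behavior here are illustrative assumptions, not the paper's actual library), consider pre-verified wrappers around allocation and string copying:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical Operation Map helper: zero-initialized allocation that
 * never silently returns NULL -- on failure it aborts, so a generated
 * test can never dereference a failed allocation. */
static void *op_alloc_zeroed(size_t nbytes) {
    void *p = calloc(1, nbytes);
    if (p == NULL) {
        abort();    /* fail fast rather than propagate NULL */
    }
    return p;
}

/* Hypothetical helper: bounded string copy that always NUL-terminates,
 * avoiding the classic strncpy truncation pitfall. */
static void op_strcpy_bounded(char *dst, size_t dst_size, const char *src) {
    if (dst_size == 0) {
        return;
    }
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';   /* guarantee termination */
}
```

Because each helper is type-checked and unit-tested before entering the map, a generated test that composes only these calls inherits their memory-safety guarantees instead of reinventing raw `malloc`/`strcpy` idioms.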

Results & Findings

| Metric | Vanilla LLM prompt | SPARC | KLEE (symbolic exec.) |
|---|---|---|---|
| Line coverage ↑ | baseline | +31.36 % | ≈ comparable |
| Branch coverage ↑ | baseline | +26.01 % | ≈ comparable |
| Mutation score ↑ | baseline | +20.78 % | similar on simple subjects |
| Test survivability after repair | — | 94.3 % | N/A |
| Developer readability (1–5) | 2.8 | 4.1 | N/A |

(Coverage and mutation figures are SPARC's improvement over the vanilla prompt baseline.)

Key takeaways

  • The scenario‑driven approach forces the LLM to consider each branch explicitly, which translates directly into higher coverage.
  • The self‑correction loop eliminates the majority of compilation‑time failures that usually render LLM‑generated tests unusable.
  • On complex, pointer‑heavy codebases, SPARC matches or exceeds KLEE’s coverage while producing human‑readable tests—something symbolic execution tools often struggle with.
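The self-correction loop behind these results can be sketched as a simple driver (illustrative only; `repair_loop`, `validate_fn`, and the round budget are assumptions, not the authors' implementation). Each failed validation produces a diagnostic that would, in the real system, be sent back to the LLM:

```c
#include <stddef.h>

#define MAX_REPAIR_ROUNDS 5

/* Stand-in for invoking the compiler/executor on a candidate test.
 * Returns NULL on success, or a diagnostic string on failure. */
typedef const char *(*validate_fn)(const char *test_source, int round);

/* Illustrative repair loop: retry until the test validates cleanly
 * or the repair budget is exhausted. Returns the number of repair
 * rounds needed, or -1 if the test is discarded. */
static int repair_loop(const char *test_source, validate_fn validate) {
    for (int round = 0; round < MAX_REPAIR_ROUNDS; round++) {
        const char *diag = validate(test_source, round);
        if (diag == NULL) {
            return round;   /* compiles and runs cleanly */
        }
        /* real system: feed `diag` to the LLM, receive repaired source */
    }
    return -1;              /* budget exhausted; test discarded */
}

/* Toy validator: "fails" for two rounds, then succeeds -- mimicking
 * a test whose compile errors are fixed incrementally. */
static const char *flaky_validate(const char *src, int round) {
    (void)src;
    return round < 2 ? "error: implicit declaration of 'foo'" : NULL;
}
```

The reported 94.3 % survivability suggests that, under such a bounded retry scheme, very few tests ever exhaust their repair budget.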

Practical Implications

  • Legacy C codebases: Companies maintaining decades‑old C systems can bootstrap a regression suite without hand‑crafting tests for every module.
  • Continuous Integration (CI): SPARC can be integrated as a nightly job that expands test coverage automatically as new functions are added.
  • Developer productivity: By delivering readable, maintainable tests, SPARC reduces the overhead of reviewing auto‑generated code, letting engineers focus on higher‑level design work.
  • Security testing: Higher branch and mutation coverage translates into better detection of edge‑case bugs, including memory‑safety issues that are common in low‑level code.
  • Tooling ecosystem: The Operation Map concept can be shared across projects, creating a reusable “LLM‑friendly” API surface for C developers.

Limitations & Future Work

  • Dependency on accurate CFG: Complex macro usage or conditional compilation can confuse the static analysis phase, limiting path extraction.
  • Operation Map maintenance: The helper library must be kept in sync with project‑specific coding conventions and custom allocators, which adds a modest upkeep cost.
  • Scalability to massive codebases: While the authors tested 59 subjects, scaling the path enumeration and iterative repair to millions of lines may require smarter path prioritization heuristics.
  • Generalization beyond C: The current design leverages C‑specific constructs (pointers, manual memory). Extending SPARC to C++ or Rust will need additional language‑specific operation maps and safety checks.

Future research directions include automated discovery of new helpers from existing test suites, adaptive path selection based on coverage feedback, and tighter integration with CI pipelines to close the loop between test generation, execution, and defect triage.

Authors

  • Jaid Monwar Chowdhury
  • Chi-An Fu
  • Reyhaneh Jabbarvand

Paper Information

  • arXiv ID: 2602.16671v1
  • Categories: cs.SE, cs.AI
  • Published: February 18, 2026