[Paper] SysPro: Reproducing System-level Concurrency Bugs from Bug Reports
Source: arXiv - 2601.09616v1
Overview
The paper introduces SysPro, a tool that can automatically turn a natural‑language bug report into a reproducible test case for system‑level concurrency bugs. By extracting the relevant system‑call sequence and the required input data, SysPro bridges the gap between vague, human‑written reports and the deterministic test harnesses developers need to debug and fix these hard‑to‑reproduce issues.
Key Contributions
- Automated extraction of system‑call names from unstructured bug reports using NLP techniques.
- Source‑code location mapping that links the extracted calls to the exact functions where they occur.
- Input‑generation pipeline that combines information retrieval, regex matching, and the category‑partition method to synthesize realistic inputs.
- Dynamic instrumentation framework that forces the identified system‑call interleaving during execution, making nondeterministic bugs deterministic.
- Empirical evaluation on a curated set of real‑world concurrency bugs showing high success rates (≈ 80 % bug reproduction) with modest overhead.
Methodology
- Report Parsing – SysPro parses the bug report text and identifies candidate system‑call names (e.g., `open`, `read`, `ioctl`) using a combination of keyword spotting and part‑of‑speech tagging.
- Call‑Site Localization – The tool searches the project’s source tree to locate where each identified call is invoked, leveraging static analysis to resolve function overloads and indirect calls.
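The summary does not give SysPro's exact parsing rules, but the keyword-spotting half of the step can be sketched as dictionary matching against a known system-call vocabulary. The vocabulary and the sample report below are illustrative assumptions, not taken from the paper:

```python
import re

# Hypothetical vocabulary; SysPro's actual system-call list is not given here.
SYSCALL_NAMES = {"open", "read", "write", "close", "ioctl", "mmap", "fcntl"}

def extract_syscalls(report_text):
    """Return syscall names mentioned in a bug report, in order of first mention."""
    seen, found = set(), []
    # Match whole words, optionally written with trailing parentheses, as in "read()".
    for match in re.finditer(r"\b([a-z_]+)\b", report_text):
        name = match.group(1)
        if name in SYSCALL_NAMES and name not in seen:
            seen.add(name)
            found.append(name)
    return found

report = "The crash happens when read() races with ioctl on the same fd after open."
print(extract_syscalls(report))  # ['read', 'ioctl', 'open']
```

A real implementation would add the part-of-speech tagging the paper mentions to filter out English words that happen to collide with call names (e.g., "open" used as an adjective).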
- Input Synthesis –
- Information Retrieval: Retrieves existing test inputs or configuration snippets that mention the same keywords.
- Regex Matching: Extracts concrete literals (file paths, flags, buffer sizes) from the report.
- Category‑Partition: Systematically enumerates parameter combinations to cover edge‑case values that often trigger concurrency bugs.
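The category-partition step above enumerates one value per parameter category to build candidate inputs. A minimal sketch, with hypothetical categories for an `open`-style call (the paper's actual partitions are not given in this summary):

```python
from itertools import product

# Hypothetical parameter categories; real partitions would come from the report
# and retrieved test inputs.
categories = {
    "path":  ["/tmp/exists.txt", "/tmp/missing.txt", ""],   # valid, absent, empty
    "flags": ["O_RDONLY", "O_RDWR", "O_RDWR|O_CREAT"],
    "mode":  ["0o644", "0o000"],
}

def category_partition(cats):
    """Yield every combination of one value per category (the full frame set)."""
    keys = list(cats)
    for values in product(*(cats[k] for k in keys)):
        yield dict(zip(keys, values))

frames = list(category_partition(categories))
print(len(frames))  # 3 * 3 * 2 = 18 candidate inputs
```

In practice the method also attaches constraints that prune infeasible combinations (e.g., a missing path with `O_RDONLY`), so the executed set is smaller than the full cross product.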
- Interleaving Enforcement – SysPro instruments the target binary at runtime (via `ptrace`/`LD_PRELOAD` hooks) to pause after each identified system call. A scheduler thread then forces the exact ordering reported (or inferred) in the bug description.
- Reproduction Loop – The generated inputs and enforced interleaving are executed repeatedly until the bug manifests (e.g., as a crash, assertion failure, or incorrect state).
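The scheduler idea behind the enforcement step can be illustrated without `ptrace`: a toy stand-in where each thread blocks until a fixed global order says its labeled operation may proceed. The labels and two-thread scenario here are assumptions for illustration; SysPro hooks real system calls in the target binary:

```python
import threading

class InterleavingScheduler:
    """Force a fixed global order of labeled operations across threads.

    Toy stand-in for SysPro's ptrace/LD_PRELOAD hooks: a thread calls
    wait_turn(label) before its operation and done() after it.
    """
    def __init__(self, order):
        self.order = order            # e.g. ["T2:ioctl", "T1:read"]
        self.index = 0
        self.cond = threading.Condition()

    def wait_turn(self, label):
        with self.cond:
            while self.order[self.index] != label:
                self.cond.wait()      # not our turn yet; block

    def done(self):
        with self.cond:
            self.index += 1           # advance to the next scheduled operation
            self.cond.notify_all()

# Two worker threads whose "syscalls" must run in the reported order.
sched = InterleavingScheduler(["T2:ioctl", "T1:read"])
trace = []

def worker(name, op):
    label = f"{name}:{op}"
    sched.wait_turn(label)
    trace.append(label)               # stands in for the real system call
    sched.done()

t1 = threading.Thread(target=worker, args=("T1", "read"))
t2 = threading.Thread(target=worker, args=("T2", "ioctl"))
t1.start(); t2.start(); t1.join(); t2.join()
print(trace)  # ['T2:ioctl', 'T1:read'] regardless of OS scheduling
```

The point of the sketch is determinism: no matter which thread the OS runs first, `T1` cannot record its operation until `T2` has finished, which is exactly what makes a nondeterministic interleaving reproducible.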
All steps are automated, requiring only the bug report and the source repository as inputs.
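The reproduction loop reduces to a bounded retry driver around a single test execution. A minimal sketch, where `run_once` is a hypothetical callable standing in for one instrumented run of the generated test:

```python
def reproduce(run_once, max_attempts=100):
    """Re-execute the generated test until the bug manifests or the budget runs out.

    run_once() returns True when a failure (crash, assertion, bad state) is observed.
    """
    for attempt in range(1, max_attempts + 1):
        if run_once():
            return attempt   # bug reproduced on this attempt
    return None              # not reproduced within the budget

# Hypothetical stand-in target: manifests the bug on its third execution.
runs = iter([False, False, True])
print(reproduce(lambda: next(runs)))  # 3
```

With the interleaving enforced, a run either triggers the bug quickly or strongly suggests the extracted ordering was wrong, so small attempt budgets suffice in practice.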
Results & Findings
| Metric | Outcome |
|---|---|
| Bug reproduction rate | 78 % of 45 real‑world system‑level concurrency bugs reproduced successfully. |
| Localization accuracy | 92 % of extracted system‑call names correctly mapped to source locations. |
| Input generation time | Average 3.2 seconds per bug (including parsing, retrieval, and synthesis). |
| Runtime overhead | ≤ 12 % slowdown due to instrumentation, acceptable for debugging sessions. |
The study also revealed that most failures to reproduce were due to missing environment‑specific resources (e.g., device nodes) rather than shortcomings in the extraction or scheduling logic.
Practical Implications
- Faster triage – Developers can feed a newly filed bug report into SysPro and obtain a ready‑to‑run test case, cutting the “reproduce‑then‑debug” cycle from days to minutes.
- Continuous integration – SysPro can be integrated into CI pipelines to automatically generate regression tests for concurrency bugs as soon as they are reported.
- Improved bug tracking – By attaching a reproducible test case to each report, teams gain clearer visibility into bug severity and can prioritize fixes more objectively.
- Security testing – System‑level concurrency bugs often lead to privilege escalation or data leakage; SysPro enables security teams to reliably reproduce and assess the impact of such vulnerabilities.
Limitations & Future Work
- Environment dependencies – SysPro assumes the target system’s hardware and kernel configuration match those of the original bug; mismatches can prevent reproduction.
- Report quality variance – Extremely terse or ambiguous reports may not contain enough cues for accurate call‑site extraction.
- Scalability to large codebases – While the current prototype works well on medium‑sized projects, indexing and searching massive repositories could become a bottleneck.
Future directions include extending the approach to handle distributed systems (cross‑process interleavings), incorporating machine‑learning models to better infer missing input parameters, and building a cloud‑based service that abstracts away the environment setup for developers.
Authors
- Tarannum Shaila Zaman
- Zhihui Yan
- Chen Wang
- Chadni Islam
- Jiangfan Shi
- Tingting Yu
Paper Information
- arXiv ID: 2601.09616v1
- Categories: cs.SE
- Published: January 14, 2026