[Paper] Req2Road: A GenAI Pipeline for SDV Test Artifact Generation and On-Vehicle Execution

Published: February 17, 2026 at 09:03 AM EST

Source: arXiv - 2602.15591v1

Overview

The paper “Req2Road: A GenAI Pipeline for SDV Test Artifact Generation and On‑Vehicle Execution” presents a prototype that turns natural‑language vehicle requirements into runnable test scripts for Software‑Defined Vehicles (SDVs). By leveraging large language models (LLMs) and vision‑language models (VLMs), the authors automate the creation of Gherkin‑style scenarios and map them to the Vehicle Signal Specification (VSS), enabling rapid, portable testing across simulators and real cars.

Key Contributions

End‑to‑end pipeline that converts heterogeneous requirement artifacts (text, tables, diagrams) into executable Gherkin scenarios and VSS‑linked test code.
Retrieval‑augmented generation (RAG) for pre‑selecting relevant VSS signals, improving the accuracy of signal‑to‑requirement mapping.
Integration of LLMs and VLMs to extract both textual and visual information from requirement documents.
Demonstration on a safety‑critical subsystem (Child Presence Detection System) in both virtual (simulation) and real‑vehicle (Vehicle‑in‑the‑Loop) environments.
Quantitative evaluation showing that 89 % of the examined requirements can be automatically transformed into executable tests.
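
To make "executable Gherkin scenarios and VSS‑linked test code" concrete, here is a minimal sketch of the idea, not the authors' implementation. It assumes a tiny step grammar ("signal <path> is true|false"), an in-memory dictionary standing in for the vehicle signal interface, and a toy system under test. The two Vehicle.* paths follow VSS naming but should be checked against the VSS release in use; Demo.ChildPresenceAlert.IsActive is a hypothetical name invented for this example.

```python
import re

# Illustrative scenario. Demo.ChildPresenceAlert.IsActive is hypothetical,
# not a VSS signal; the Vehicle.* paths are VSS-style examples.
SCENARIO = """\
Scenario: Alert when a rear seat stays occupied in a locked vehicle
  Given signal Vehicle.Cabin.Seat.Row2.DriverSide.IsOccupied is true
  When signal Vehicle.Cabin.Door.Row1.DriverSide.IsLocked is true
  Then signal Demo.ChildPresenceAlert.IsActive is true
"""

# Assumed step grammar for this sketch only.
STEP = re.compile(r"^(Given|When|Then)\s+signal\s+(\S+)\s+is\s+(true|false)$")


def toy_system_under_test(store):
    """Hypothetical stand-in for the real function: alert if occupied and locked."""
    store["Demo.ChildPresenceAlert.IsActive"] = (
        store.get("Vehicle.Cabin.Seat.Row2.DriverSide.IsOccupied", False)
        and store.get("Vehicle.Cabin.Door.Row1.DriverSide.IsLocked", False)
    )


def run_scenario(text, store):
    """Given/When steps write signals; Then steps assert on them."""
    passed = True
    for line in text.splitlines()[1:]:  # skip the "Scenario:" title line
        match = STEP.match(line.strip())
        if match is None:
            raise ValueError(f"unsupported step: {line!r}")
        keyword, path, raw = match.groups()
        value = raw == "true"
        if keyword in ("Given", "When"):
            store[path] = value
            toy_system_under_test(store)  # let the toy logic react
        else:
            passed = passed and store.get(path) == value
    return passed


print(run_scenario(SCENARIO, {}))
```

Because the steps refer only to signal paths, the same scenario text could in principle be bound to a simulator or a vehicle interface by swapping the dictionary for a real signal client, which is the portability argument the paper makes for anchoring tests to VSS.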

Methodology

Requirement Ingestion – The pipeline ingests natural‑language requirements, accompanying tables, and design diagrams.
Signal Retrieval (RAG) – A vector store of VSS signal descriptions is queried to fetch the most relevant signals for each requirement.
LLM‑Driven Scenario Generation – A large language model (e.g., GPT‑4) receives the requirement text plus the retrieved signals and produces a Gherkin scenario (Given‑When‑Then format).
VLM‑Assisted Diagram Parsing – Vision‑language models analyze diagrams to extract additional signal names or state machines that the LLM might miss.
VSS Mapping & Code Synthesis – The identified signals are linked to VSS identifiers, and a code generator emits test scripts compatible with the target test bench (simulator or on‑vehicle test framework).
Execution & Feedback Loop – Generated tests are run in a virtual environment first; failures trigger a human‑in‑the‑loop review where missing or mis‑mapped signals are corrected, after which the tests are executed on the actual vehicle.
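
The signal-retrieval step can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's code: a bag-of-words cosine similarity stands in for a real embedding model and vector store, the catalog descriptions are paraphrased for the example, and the VSS-style paths should be verified against the VSS release in use. The function names (embed, retrieve, build_prompt) are hypothetical, and the LLM call itself is omitted.

```python
import math
import re
from collections import Counter

# Illustrative catalog: VSS-style paths with paraphrased descriptions.
CATALOG = {
    "Vehicle.Cabin.Seat.Row2.DriverSide.IsOccupied": "rear seat is occupied by a person",
    "Vehicle.Cabin.Door.Row1.DriverSide.IsLocked": "front door is locked",
    "Vehicle.Cabin.HVAC.AmbientAirTemperature": "ambient air temperature in the cabin",
    "Vehicle.Speed": "vehicle speed",
}

STOPWORDS = {"the", "a", "an", "is", "are", "in", "by", "of", "shall", "after"}

REQUIREMENT = "The system shall detect an occupied rear seat after the doors are locked."


def embed(text):
    """Toy embedding: lowercase word counts without stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(requirement, catalog, k=2):
    """Return the k signal paths whose descriptions best match the requirement."""
    query = embed(requirement)
    ranked = sorted(
        catalog, key=lambda path: cosine(query, embed(catalog[path])), reverse=True
    )
    return ranked[:k]


def build_prompt(requirement, paths):
    """Assemble the text an LLM would receive; the model call is left out."""
    signal_lines = "\n".join(f"- {p}" for p in paths)
    return (
        "Write a Gherkin scenario for this requirement, "
        "using only the listed signals.\n"
        f"Requirement: {requirement}\n"
        f"Candidate signals:\n{signal_lines}"
    )


candidates = retrieve(REQUIREMENT, CATALOG)
print(build_prompt(REQUIREMENT, candidates))
```

Pre-selecting a short candidate list keeps the prompt small and restricts the model to signals that exist in the catalog, which is one plausible reason a retrieval step would reduce wrong signal assignments compared with prompting on the requirement text alone.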

Results & Findings

Coverage: 32 out of 36 (≈ 89 %) safety requirements for the Child Presence Detection System were successfully turned into executable Gherkin scenarios.
Gherkin Validity: Over 95 % of generated scenarios passed syntax validation tools without manual edits.
VSS Mapping Quality: The RAG step reduced incorrect signal assignments by ~40 % compared to a naïve LLM‑only approach.
End‑to‑End Executability: In both simulation and Vehicle‑in‑the‑Loop (ViL) runs, the generated tests executed without runtime errors, confirming the pipeline’s practical viability.
Human Intervention: Approximately 10 % of cases still required manual signal substitution or clarification of ambiguous requirement phrasing.
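
As a quick check of the coverage arithmetic, the reported counts can be recomputed directly. The function name is an assumption for this sketch; the counts 32 and 36 come from the summary above. Whether the four unconverted requirements are the same cases as the roughly 10 % that needed human intervention is not stated here.

```python
def coverage_ratio(converted, total):
    """Share of requirements that were turned into executable scenarios."""
    if total <= 0:
        raise ValueError("total must be positive")
    return converted / total


TOTAL_REQUIREMENTS = 36  # examined requirements, from the summary
CONVERTED = 32           # converted to executable Gherkin scenarios

remaining = TOTAL_REQUIREMENTS - CONVERTED
print(f"coverage: {coverage_ratio(CONVERTED, TOTAL_REQUIREMENTS):.1%}")   # 88.9%
print(f"not converted: {remaining} ({coverage_ratio(remaining, TOTAL_REQUIREMENTS):.1%})")  # 4 (11.1%)
```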

Practical Implications

Accelerated Test Development: Engineers can generate a baseline test suite directly from requirements, reducing the effort of manual test authoring.
Cross‑Toolchain Consistency: By anchoring to the VSS standard, the same test artifacts can be reused across different subsystems, simulators, and on‑vehicle test rigs, reducing duplication.
Safety‑Critical Assurance: Early, automated generation of tests for safety functions (e.g., child‑presence detection) can support the verification work that automotive safety standards such as ISO 26262 call for, with less manual effort.
Scalable to New Features: As SDVs evolve, the pipeline can ingest updated requirement documents and quickly produce corresponding test cases, supporting continuous integration pipelines for automotive software.
Developer‑Friendly Artifacts: Gherkin scenarios are readable by both technical and non‑technical stakeholders, fostering better collaboration between software engineers, system architects, and safety analysts.

Limitations & Future Work

Ambiguity Handling: The current system still struggles with vague or poorly structured requirements, necessitating human review.
Domain‑Specific Knowledge: LLMs may miss subtle automotive nuances (e.g., timing constraints) that require domain‑specific fine‑tuning.
Scalability Tests: The evaluation focused on a single subsystem; broader studies across multiple SDV modules are needed to confirm generalizability.
Toolchain Integration: Future work aims to plug the pipeline directly into CI/CD platforms used for automotive software (e.g., Jenkins, GitLab) and to support test‑specification formats beyond Gherkin.
Explainability: Providing traceability from each generated test back to the exact requirement fragment and signal source would improve auditability for safety certification.

Authors

Denesa Zyberaj
Lukasz Mazur
Pascal Hirmer
Nenad Petrovic
Marco Aiello
Alois Knoll

Paper Information

arXiv ID: 2602.15591v1
Categories: cs.SE
Published: February 17, 2026
