[Paper] LLMs-Powered Real-Time Fault Injection: An Approach Toward Intelligent Fault Test Cases Generation

Published: November 24, 2025 at 08:57 AM EST
4 min read
Source: arXiv


Overview

Fault injection (FI) is a cornerstone technique for validating the safety of automotive software, but traditional FI workflows demand painstaking manual effort to specify fault type, location, and timing. The paper "LLMs‑Powered Real‑Time Fault Injection: An Approach Toward Intelligent Fault Test Cases Generation" proposes a pipeline that leverages large language models (LLMs), specifically GPT‑4o, to automatically generate realistic fault test cases directly from functional safety requirements (FSRs). The result is a faster, cheaper, and more coverage‑aware way to stress‑test safety‑critical automotive systems.

Key Contributions

  • LLM‑driven test‑case synthesis: Introduces a systematic method for turning textual FSRs into fault injection test cases without human‑written specifications.
  • Model comparison & selection: Evaluates several state‑of‑the‑art LLMs (including GPT‑3.5, Claude, LLaMA) and demonstrates that GPT‑4o consistently outperforms the rest on classification and generation tasks.
  • High‑accuracy metrics: Achieves an F1‑score of 88 % for correctly classifying FSRs and 97.5 % for generating valid fault test cases.
  • Real‑time hardware‑in‑the‑loop (HIL) validation: Executes the generated test cases on a high‑fidelity automotive model, confirming that the approach works end‑to‑end in a realistic testing environment.
  • Cost‑reduction argument: Quantifies the reduction in manual engineering effort and test‑generation time, positioning the technique as a practical alternative to existing FI tools.

Methodology

  1. Requirement preprocessing: Functional safety requirements are collected from the automotive development artefacts and normalized (tokenization, removal of boiler‑plate language).
  2. LLM fine‑tuning / prompting: A set of carefully crafted prompts is designed to ask the LLM to (a) classify the requirement into a fault domain (e.g., sensor, actuator, communication) and (b) produce a concrete fault injection test case (fault type, injection point, timing, severity).
  3. Model selection loop: The authors run the same prompts through multiple LLMs, compare the outputs against a manually curated ground‑truth dataset, and select the model with the best precision/recall trade‑off (GPT‑4o).
  4. Test‑case validation: Generated test cases are fed into a real‑time FI framework that injects the faults into a hardware‑in‑the‑loop setup running a high‑fidelity vehicle dynamics and control model.
  5. Coverage analysis: The authors measure how well the generated test suite covers the original FSR space using standard coverage criteria (e.g., requirement coverage, fault type diversity).
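Steps 1–4 can be sketched in code. This is a minimal illustration, not the paper's implementation: the fault-domain labels, the test-case schema, the prompt wording, and the `query_llm` helper (standing in for a GPT‑4o API call) are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical fault-domain labels and test-case schema; the paper's
# actual taxonomy and prompts are not reproduced here.
FAULT_DOMAINS = ("sensor", "actuator", "communication")

@dataclass
class FaultTestCase:
    fault_type: str       # e.g. "stuck-at", "offset", "message drop"
    injection_point: str  # signal or component to perturb
    timing_s: float       # injection time within the scenario
    severity: str         # e.g. "low", "medium", "high"

def classify_requirement(fsr: str, query_llm) -> str:
    """Step 2a: ask the LLM to map a requirement to a fault domain."""
    prompt = (
        "Classify this functional safety requirement into one of "
        f"{FAULT_DOMAINS}. Answer with the label only.\n\nFSR: {fsr}"
    )
    label = query_llm(prompt).strip().lower()
    return label if label in FAULT_DOMAINS else "unknown"

def generate_test_case(fsr: str, domain: str, query_llm) -> FaultTestCase:
    """Step 2b: ask the LLM for a concrete fault injection test case."""
    prompt = (
        f"For this {domain} requirement, propose a fault test case as "
        "'fault_type; injection_point; timing_s; severity'.\n\nFSR: " + fsr
    )
    fault_type, point, timing, severity = (
        part.strip() for part in query_llm(prompt).split(";")
    )
    return FaultTestCase(fault_type, point, float(timing), severity)
```

In a real setup, the returned `FaultTestCase` objects would then be validated against the ground-truth dataset (step 3) and handed to the real-time FI framework (step 4).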

The pipeline is deliberately kept modular so that any LLM with a suitable API can be swapped in, and the prompting strategy can be adapted to other safety‑critical domains (e.g., aerospace, medical devices).
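The coverage criteria in step 5 could be computed along the following lines. These metric definitions are illustrative assumptions, since the paper's exact criteria are not spelled out here.

```python
# Illustrative coverage metrics for step 5; the definitions below
# (requirement coverage, fault-type diversity) are assumptions.

def requirement_coverage(fsr_ids, test_cases):
    """Fraction of requirements hit by at least one generated test case.
    Each test case records the id of the FSR it was generated from."""
    covered = {tc["fsr_id"] for tc in test_cases}
    return len(covered & set(fsr_ids)) / len(fsr_ids)

def fault_type_diversity(test_cases, known_fault_types):
    """Fraction of known fault types exercised by the generated suite."""
    used = {tc["fault_type"] for tc in test_cases}
    return len(used & set(known_fault_types)) / len(known_fault_types)

suite = [
    {"fsr_id": "FSR-1", "fault_type": "stuck-at"},
    {"fsr_id": "FSR-1", "fault_type": "offset"},
    {"fsr_id": "FSR-3", "fault_type": "message drop"},
]
# FSR-2 is uncovered, so requirement coverage is 2/3; the suite uses
# 3 of the 4 known fault types, so diversity is 0.75.
rc = requirement_coverage(["FSR-1", "FSR-2", "FSR-3"], suite)
fd = fault_type_diversity(suite, ["stuck-at", "offset", "drift", "message drop"])
```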

Results & Findings

  • FSR classification F1‑score: 88 %. The LLM reliably identifies the safety domain of each requirement.
  • Fault test‑case generation F1‑score: 97.5 %. Almost all generated test cases are syntactically correct and semantically aligned with the source requirement.
  • Manual effort reduction: ~70 % fewer person‑hours. Engineers spent far less time writing and reviewing test cases.
  • Real‑time HIL execution success: 100 % of generated cases executed without runtime errors, demonstrating end‑to‑end compatibility with existing FI infrastructure.
  • Coverage improvement: +15 % over the baseline manual suite. The LLM‑generated suite explores fault combinations that human engineers often overlook.

These numbers show that GPT‑4o can act as a highly accurate “assistant” for safety engineers, turning natural‑language requirements into actionable test artefacts with minimal human oversight.
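As a reminder of what the F1 figures above mean, F1 is the harmonic mean of precision and recall computed from raw counts. The counts in the example are illustrative only, not the paper's data.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from raw counts
    (tp = true positives, fp = false positives, fn = false negatives)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Illustrative counts only: 88 correct classifications with 12 false
# positives and 12 false negatives give f1_score(88, 12, 12) ≈ 0.88,
# matching the reported classification score.
```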

Practical Implications

  • Accelerated safety validation pipelines: Development teams can generate a comprehensive fault injection suite overnight, freeing engineers to focus on analysis rather than test authoring.
  • Integration with CI/CD for automotive software: The LLM‑driven generator can be scripted as part of continuous integration, automatically refreshing the fault suite whenever requirements change.
  • Cost savings: Reducing manual test‑case creation translates directly into lower engineering labor costs and shorter time‑to‑market for safety‑critical features.
  • Scalable to complex systems: As vehicle software architectures grow (e.g., ADAS, autonomous driving stacks), the approach scales because the LLM helps manage the combinatorial growth of possible fault types and locations.
  • Cross‑domain applicability: The same prompting framework can be repurposed for functional‑safety standards beyond ISO 26262 (e.g., IEC 61508), or even for non‑automotive safety‑critical domains.
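The CI/CD integration point above amounts to a change-detection hook: regenerate the fault suite only when the requirements change. A minimal sketch, assuming a flat directory of requirement text files and a JSON state file; the layout, hashing scheme, and the idea of a separate suite-regeneration step are all assumptions, not the paper's tooling.

```python
# Hypothetical CI hook: detect whether requirement files changed since
# the fault suite was last generated, so the pipeline can decide
# whether to re-run the LLM-driven generator.
import hashlib
import json
import pathlib

def fingerprint(req_dir: str) -> str:
    """Stable hash over all requirement files, in sorted order."""
    h = hashlib.sha256()
    for path in sorted(pathlib.Path(req_dir).glob("*.txt")):
        h.update(path.name.encode())
        h.update(path.read_bytes())
    return h.hexdigest()

def suite_is_stale(req_dir: str, state_file: str) -> bool:
    """True if requirements changed since the last recorded fingerprint.
    Updates the state file so the next run sees the suite as fresh."""
    state = pathlib.Path(state_file)
    current = fingerprint(req_dir)
    if state.exists() and json.loads(state.read_text())["hash"] == current:
        return False
    state.write_text(json.dumps({"hash": current}))
    return True  # caller should now regenerate the fault suite
```

A CI job would call `suite_is_stale` on every commit and trigger the generator, plus the HIL run, only when it returns `True`.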

Limitations & Future Work

  • Dependence on LLM API stability and licensing: The approach hinges on access to a commercial LLM (GPT‑4o); changes in pricing or API limits could affect adoption.
  • Prompt engineering overhead: While the generation is automated, crafting robust prompts still requires domain expertise and iterative tuning.
  • Verification of semantic correctness: The current evaluation focuses on syntactic F1‑score; deeper semantic validation (e.g., ensuring the injected fault truly exercises the intended safety mechanism) remains an open challenge.
  • Generalization to legacy code bases: The study used a high‑fidelity model; applying the method to heterogeneous, legacy automotive ECUs may need additional adapters.

Future research directions include:

  1. Building a domain‑specific fine‑tuned LLM to reduce prompt complexity.
  2. Integrating formal verification to automatically certify that generated faults satisfy coverage criteria.
  3. Extending the pipeline to support multi‑modal inputs (e.g., UML diagrams, Simulink models) for richer requirement representations.

Authors

  • Mohammad Abboush
  • Ahmad Hatahet
  • Andreas Rausch

Paper Information

  • arXiv ID: 2511.19132v1
  • Categories: cs.SE
  • Published: November 24, 2025