[Paper] 'Where is My Troubleshooting Procedure?': Studying the Potential of RAG in Assisting Failure Resolution of Large Cyber-Physical System
Source: arXiv - 2601.08706v1
Overview
The paper investigates how Retrieval‑Augmented Generation (RAG) can be turned into a conversational assistant that helps operators quickly locate the right troubleshooting procedure in the massive, natural‑language manuals of large cyber‑physical systems (CPS). Using real‑world data from Fincantieri’s naval platforms, the authors show that a RAG‑based tool can substantially cut the time needed to find the relevant steps, while also highlighting the need for safeguards before any recommendation is executed.
Key Contributions
- Empirical study on RAG for CPS troubleshooting – first large‑scale evaluation on industrial naval manuals containing thousands of procedures.
- Design of a hybrid retrieval‑generation pipeline that combines dense vector search with a fine‑tuned language model to produce concise, context‑aware answers.
- User‑centric evaluation involving actual operators, measuring speed, accuracy, and perceived usefulness of the assistant.
- Guidelines for safe deployment, including cross‑validation mechanisms and confidence‑threshold heuristics to avoid blind execution of generated steps.
- Open dataset & benchmark (anonymized excerpts of the manuals) released for the research community to reproduce and extend the experiments.
Methodology
- Data preparation – The authors extracted 3,412 troubleshooting procedures from Fincantieri’s documentation, cleaned the text, and segmented it into procedure‑level chunks.
- Retrieval layer – A dense embedding model (based on SBERT) indexed the chunks, enabling fast similarity search given a symptom description.
- Generation layer – A GPT‑style decoder was fine‑tuned on a subset of the manual to rewrite retrieved snippets into concise, step‑by‑step instructions tailored to the operator’s query.
- Safety wrapper – Before presenting an answer, the system runs a rule‑based validator that checks for critical actions (e.g., power‑off, valve changes) against a whitelist and flags low‑confidence outputs.
- Evaluation – Two experiments were conducted: (a) offline metrics (Recall@k, BLEU, factual consistency) and (b) an online user study with 12 seasoned operators who solved simulated fault scenarios using either the RAG assistant or traditional manual search.
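The retrieval layer and safety wrapper described above can be sketched roughly as follows. This is a minimal illustration, not the paper’s implementation: the embedding function, the chunk corpus, the critical‑action terms, and the whitelist are all made‑up placeholders (in the paper, an SBERT‑style model produces the embeddings).

```python
import hashlib
import numpy as np

# Hypothetical procedure-level chunks standing in for the indexed manual text.
CHUNKS = [
    "Check the bilge pump fuse before restarting the unit.",
    "Power off the main switchboard, then inspect breaker B12.",
    "Close valve V3 and vent the line before replacing the seal.",
]
EMBED_DIM = 64

def embed(text: str) -> np.ndarray:
    """Placeholder embedding: a deterministic hash-seeded random vector."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    return np.random.default_rng(seed).normal(size=EMBED_DIM)

chunk_vecs = np.stack([embed(c) for c in CHUNKS])

def retrieve(query: str, k: int = 2) -> list[tuple[float, str]]:
    """Dense retrieval: rank chunks by cosine similarity to the query."""
    q = embed(query)
    sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q))
    top = np.argsort(sims)[::-1][:k]
    return [(float(sims[i]), CHUNKS[i]) for i in top]

# Rule-based safety check: flag steps mentioning critical actions unless they
# appear in an approved whitelist (both lists are invented for this sketch).
CRITICAL_TERMS = ("power off", "close valve")
WHITELIST = {"Close valve V3 and vent the line before replacing the seal."}

def validate(step: str) -> str:
    if any(t in step.lower() for t in CRITICAL_TERMS) and step not in WHITELIST:
        return "FLAGGED"
    return "OK"

for score, step in retrieve("pump will not restart"):
    print(f"{validate(step):7s} {score:+.2f}  {step}")
```

In the paper, flagged or low‑confidence outputs are surfaced to the operator rather than silently executed; the sketch only shows where that decision point sits in the pipeline.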
Results & Findings
| Metric | Traditional Search | RAG Assistant |
|---|---|---|
| Avg. time to first relevant step (seconds) | 112 ± 23 | 38 ± 12 |
| Correctness of selected procedure (% of cases) | 71% | 84% |
| Operator confidence (1‑5 Likert) | 3.2 | 4.4 |
| False‑positive recommendations (critical actions) | 0% (manual) | 2.3% (filtered) |
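Recall@k, one of the offline metrics used in the evaluation, measures how often a relevant procedure appears among the top‑k retrieved chunks. A minimal sketch, with hypothetical procedure IDs:

```python
def recall_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of relevant items that appear in the top-k of the ranked list."""
    top_k = set(ranked_ids[:k])
    hits = sum(1 for r in relevant_ids if r in top_k)
    return hits / len(relevant_ids)

# Toy example: procedure P7 is the single relevant document for this query.
ranking = ["P3", "P7", "P1", "P9"]
print(recall_at_k(ranking, {"P7"}, k=1))  # 0.0 — P7 is not ranked first
print(recall_at_k(ranking, {"P7"}, k=3))  # 1.0 — P7 is within the top 3
```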
Key Takeaways
- The RAG tool reduced the “search‑and‑identify” phase by roughly 65% (112 s → 38 s on average), a major win in time‑critical incidents.
- Accuracy improved, but a small fraction of generated answers still suggested unsafe actions, underscoring the importance of the validation layer.
- Operators reported that the conversational interface lowered cognitive load and made it easier to ask follow‑up “what if” questions.
Practical Implications
- Faster incident response – Deploying a RAG‑powered assistant in control rooms can shave minutes off fault diagnosis, potentially preventing costly downtime in shipyards, power plants, or manufacturing lines.
- Reduced training overhead – New engineers can rely on the assistant to navigate legacy documentation without memorizing every procedure.
- Integration pathways – The architecture can be wrapped around existing CMMS/SCADA systems via APIs, enabling seamless hand‑off from chatbot to execution platforms.
- Safety‑first deployment – The paper’s validation hooks (rule‑based checks, confidence thresholds) provide a blueprint for building “human‑in‑the‑loop” safeguards that satisfy regulatory standards.
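A confidence‑threshold gate of the kind the paper recommends could be wired in front of the operator as sketched below; the threshold value, the `Answer` type, and the routing labels are assumptions for illustration, not part of the paper’s system.

```python
from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    retrieval_score: float  # e.g. max cosine similarity of supporting chunks

# Hypothetical threshold: answers scoring below it require explicit human
# review before any step may be executed.
CONFIDENCE_THRESHOLD = 0.75

def route(answer: Answer) -> str:
    """Human-in-the-loop gate: present confident answers, escalate the rest."""
    if answer.retrieval_score >= CONFIDENCE_THRESHOLD:
        return "present-to-operator"
    return "escalate-for-review"

print(route(Answer("Replace fuse F2, then restart the pump.", 0.91)))
print(route(Answer("Possibly open valve V9 (weak match).", 0.42)))
```

Keeping the gate outside the language model (as a plain, auditable rule) is what makes it easy to argue about with regulators, which is the design choice the paper’s safeguards lean on.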
Limitations & Future Work
- Domain specificity – The study focuses on naval CPS; results may differ for other sectors with distinct vocabularies or procedural structures.
- Limited multilingual support – Manuals were Italian‑centric; extending to multilingual corpora will require additional language models.
- Scalability of validation – Rule‑based cross‑checks work for a known set of critical actions but may struggle with novel procedures; future work could explore automated formal verification or reinforcement‑learning‑based safety nets.
- User study size – Only 12 operators participated; larger field trials are needed to confirm long‑term adoption and impact on real incidents.
Bottom line: RAG shows strong promise as a “smart search” layer for massive troubleshooting manuals, offering tangible speed and accuracy gains while reminding us that safety‑critical domains still demand rigorous validation before letting AI take the wheel.
Authors
- Maria Teresa Rossi
- Leonardo Mariani
- Oliviero Riganelli
- Giuseppe Filomento
- Danilo Giannone
- Paolo Gavazzo
Paper Information
- arXiv ID: 2601.08706v1
- Categories: cs.SE
- Published: January 13, 2026