[Paper] KORAL: Knowledge Graph Guided LLM Reasoning for SSD Operational Analysis
Source: arXiv - 2602.10246v1
Overview
The paper introduces KORAL, a novel framework that couples Large Language Models (LLMs) with a domain‑specific Knowledge Graph (KG) to reason about Solid‑State Drive (SSD) health and performance. By turning fragmented telemetry and scattered literature into a unified, queryable graph, KORAL lets an LLM produce expert‑level diagnoses, predictions, and prescriptive actions without the massive data‑labeling effort that traditional methods demand.
Key Contributions
- Hybrid LLM + KG architecture that feeds structured knowledge into the language model, ensuring explanations stay grounded in SSD domain facts.
- Automated KG construction from raw telemetry (Data KG) and from existing technical documents (Literature KG), bridging the gap between unstructured logs and structured reasoning.
- End‑to‑end reasoning pipeline covering descriptive, predictive, prescriptive, and “what‑if” analyses for SSDs—first of its kind in storage systems research.
- Evidence‑backed outputs: every recommendation is accompanied by citations to KG nodes, making the reasoning traceable and auditable.
- Open‑source release of the SSD‑specific KG and code, enabling reproducibility and community extensions.
Methodology
- Telemetry Ingestion – Raw SSD metrics (temperature, wear‑level, I/O latency, etc.) are streamed from production servers.
- Data KG Generation – A lightweight extractor maps time‑stamped telemetry into entities (e.g., Device‑A, Temperature) and relationships (e.g., has‑value, observed‑during).
- Literature KG Integration – Papers, vendor manuals, and failure reports are parsed with NLP pipelines; key concepts (e.g., read‑disturb, thermal throttling) become nodes linked to causal edges.
- LLM Prompt Engineering – The LLM receives a contextual prompt that includes:
- A natural‑language query (e.g., “Why did latency spike on node X last night?”)
- Relevant sub‑graphs extracted from the combined KG (via graph‑based retrieval)
- Reasoning & Explanation – The LLM generates an answer while citing KG nodes, effectively “grounding” its output in factual data.
- Prescriptive Action Generation – For diagnosed issues, the system queries the KG for known mitigations (e.g., reduce write‑amplification), which the LLM reformulates as actionable steps.
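The Data KG step above can be sketched as a minimal in-memory triple store. The entity and relation names (Device‑A, has‑value, observed‑during) follow the paper's own examples, but the `DataKG` structure and `ingest` helper are illustrative assumptions, not KORAL's actual extractor.

```python
from dataclasses import dataclass, field

@dataclass
class DataKG:
    """Tiny triple store standing in for the paper's Data KG (assumed shape)."""
    triples: set = field(default_factory=set)

    def add(self, subj, rel, obj):
        self.triples.add((subj, rel, obj))

    def neighbors(self, subj):
        """Outgoing (relation, object) pairs for one entity."""
        return [(r, o) for s, r, o in self.triples if s == subj]

def ingest(kg, record):
    """Map one time-stamped telemetry record into entity/relation triples."""
    device = record["device"]
    for metric, value in record["metrics"].items():
        # One reading node per (device, metric, timestamp) observation.
        reading = f"{device}:{metric}@{record['ts']}"
        kg.add(device, "reports", reading)
        kg.add(reading, "has-value", value)
        kg.add(reading, "observed-during", record["ts"])

kg = DataKG()
ingest(kg, {"device": "Device-A", "ts": "2026-02-09T23:40Z",
            "metrics": {"temperature_C": 71, "io_latency_ms": 14.2}})
print(len(kg.triples))  # 6 triples: 2 metrics x 3 edges each
```

Keeping readings as first-class nodes (rather than flat key/value pairs) is what lets later retrieval walk from a device to its observations to their timestamps.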
The pipeline is modular: swapping the LLM (e.g., GPT‑4, LLaMA) or updating the KG does not require redesigning the whole system.
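The graph‑based retrieval and prompt-assembly steps can be illustrated with a small sketch: collect the k‑hop neighborhood around the query entity, then serialize those facts as grounding context. The triple layout and the `build_prompt` template are assumptions for illustration, not the paper's actual prompt format.

```python
def khop_subgraph(triples, seed, k=2):
    """Collect all triples reachable within k hops of `seed`."""
    frontier, seen, sub = {seed}, {seed}, []
    for _ in range(k):
        nxt = set()
        for s, r, o in triples:
            if s in frontier:
                sub.append((s, r, o))
                if o not in seen:
                    nxt.add(o)
                    seen.add(o)
        frontier = nxt
    return sub

def build_prompt(question, subgraph):
    """Serialize retrieved facts so the LLM can cite them as KG nodes."""
    facts = "\n".join(f"- ({s}) --{r}--> ({o})" for s, r, o in subgraph)
    return (f"Answer using ONLY these KG facts, citing nodes:\n{facts}\n\n"
            f"Question: {question}")

triples = [
    ("node-X", "hosts", "Device-A"),
    ("Device-A", "exhibits", "latency-spike"),
    ("latency-spike", "caused-by", "thermal-throttling"),
]
prompt = build_prompt("Why did latency spike on node X last night?",
                      khop_subgraph(triples, "node-X", k=3))
print(prompt)
```

The modularity claim above is visible here: swapping the LLM only changes what consumes `prompt`, and updating the KG only changes `triples`.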
Results & Findings
- Accuracy – On a benchmark of 200 real‑world SSD incidents, KORAL’s diagnoses matched senior storage engineers 92% of the time, outperforming a baseline statistical model (68%).
- Explainability – 87% of the generated reports included at least one KG citation, and operators rated the explanations “clear and trustworthy” in a user study (average Likert score 4.6/5).
- Speed – End‑to‑end query latency averaged 1.8 seconds, enabling near‑real‑time troubleshooting.
- Reduced Manual Effort – Operators reported a 45% drop in time spent gathering logs and cross‑referencing documentation.
- What‑If Scenarios – Simulating temperature spikes showed KORAL could predict a 15% increase in error rate within 2 hours, allowing proactive throttling actions.
Practical Implications
- Ops Teams can embed KORAL into monitoring dashboards to receive instant, evidence‑backed alerts instead of raw metric spikes.
- Capacity Planning tools can query the KG for long‑term wear trends, enabling more accurate SSD replacement schedules.
- Vendor Integration – Manufacturers can feed firmware release notes into the Literature KG, allowing the system to automatically suggest firmware upgrades when relevant symptoms appear.
- Developer APIs – The open‑source repo includes a REST interface; developers can programmatically ask “What mitigation reduces read‑disturb for this workload?” and receive a concise, cited answer.
- Cross‑Domain Extension – The same KG‑LLM pattern can be applied to other hardware components (e.g., HDDs, GPUs) or even to cloud service health diagnostics.
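As a sketch of the developer workflow, a client might query such a REST interface as follows. The `/ask` path, JSON payload schema, and response shape are hypothetical placeholders for illustration; the repo's actual API is not documented here.

```python
import json
from urllib import request

def make_query(base_url, question):
    """Build a POST request asking the (assumed) /ask endpoint for a cited answer."""
    payload = json.dumps({"query": question}).encode("utf-8")
    return request.Request(f"{base_url}/ask", data=payload,
                           headers={"Content-Type": "application/json"},
                           method="POST")

req = make_query("http://localhost:8080",
                 "What mitigation reduces read-disturb for this workload?")
print(req.full_url)  # http://localhost:8080/ask
# request.urlopen(req) would then return the JSON answer with its KG citations.
```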
Limitations & Future Work
- KG Completeness – The quality of reasoning hinges on the breadth of the Literature KG; rare failure modes not captured in source documents may be missed.
- LLM Hallucination Risk – Although KG grounding reduces hallucinations, the LLM can still generate plausible‑but‑incorrect statements when the KG lacks a direct answer.
- Scalability of KG Updates – Continual ingestion of new telemetry and literature requires automated validation pipelines to avoid graph drift.
- Evaluation Scope – Experiments were performed on a single data‑center environment; broader validation across heterogeneous SSD models and workloads is needed.
- Future Directions – The authors plan to (1) integrate active learning loops where operator feedback refines KG edges, (2) explore retrieval‑augmented generation (RAG) models that natively query the KG, and (3) extend the framework to multi‑component system diagnostics (e.g., storage‑network‑compute co‑analysis).
Authors
- Mayur Akewar
- Sandeep Madireddy
- Dongsheng Luo
- Janki Bhimani
Paper Information
- arXiv ID: 2602.10246v1
- Categories: cs.DC, cs.AI
- Published: February 10, 2026