[Paper] Explainable Anomaly Detection for Industrial IoT Data Streams
Source: arXiv - 2512.08885v1
Overview
The paper introduces a real‑time, explainable anomaly‑detection framework for Industrial IoT (IIoT) data streams. By pairing an online variant of Isolation Forest with on‑the‑fly interpretability tools, the authors let operators spot faults as they occur and understand why the system flagged them, which is crucial for maintenance decisions on the shop floor, where both compute resources and labeled data are scarce.
Key Contributions
- Collaborative streaming pipeline that couples unsupervised anomaly detection with a human‑in‑the‑loop feedback loop.
- Online Isolation Forest implementation that updates the model incrementally as new sensor readings arrive.
- Incremental Partial Dependence Plots (iPDP) and a novel feature‑importance score derived from deviations of Individual Conditional Expectation (ICE) curves against a fading average, providing per‑instance explanations.
- Dynamic threshold adjustment based on user‑driven relevance reassessment, allowing operators to tune sensitivity without retraining the whole model.
- Prototype deployment on a Jacquard loom unit, demonstrating feasibility for fault detection in a real manufacturing environment.
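The summary does not spell out the scoring rule of the online variant; as a reference point, the classic (batch) Isolation Forest anomaly score can be sketched as below. Function names are illustrative, and the paper's incremental version is assumed to score samples the same way while updating trees online.

```python
import math

EULER_MASCHERONI = 0.5772156649

def c(n: int) -> float:
    """Average path length of an unsuccessful BST search on n points,
    the normalisation term from the original Isolation Forest paper."""
    if n <= 1:
        return 0.0
    harmonic = math.log(n - 1) + EULER_MASCHERONI  # H(n-1) approximation
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def anomaly_score(avg_path_length: float, n_samples: int) -> float:
    """s(x, n) = 2^(-E[h(x)] / c(n)).

    Scores approach 1.0 for easily isolated (anomalous) samples and
    0.5 for samples with average path length."""
    return 2.0 ** (-avg_path_length / c(n_samples))
```

Intuition: a sample that is isolated after very few splits (short average path) gets a score near 1, which is why the paper can flag anomalies from isolation depth alone, without labels.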
Methodology
- Data Stream Ingestion – Sensor readings (vibration, temperature, motor current, etc.) are streamed from edge devices to a lightweight processing node.
- Online Isolation Forest – Each incoming sample is scored by its isolation depth (shorter paths indicate anomalies); the model updates its tree structures incrementally, avoiding costly batch retraining.
- Explainability Layer
- For every scored instance, ICE curves are computed on‑the‑fly for each feature.
- A fading average of past ICE curves is maintained (older curves exponentially decay).
- The deviation between the current ICE curve and this average yields an importance score that highlights which features are driving the anomaly.
- Incremental PDPs aggregate these deviations to give a global view of feature relevance over time.
- Human‑in‑the‑Loop Interaction – Operators can inspect the iPDP/importance visualizations, confirm or reject anomalies, and adjust the anomaly threshold. The system immediately incorporates this feedback, refining future detections.
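The fading‑average ICE mechanism described above can be sketched minimally as follows. The decay factor `alpha`, the grid size, and the class name are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

class FadingICEImportance:
    """Tracks an exponentially fading average of ICE curves for one feature
    and scores each new instance by its deviation from that average.

    alpha is an illustrative decay factor, not a value from the paper."""

    def __init__(self, n_grid: int, alpha: float = 0.99):
        self.alpha = alpha
        self.avg_curve = np.zeros(n_grid)
        self.initialized = False

    def update(self, ice_curve: np.ndarray) -> float:
        """Return an importance score (mean absolute deviation from the
        fading average), then fold the new curve into the average."""
        ice_curve = np.asarray(ice_curve, dtype=float)
        if not self.initialized:
            self.avg_curve = ice_curve.copy()
            self.initialized = True
            return 0.0
        score = float(np.mean(np.abs(ice_curve - self.avg_curve)))
        # Exponential fading: older curves decay, recent ones dominate.
        self.avg_curve = self.alpha * self.avg_curve + (1 - self.alpha) * ice_curve
        return score
```

Ranking features by this score for a flagged instance yields the per‑instance explanation; aggregating the fading averages over time corresponds to the incremental PDP view.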
Results & Findings
- On the Jacquard loom dataset, the online Isolation Forest achieved ≈ 92 % detection rate for simulated bearing faults while maintaining a false‑alarm rate below 5 %.
- The explainability module correctly identified vibration‑axis readings and temperature spikes as the dominant contributors in > 80 % of true anomalies, matching domain‑expert expectations.
- Interactive threshold tuning reduced the average time‑to‑resolution for false alarms by ~30 %, because operators could quickly suppress spurious alerts without waiting for model retraining.
- Computational footprint stayed under 5 % CPU on a modest edge gateway (ARM Cortex‑A53), confirming suitability for resource‑constrained deployments.
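The interactive threshold tuning reported above can be approximated by a simple feedback rule: a rejected alert nudges the threshold up so similar scores stop firing, while a confirmed anomaly nudges it down to stay sensitive. The step size and clamping range are illustrative assumptions, not values from the paper.

```python
def adjust_threshold(threshold: float, confirmed: bool, step: float = 0.01) -> float:
    """Nudge the anomaly threshold based on operator feedback.

    confirmed=False means the operator rejected the alert (false alarm),
    so the threshold rises; confirmed=True lowers it. The step size and
    the [0.5, 1.0] clamp are illustrative, not from the paper."""
    threshold += -step if confirmed else step
    # Keep the threshold inside the meaningful Isolation Forest score range.
    return min(max(threshold, 0.5), 1.0)
```

Because only a scalar changes, this kind of adjustment takes effect immediately, which is consistent with the reported reduction in time‑to‑resolution without any model retraining.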
Practical Implications
- Faster Maintenance Decisions – Operators get not only an alarm but also a concise “why” (e.g., “temperature ↑, vibration X‑axis ↑”), enabling immediate corrective actions.
- Reduced Downtime – Early, explainable detection of bearing wear can trigger predictive maintenance before catastrophic failure, saving costly production stops.
- Scalable Edge Deployment – The lightweight, incremental nature of the algorithm fits on existing PLCs or edge gateways, avoiding the need for costly cloud round‑trips.
- Human‑Centric AI – By keeping the expert in the loop, the system respects existing maintenance workflows and builds trust, a common barrier for black‑box ML in industry.
- Portability – The framework is sensor‑agnostic; swapping out a loom for a CNC machine or a conveyor belt only requires re‑configuring the feature set, not redesigning the detection engine.
Limitations & Future Work
- The current evaluation is limited to a single loom unit; broader validation across diverse machinery is needed to confirm generality.
- Label scarcity remains a challenge; while the system works unsupervised, richer labeled datasets could improve threshold calibration and reduce false positives further.
- The explainability approach relies on ICE curve stability; highly noisy sensors may produce volatile importance scores, requiring additional smoothing techniques.
- Ongoing work aims to extend the pipeline from detection to predictive forecasting of bearing failures, and to integrate continuous learning that automatically adapts to sensor drift over months of operation.
Authors
- Ana Rita Paupério
- Diogo Risca
- Afonso Lourenço
- Goreti Marreiros
- Ricardo Martins
Paper Information
- arXiv ID: 2512.08885v1
- Categories: cs.LG
- Published: December 9, 2025