[Paper] Smart IoT-Based Leak Forecasting and Detection for Energy-Efficient Liquid Cooling in AI Data Centers
Source: arXiv - 2512.21801v1
Overview
The paper proposes a smart IoT‑based monitoring platform that uses machine learning to forecast and detect coolant leaks in AI‑focused data centers that rely on liquid cooling. By combining an LSTM model for probabilistic leak prediction with a Random‑Forest detector for real‑time alerts, the authors demonstrate a prototype that could reduce unplanned shutdowns and the associated energy waste.
Key Contributions
- Hybrid ML pipeline – LSTM network for 2‑4 hour leak forecasting + Random‑Forest classifier for sub‑minute leak detection.
- IoT‑centric architecture – MQTT for low‑latency sensor streaming, InfluxDB for time‑series storage, and Streamlit dashboards for operator visibility.
- Feature analysis – Empirical evidence that humidity, pressure, and flow‑rate are strong early‑warning signals, while temperature lags due to hardware thermal inertia.
- Energy‑impact estimate – Simulation on a 47‑rack facility predicts ~1,500 kWh/year saved by avoiding emergency shutdowns.
- Synthetic validation – Dataset generated according to ASHRAE 2021 cooling standards, enabling reproducible benchmarking of leak‑prediction models.
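To make the MQTT‑to‑InfluxDB hop in the architecture concrete, here is a minimal sketch that converts one JSON sensor payload into an InfluxDB line‑protocol record. The payload schema (`rack_id`, `ts_ms`, and the four field names) and the measurement name `coolant_loop` are hypothetical; the summary does not state what the authors actually publish. The function only builds the string and does not talk to a broker or a database.

```python
import json

# Hypothetical field names for the four sensor channels named in the summary.
FIELDS = ("temperature_c", "humidity_pct", "pressure_kpa", "flow_lpm")


def _escape_tag(value: str) -> str:
    """Escape the characters that line protocol treats specially in tag values."""
    for ch in (",", "=", " "):
        value = value.replace(ch, "\\" + ch)
    return value


def payload_to_line(payload: bytes, measurement: str = "coolant_loop") -> str:
    """Turn one JSON payload into a single line-protocol record."""
    msg = json.loads(payload)
    required = ("rack_id", "ts_ms") + FIELDS
    missing = [key for key in required if key not in msg]
    if missing:
        raise ValueError(f"payload is missing keys: {missing}")

    tags = f"rack={_escape_tag(str(msg['rack_id']))}"
    fields = ",".join(f"{key}={float(msg[key])}" for key in FIELDS)
    # Line protocol timestamps default to nanoseconds since the Unix epoch.
    ts_ns = int(msg["ts_ms"]) * 1_000_000
    return f"{measurement},{tags} {fields} {ts_ns}"
```

In a deployment, a function like this would typically be called from the message callback of whichever MQTT client library is in use, with the resulting line handed to an InfluxDB write client.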
Methodology
- Data Generation – Synthetic sensor streams (temperature, humidity, pressure, flow) were created to mimic real‑world coolant loops, following ASHRAE 2021 guidelines for liquid‑cooled racks.
- Pre‑processing – Sensor readings were resampled to 1‑second intervals, normalized, and labeled with leak‑event windows (±30 min for forecasting, instantaneous for detection).
- Forecasting Model – A stacked LSTM (2 layers, 64 hidden units) ingests the past 10 minutes of multivariate data and outputs the probability that a leak will occur within the next 2‑4 hours.
- Detection Model – A Random‑Forest (100 trees) consumes the same feature window but is trained to classify “leak now” vs. “normal” for immediate alerts.
- Deployment Stack – Sensors publish JSON payloads via MQTT → InfluxDB time‑series DB → Python services run the LSTM/Random‑Forest inference → Streamlit UI visualizes forecasts and alerts.
- Evaluation – Accuracy, precision, recall, and a custom “probability‑within‑window” metric were computed on a held‑out synthetic test set.
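The pre‑processing step above can be sketched as follows: min‑max normalization per sensor channel, then sliding 10‑minute windows over 1 Hz data, each with a forecasting label and a detection label. This is one possible reading of the description, not the authors' code. The labeling rule (leak onset 2‑4 hours after the window ends), the 60‑second stride, and all names are assumptions, and the ±30 min tolerance mentioned above is not modeled here.

```python
from typing import List, Sequence, Tuple

Reading = Tuple[float, float, float, float]  # temperature, humidity, pressure, flow
Interval = Tuple[int, int]                   # leak (start_s, stop_s), inclusive

WINDOW_S = 10 * 60                 # 10-minute input window at 1 sample/s
HORIZON_S = (2 * 3600, 4 * 3600)   # forecast horizon: 2 to 4 hours ahead


def min_max_normalize(rows: Sequence[Reading]) -> List[Reading]:
    """Scale each sensor channel to [0, 1]; constant channels map to 0.0."""
    scaled_columns = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled_columns.append([(v - lo) / span if span else 0.0 for v in col])
    return [tuple(vals) for vals in zip(*scaled_columns)]


def make_windows(
    rows: Sequence[Reading],
    leak_intervals: Sequence[Interval],
    window: int = WINDOW_S,
    horizon: Tuple[int, int] = HORIZON_S,
    stride: int = 60,
):
    """Return (window_rows, forecast_label, detect_label) triples.

    Row i is assumed to be the sample taken at second i.
    forecast_label is 1 if a leak starts inside the horizon after the window.
    detect_label is 1 if a leak is active at the window's last sample.
    """
    samples = []
    for end in range(window, len(rows) + 1, stride):
        t_last = end - 1  # timestamp of the newest sample in the window
        forecast = int(any(
            t_last + horizon[0] <= start <= t_last + horizon[1]
            for start, _ in leak_intervals
        ))
        detect = int(any(start <= t_last <= stop for start, stop in leak_intervals))
        samples.append((rows[end - window:end], forecast, detect))
    return samples
```

With the default settings each window holds 600 rows of 4 features, which matches the 10‑minute look‑back described for both models.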
Results & Findings
| Metric | Forecast (LSTM) | Detection (RF) |
|---|---|---|
| Accuracy | 87 % (±30 min window) | 96.5 % |
| Precision | 0.84 | 0.97 |
| Recall | 0.81 | 0.96 |
| Forecast horizon (lead time) | 2–4 h | — |
| Avg. detection latency | — | < 1 min |
- Humidity, pressure, and flow‑rate consistently rose 30‑90 min before a leak, providing the strongest predictive cues.
- Temperature showed negligible early variation, confirming that thermal inertia masks leak signatures.
- The end‑to‑end pipeline processed ~10 k samples/s on a modest CPU (Intel i7), demonstrating feasibility for on‑prem deployment without GPU acceleration.
- Energy‑saving calculations (based on ASHRAE‑defined cooling power) suggest that proactive leak avoidance could reduce annual electricity consumption by roughly 1.5 MWh for a mid‑size AI data center.
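The table reports precision and recall but no combined score. As a quick derived figure, the sketch below computes the F1 score (the harmonic mean of precision and recall) from the reported values. The F1 numbers are calculated here and are not stated in the source.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in the results table.
reported = {
    "LSTM forecast": (0.84, 0.81),
    "Random-Forest detection": (0.97, 0.96),
}

f1_by_model = {name: f1_score(p, r) for name, (p, r) in reported.items()}
for name, f1 in f1_by_model.items():
    print(f"{name}: F1 = {f1:.3f}")
# LSTM forecast: F1 = 0.825
# Random-Forest detection: F1 = 0.965
```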
Practical Implications
- Reduced downtime – Operators can schedule preventive maintenance before a leak escalates, avoiding emergency shutdowns that disrupt AI workloads.
- Energy efficiency – Early leak mitigation cuts the extra cooling load and the power draw of backup fans/compressors that kick in during a fault.
- Scalable IoT stack – The use of MQTT and InfluxDB aligns with existing data‑center monitoring ecosystems, making integration straightforward for DevOps teams.
- Model portability – Both LSTM and Random‑Forest models are lightweight enough to run on edge gateways (e.g., Raspberry Pi, industrial PCs), enabling distributed inference close to the coolant loops.
- Compliance & reporting – Real‑time dashboards provide audit trails for sustainability certifications (e.g., LEED, ENERGY STAR) and can be tied into existing CMMS (Computerized Maintenance Management Systems).
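The model‑portability point can be sanity‑checked with a rough parameter count for the forecasting network. The sketch below uses the standard LSTM formula (four gates, each with input weights, recurrent weights, and a bias) and assumes the four sensor channels are the only inputs. The exact count depends on the framework (some keep two bias vectors per gate) and on any output layer, neither of which the summary specifies.

```python
def lstm_layer_params(input_size: int, hidden_size: int, bias_vectors: int = 1) -> int:
    """Parameters in one LSTM layer: 4 gates x (input + recurrent weights + bias)."""
    per_gate = hidden_size * (input_size + hidden_size) + bias_vectors * hidden_size
    return 4 * per_gate


def stacked_lstm_params(
    n_features: int, hidden_size: int, n_layers: int, bias_vectors: int = 1
) -> int:
    """Total parameters for stacked LSTM layers (output head not included)."""
    total = 0
    in_size = n_features
    for _ in range(n_layers):
        total += lstm_layer_params(in_size, hidden_size, bias_vectors)
        in_size = hidden_size  # deeper layers read the previous layer's hidden state
    return total


# 4 sensor channels, 64 hidden units, 2 layers, as described in the methodology.
n_params = stacked_lstm_params(n_features=4, hidden_size=64, n_layers=2)
size_kib = n_params * 4 / 1024  # float32 weights, 4 bytes each
print(f"{n_params} parameters, about {size_kib:.0f} KiB as float32")
# 50688 parameters, about 198 KiB as float32
```

At roughly 50 thousand weights, the recurrent layers alone would fit comfortably in the memory of a small edge device, which is consistent with the claim above, although inference latency would still need to be measured on the target hardware.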
Limitations & Future Work
- Synthetic data only – The models have not yet been validated on real‑world sensor logs; domain shift could affect accuracy.
- Feature set limited to four sensors – Additional variables (e.g., vibration, acoustic signatures) might improve early detection.
- Single‑facility scope – Energy‑saving estimates are based on a 47‑rack layout; larger or differently configured centers may see different gains.
- Model drift handling – The paper does not address continuous learning or adaptation as coolant chemistry or hardware ages.
Future research directions include deploying the pipeline in a live testbed, expanding the sensor suite, and exploring online learning techniques to keep the models calibrated over time.
Authors
- Krishna Chaitanya Sunkara
- Rambabu Konakanchi
Paper Information
- arXiv ID: 2512.21801v1
- Categories: cs.LG, cs.DC, cs.NI, eess.SY
- Published: December 25, 2025