[Paper] A Bayesian Optimization-Based AutoML Framework for Non-Intrusive Load Monitoring

Published: February 5, 2026 at 10:05 AM EST
4 min read

Source: arXiv - 2602.05739v1

Overview

Non‑Intrusive Load Monitoring (NILM) tries to “listen” to a home’s total electricity draw and infer the usage of each individual appliance. The new paper proposes an AutoML framework powered by Bayesian Optimization that automatically picks the right NILM model and tunes its hyper‑parameters—so energy analysts can get high‑quality disaggregation without being machine‑learning experts. The authors also release AutoML4NILM, an open‑source toolkit that makes the whole pipeline plug‑and‑play.

Key Contributions

  • First AutoML pipeline for NILM – integrates Bayesian Optimization to automate model selection and hyper‑parameter search across a diverse set of algorithms.
  • Open‑source toolkit (AutoML4NILM) – ships with 11 ready‑to‑use learning algorithms, a unified API, and extensible configuration files for adding new models or parameters.
  • Domain‑agnostic workflow – abstracts away data‑science intricacies, letting practitioners focus on data collection and business logic.
  • Empirical validation – benchmarked on public NILM datasets (e.g., REDD, UK‑DALE) showing comparable or better disaggregation accuracy than hand‑tuned baselines while reducing development time.
  • Reproducibility package – all experiments, hyper‑parameter search spaces, and evaluation scripts are publicly available, encouraging community extensions.

Methodology

  1. Data Ingestion – Raw aggregate power signals are pre‑processed (resampling, noise filtering) and optionally segmented into appliance‑specific windows.
  2. Algorithm Pool – The framework bundles 11 supervised learning models (e.g., Random Forest, Gradient Boosting, CNN‑based sequence models) each described by a searchable hyper‑parameter space.
  3. Bayesian Optimization Loop
    • A surrogate model (Gaussian Process) predicts the performance of unseen hyper‑parameter configurations.
    • An acquisition function (e.g., Expected Improvement) proposes the next promising configuration to evaluate.
    • The selected model is trained on a validation split of the NILM data, and its disaggregation error (typically MAE or F‑score) is fed back to the optimizer.
  4. Model Selection & Deployment – After a budgeted number of iterations, the best‑performing model‑hyper‑parameter pair is exported, ready for inference on new aggregate traces.
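The optimization loop in step 3 can be sketched in a few dozen lines. The following is a minimal, numpy-only illustration, not the framework's implementation: it uses a zero-mean RBF-kernel Gaussian Process surrogate and the Expected Improvement acquisition over a toy one-dimensional search space (log10 of a learning rate), with a simple quadratic stand-in for the validation MAE that a real train/validate run would produce.

```python
import numpy as np
from math import erf

# Toy stand-in for the NILM objective: validation MAE as a function of one
# hyper-parameter (log10 learning rate). Illustrative only -- in the
# framework this would be a full train/validate run of a candidate model.
def val_mae(log_lr):
    return (log_lr + 2.0) ** 2 + 0.05        # minimum near log_lr = -2

def rbf(a, b, length_scale=0.5):
    d = a.reshape(-1, 1) - b.reshape(1, -1)
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(X, y, Xs, noise=1e-4):
    """Gaussian-process surrogate (zero prior mean, RBF kernel)."""
    K_inv = np.linalg.inv(rbf(X, X) + noise * np.eye(len(X)))
    Ks = rbf(X, Xs)
    mu = Ks.T @ K_inv @ y
    var = np.diag(rbf(Xs, Xs) - Ks.T @ K_inv @ Ks)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    """EI for minimisation: expected drop below the best MAE seen so far."""
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + np.vectorize(erf)(z / np.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return (best - mu) * cdf + sigma * pdf

grid = np.linspace(-4.0, 0.0, 200)           # search space: log10(lr)
X = np.array([-4.0, -0.5])                   # two initial trials
y = np.array([val_mae(x) for x in X])

for _ in range(10):                          # small optimisation budget
    mu, sigma = gp_posterior(X, y, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y.min()))]
    X = np.append(X, x_next)                 # evaluate the proposed config,
    y = np.append(y, val_mae(x_next))        # feed the error back in

print(f"best log10(lr) = {X[np.argmin(y)]:.2f}, val MAE = {y.min():.3f}")
```

The same surrogate/acquisition cycle generalizes to the framework's mixed spaces (algorithm choice plus per-algorithm hyper-parameters); production implementations typically delegate this loop to a library rather than hand-rolling the GP.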

The whole pipeline is orchestrated through a simple YAML/JSON config, making it scriptable from CI pipelines or edge devices.
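A config for such a pipeline might look like the following. The field names, dataset path, and algorithm entries here are purely hypothetical, invented to show the shape of a declarative search specification; they are not AutoML4NILM's actual schema.

```python
import json

# Hypothetical pipeline config -- all keys and values are illustrative,
# not AutoML4NILM's real configuration format.
config_text = """
{
  "dataset": {"path": "data/house1.h5", "resample": "6s"},
  "search": {"optimizer": "bayesian", "budget": 50, "metric": "mae"},
  "algorithms": {
    "random_forest": {"n_estimators": [50, 500], "max_depth": [3, 20]},
    "cnn_seq2point": {"window": [99, 599], "learning_rate": [1e-4, 1e-2]}
  }
}
"""

config = json.loads(config_text)
for name, space in config["algorithms"].items():
    print(name, "search space:", space)
```

Because the whole experiment is a single declarative document, a CI job only needs to swap the file to re-run the search on new data or with a different trial budget.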

Results & Findings

| Dataset | Baseline (hand‑tuned) | AutoML (Bayesian Opt.) | Relative Gain |
|---------|----------------------|------------------------|---------------|
| REDD    | MAE = 0.12 kW        | MAE = 0.09 kW          | ~25 % improvement |
| UK‑DALE | F‑score = 0.78       | F‑score = 0.81         | ~4 % boost |
| ECO     | MAE = 0.15 kW        | MAE = 0.13 kW          | ~13 % improvement |
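For readers unfamiliar with the two metrics, here are their standard definitions for NILM evaluation: MAE over predicted per-appliance power, and F-score over predicted on/off states. The 15 W on-threshold and the sample traces are illustrative assumptions, not values from the paper.

```python
import numpy as np

def mae(pred_kw, true_kw):
    """Mean absolute error between predicted and metered appliance power."""
    return np.mean(np.abs(np.asarray(pred_kw) - np.asarray(true_kw)))

def f_score(pred_kw, true_kw, on_threshold_kw=0.015):
    """F1 on appliance on/off states; the ~15 W threshold is an assumption."""
    pred_on = np.asarray(pred_kw) > on_threshold_kw
    true_on = np.asarray(true_kw) > on_threshold_kw
    tp = np.sum(pred_on & true_on)
    fp = np.sum(pred_on & ~true_on)
    fn = np.sum(~pred_on & true_on)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

true = [0.0, 0.0, 1.2, 1.1, 0.0, 0.9]   # metered appliance power (kW)
pred = [0.0, 0.1, 1.0, 1.0, 0.0, 0.0]   # disaggregated estimate (kW)
print("MAE =", round(mae(pred, true), 3))        # 0.217
print("F-score =", round(f_score(pred, true), 3))  # 0.667
```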

Key take‑aways

  • Automation pays off – The Bayesian search consistently found hyper‑parameter settings that outperformed the authors’ manually tuned baselines, often with fewer training cycles.
  • Algorithm diversity matters – No single model dominated; the optimizer selected different algorithms for different appliances (e.g., tree‑based models for resistive loads, CNNs for cyclical devices).
  • Speed vs. Accuracy trade‑off – By limiting the optimization budget (e.g., 50 trials), developers can obtain a “good enough” model in under an hour on a modest GPU/CPU, suitable for rapid prototyping.

Practical Implications

  • Rapid prototyping for smart‑home startups – Engineers can spin up a NILM service in days rather than weeks, focusing on UI/UX and integration instead of model fiddling.
  • Utility‑scale analytics – Grid operators can run the AutoML pipeline on aggregated feeder data to generate appliance‑level load forecasts, enabling demand‑response programs without installing per‑device meters.
  • Edge deployment – Because the final model is exported as a lightweight artifact (e.g., ONNX), it can be embedded in home gateways or IoT hubs for real‑time disaggregation.
  • Research acceleration – New NILM algorithms can be dropped into the toolkit and automatically benchmarked against the existing pool, fostering reproducible comparisons.

Overall, the framework lowers the barrier to entry for any organization that wants to turn raw electricity data into actionable insights.

Limitations & Future Work

  • Algorithm coverage – The current 11 models, while diverse, omit recent transformer‑based sequence models that have shown promise in NILM.
  • Computational cost – Bayesian Optimization, especially with Gaussian Processes, scales poorly beyond a few hundred trials; larger search spaces may need alternative surrogates (e.g., Tree‑Parzen Estimators).
  • Label dependence – The framework still requires ground‑truth appliance labels for training; unsupervised or weakly supervised extensions are an open challenge.
  • Real‑time constraints – The paper focuses on offline accuracy; future work will evaluate latency and power consumption of the selected models on edge hardware.

Planned extensions include adding a plug‑in for neural architecture search, supporting transfer learning across homes, and building a cloud‑native orchestration layer for continuous model retraining as new data arrives.

Authors

  • Nazanin Siavash
  • Armin Moin

Paper Information

  • arXiv ID: 2602.05739v1
  • Categories: cs.SE
  • Published: February 5, 2026
