[Paper] A Hybrid Reactive-Proactive Auto-scaling Algorithm for SLA-Constrained Edge Computing

Published: December 16, 2025 at 06:01 AM EST
3 min read
Source: arXiv - 2512.14290v1

Overview

Edge computing is reshaping how latency‑sensitive services—think IoT health monitors or smart‑farm sensors—are delivered. Gupta, Islam, and Buyya propose a hybrid reactive‑proactive auto‑scaling algorithm that keeps edge micro‑services within strict Service Level Agreements (SLAs) while minimizing costly over‑provisioning. Integrated directly into Kubernetes, the approach slashes SLA violations from ≈ 23 % (state‑of‑the‑art) to just ≈ 6 % on a real‑world edge testbed.

Key Contributions

  • Hybrid scaling logic: merges a machine‑learning (ML) predictor (proactive) with a traditional utilization‑based controller (reactive).
  • Kubernetes extension: packaged as a custom controller/Horizontal Pod Autoscaler (HPA) plug‑in, ready for production clusters.
  • SLA‑aware decision making: scaling actions are filtered through explicit latency, reliability, and availability thresholds.
  • Extensive empirical evaluation: runs on a realistic edge testbed (Raspberry Pi‑class nodes + cloud burst) using two open‑source micro‑service workloads (video analytics & IoT telemetry).
  • Quantitative improvement: reduces SLA violation rate by ~75 % and improves resource utilization by ~12 % compared with pure reactive or pure predictive baselines.

Methodology

  1. Workload forecasting – A lightweight time‑series model (e.g., an ARIMA‑enhanced LSTM) ingests recent request rates and predicts demand over the next scaling interval (30 s).
  2. Proactive scaling – The forecast translates into a target replica count, which is submitted to the Kubernetes API as a desired state (steps 1–2 are illustrated in the first sketch after this list).
  3. Reactive guard‑rail – Simultaneously, a classic HPA monitors CPU/memory and SLA latency metrics. If actual utilization deviates sharply from the forecast (e.g., sudden traffic spike), the reactive component can instantly add or remove pods, overriding the proactive suggestion.
  4. SLA filter – Both components respect a policy object that encodes maximum allowed response time, error rate, and availability. Scaling decisions that would still breach these limits are rejected, triggering a “burst‑to‑cloud” fallback (see the second sketch after this list).
  5. Implementation – The hybrid controller runs as a side‑car in the Kubernetes control plane, communicating via the standard Custom Resource Definition (CRD) mechanism, so no core Kubernetes code changes are required.
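
To make the proactive path (steps 1–2) concrete, here is a minimal Python sketch. The paper's predictor is an ARIMA‑enhanced LSTM; the exponential‑moving‑average forecaster, the `rps_per_replica` capacity parameter, and all other names below are simplifications invented for illustration.

```python
import math
from collections import deque

class ProactiveScaler:
    """Sketch of the proactive path (steps 1-2).

    The forecaster is a simple exponential moving average with a naive
    trend term; the paper uses an ARIMA-enhanced LSTM. All parameters
    are illustrative, not values from the paper.
    """

    def __init__(self, rps_per_replica: float, alpha: float = 0.5,
                 min_replicas: int = 1, max_replicas: int = 20):
        self.rps_per_replica = rps_per_replica  # assumed per-pod capacity
        self.alpha = alpha                      # EMA smoothing factor
        self.min_replicas = min_replicas
        self.max_replicas = max_replicas
        self.history = deque(maxlen=120)        # recent request rates (rps)
        self._ema = None

    def observe(self, rps: float) -> None:
        """Record the request rate measured over the last 30 s interval."""
        self.history.append(rps)
        self._ema = rps if self._ema is None else (
            self.alpha * rps + (1 - self.alpha) * self._ema)

    def forecast(self) -> float:
        """Step 1: predict demand (rps) for the next 30 s interval."""
        if self._ema is None:
            return 0.0
        trend = (self.history[-1] - self.history[-2]
                 if len(self.history) >= 2 else 0.0)
        return max(0.0, self._ema + trend)

    def target_replicas(self) -> int:
        """Step 2: translate the forecast into a desired replica count."""
        target = math.ceil(self.forecast() / self.rps_per_replica)
        return max(self.min_replicas, min(self.max_replicas, target))
```

In the paper's controller, the returned count becomes the desired state submitted to the Kubernetes API, for example by patching the target Deployment's scale subresource.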
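
And a companion sketch of the guard‑rail and SLA filter (steps 3–4). The divergence test, the CPU target, and the cloud‑burst rule are assumptions standing in for the paper's controller logic, not values taken from it.

```python
import math
from dataclasses import dataclass

@dataclass
class SLAPolicy:
    """Declarative SLA thresholds (step 4). Field names are illustrative."""
    max_p95_latency_ms: float
    max_error_rate: float
    min_availability: float

@dataclass
class Metrics:
    """Live signals the reactive side watches (step 3)."""
    cpu_utilization: float   # fraction of requested CPU in use, 0.0-1.0
    p95_latency_ms: float
    error_rate: float
    availability: float

def hybrid_decision(proactive_target: int, current_replicas: int,
                    m: Metrics, policy: SLAPolicy,
                    cpu_target: float = 0.6,
                    max_local_replicas: int = 20) -> tuple[int, bool]:
    """Return (replica_target, burst_to_cloud) for the next interval.

    Sketch only: the divergence heuristic and the burst rule are
    assumptions standing in for the paper's controller logic.
    """
    target = proactive_target

    # Step 3: reactive guard-rail. Compute a classic HPA-style target from
    # live utilization; if it diverges sharply from the forecast-driven
    # target (e.g., a sudden spike), let the live signal override.
    reactive_target = max(1, math.ceil(
        current_replicas * m.cpu_utilization / cpu_target))
    if abs(reactive_target - target) > max(1, target // 2):
        target = reactive_target

    # Step 4: SLA filter. If the policy is already being breached, force a
    # scale-out attempt.
    breaching = (m.p95_latency_ms > policy.max_p95_latency_ms
                 or m.error_rate > policy.max_error_rate
                 or m.availability < policy.min_availability)
    if breaching:
        target = max(target, current_replicas + 1)

    # If the SLA is breached and the edge cluster cannot host more replicas,
    # reject further local scaling and fall back to bursting into the cloud.
    burst_to_cloud = breaching and target > max_local_replicas
    if burst_to_cloud:
        target = max_local_replicas
    return target, burst_to_cloud

# Example: the forecast said 4 replicas, but a spike pushed CPU to 95 % and
# p95 latency past the policy limit; the guard-rail scales out to 7 pods.
# hybrid_decision(4, 4, Metrics(0.95, 250.0, 0.02, 0.999),
#                 SLAPolicy(200.0, 0.01, 0.995))  # -> (7, False)
```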

Results & Findings

| Metric | Pure Reactive (HPA) | Pure Proactive (ML) | Hybrid (Proactive + Reactive) |
| --- | --- | --- | --- |
| SLA violation rate | 23 % | 15 % | 6 % |
| Average pod count (resource usage) | 1.42 × baseline | 1.35 × baseline | 1.28 × baseline |
| Scaling latency (time to add pod) | 45 s (cold start) | 30 s (prediction lead) | 32 s (prediction + corrective) |
| Cloud‑burst events | 12 | 7 | 3 |

What it means: The hybrid algorithm anticipates demand early enough to keep latency under the SLA, yet it retains the safety net of a reactive controller to handle unexpected spikes. The net effect is fewer SLA breaches, lower cloud‑burst costs, and modestly tighter resource footprints.

Practical Implications

  • For DevOps teams: Drop the “one‑size‑fits‑all” HPA config and adopt the hybrid controller to meet strict latency SLAs without manual tuning of thresholds.
  • Cost savings: Fewer unnecessary cloud bursts translate directly into lower operational expenditure, especially for edge‑first deployments that pay per‑use for cloud overflow.
  • Simplified scaling policies: SLA constraints are expressed once in a declarative policy object, removing the need for separate alerting pipelines (a hypothetical example follows this list).
  • Portability: Because the solution lives as a Kubernetes extension, it works across any CNCF‑compatible distribution (EKS, GKE, K3s, etc.) and can be rolled out via Helm charts.
  • Edge‑centric CI/CD: Teams can integrate the predictor training step into their pipeline (e.g., retrain nightly on recent telemetry) to keep forecasts accurate as usage patterns evolve.
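
To illustrate the "write the SLA once" idea from the list above, here is a hypothetical policy object being applied with the official Kubernetes Python client (`pip install kubernetes`). The `SLAScalingPolicy` kind, its API group/version, and every field name are invented for this sketch; the paper defines its own CRD schema.

```python
from kubernetes import client, config

# Hypothetical custom resource mirroring the paper's SLA policy object.
# Group, version, kind, and all field names are illustrative.
policy = {
    "apiVersion": "autoscaling.example.com/v1alpha1",
    "kind": "SLAScalingPolicy",
    "metadata": {"name": "video-analytics-sla"},
    "spec": {
        "targetDeployment": "video-analytics",
        "maxP95LatencyMs": 200,
        "maxErrorRate": 0.01,
        "minAvailability": 0.999,
        "cloudBurst": {"enabled": True},
    },
}

config.load_kube_config()  # use load_incluster_config() inside the cluster
client.CustomObjectsApi().create_namespaced_custom_object(
    group="autoscaling.example.com",
    version="v1alpha1",
    namespace="default",
    plural="slascalingpolicies",
    body=policy,
)
```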

Limitations & Future Work

  • Model simplicity: The current predictor uses a relatively simple time‑series model; more complex workloads (e.g., multimodal IoT bursts) may benefit from deep‑learning ensembles.
  • Scaling granularity: The algorithm assumes pod‑level scaling; finer‑grained resource adjustments (e.g., CPU quotas) are not explored.
  • Edge heterogeneity: Experiments were run on homogeneous Raspberry Pi nodes; future studies should evaluate performance on heterogeneous edge hardware (GPU‑enabled, ARM vs. x86).
  • Security & multi‑tenant isolation: The paper does not address how the scaling controller behaves under malicious load spikes or in multi‑tenant edge clusters.

Bottom line: By marrying prediction with real‑time feedback, this hybrid auto‑scaler offers a pragmatic path for developers to keep edge services performant, cost‑effective, and SLA‑compliant—an essential step as the edge moves from experimental labs to production‑grade deployments.

Authors

  • Suhrid Gupta
  • Muhammed Tawfiqul Islam
  • Rajkumar Buyya

Paper Information

  • arXiv ID: 2512.14290v1
  • Categories: cs.DC
  • Published: December 16, 2025