[Paper] A Hybrid Reactive-Proactive Auto-scaling Algorithm for SLA-Constrained Edge Computing
Source: arXiv - 2512.14290v1
Overview
Edge computing is reshaping how latency-sensitive services, such as IoT health monitors and smart-farm sensors, are delivered. Gupta, Islam, and Buyya propose a hybrid reactive-proactive auto-scaling algorithm that keeps edge micro-services within strict Service Level Agreements (SLAs) while minimizing costly over-provisioning. Integrated directly into Kubernetes, the approach cuts SLA violations from roughly 23 % under a standard reactive baseline to about 6 % on a real-world edge testbed.
Key Contributions
- Hybrid scaling logic: merges a machine‑learning (ML) predictor (proactive) with a traditional utilization‑based controller (reactive).
- Kubernetes extension: packaged as a custom controller/Horizontal Pod Autoscaler (HPA) plug‑in, ready for production clusters.
- SLA‑aware decision making: scaling actions are filtered through explicit latency, reliability, and availability thresholds.
- Extensive empirical evaluation: runs on a realistic edge testbed (Raspberry Pi‑class nodes + cloud burst) using two open‑source micro‑service workloads (video analytics & IoT telemetry).
- Quantitative improvement: reduces SLA violation rate by ~75 % and improves resource utilization by ~12 % compared with pure reactive or pure predictive baselines.
Methodology
- Workload forecasting – A lightweight time-series model (e.g., an ARIMA-enhanced LSTM) ingests recent request rates and forecasts demand for the next scaling interval (30 s); a simplified sketch of this step follows the list.
- Proactive scaling – The forecast translates into a target replica count, which is submitted to the Kubernetes API as a desired state.
- Reactive guard-rail – In parallel, a classic HPA watches CPU/memory utilization and SLA latency metrics. If actual load deviates sharply from the forecast (e.g., a sudden traffic spike), the reactive component can immediately add or remove pods, overriding the proactive suggestion.
- SLA filter – Both components respect a policy object that encodes the maximum allowed response time, error rate, and availability. Scaling decisions that would still breach these limits are rejected, triggering a "burst-to-cloud" fallback (see the decision-loop sketch after this list).
- Implementation – The hybrid controller runs as a sidecar in the Kubernetes control plane and communicates via the standard Custom Resource Definition (CRD) mechanism, so no changes to core Kubernetes code are required.
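To make the forecasting step concrete, here is a minimal sketch of how a lightweight predictor could turn a window of recent request rates into a proactive replica target. It uses a plain ARIMA model from statsmodels as a simplified stand-in for the paper's ARIMA-enhanced LSTM; the per-pod capacity constant and the ARIMA order are illustrative assumptions, not values from the paper.

```python
# Minimal sketch: forecast next-interval demand and derive a replica target.
# ARIMA is a simplified stand-in for the paper's ARIMA-enhanced LSTM;
# PER_POD_RPS and the ARIMA order are assumptions, not values from the paper.
import math

from statsmodels.tsa.arima.model import ARIMA

PER_POD_RPS = 50.0  # assumed: requests/sec one pod can serve within the SLA


def forecast_replicas(request_rates: list[float]) -> int:
    """Fit a small ARIMA model on recent per-interval request rates
    (30 s buckets) and convert the one-step-ahead forecast into replicas."""
    model = ARIMA(request_rates, order=(2, 1, 2))  # assumed model order
    fitted = model.fit()
    predicted_rps = float(fitted.forecast(steps=1)[0])
    # Round up so forecasted load never exceeds provisioned capacity.
    return max(1, math.ceil(predicted_rps / PER_POD_RPS))


# Example: a rising traffic window should yield a growing replica target.
history = [80, 95, 110, 130, 155, 185, 220, 260, 300, 345, 390, 440]
print(forecast_replicas(history))
```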
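The overall decision loop can then be expressed in a few lines. The sketch below combines the proactive target with the reactive override and the SLA filter, and applies the result through the official kubernetes Python client. The deviation threshold, CPU bound, and burst_to_cloud hook are illustrative assumptions; the paper packages this logic as a custom controller, not a standalone script.

```python
# Sketch of the hybrid decision loop: proactive target, reactive override,
# SLA filter, then apply. Thresholds and the burst hook are assumptions.
import math
from dataclasses import dataclass

from kubernetes import client, config


@dataclass
class SLAPolicy:
    max_latency_ms: float  # maximum allowed response time
    max_error_rate: float  # e.g., 0.01 for 1 %
    max_replicas: int      # capacity ceiling of the edge cluster


def burst_to_cloud(excess_rps: float) -> None:
    # Hypothetical hook: hand overflow traffic to cloud instances.
    print(f"bursting ~{excess_rps:.0f} req/s to cloud")


def decide_replicas(proactive: int, observed_rps: float, per_pod_rps: float,
                    cpu_util: float, p95_latency_ms: float,
                    policy: SLAPolicy) -> int:
    """Return a replica count, preferring the forecast but letting the
    reactive path override it when reality diverges from the prediction."""
    reactive = max(1, math.ceil(observed_rps / per_pod_rps))
    # Reactive guard-rail: a sharp deviation (assumed 30 % threshold) or
    # CPU pressure trumps the forecast, mirroring the paper's override rule.
    if abs(reactive - proactive) / max(proactive, 1) > 0.3 or cpu_util > 0.85:
        target = max(reactive, proactive)
    else:
        target = proactive
    # SLA filter: if even full edge capacity cannot restore the SLA,
    # cap at the ceiling and burst the overflow to the cloud.
    if target > policy.max_replicas or p95_latency_ms > policy.max_latency_ms:
        target = policy.max_replicas
        burst_to_cloud(max(0.0, observed_rps - target * per_pod_rps))
    return target


def apply_replicas(deployment: str, namespace: str, replicas: int) -> None:
    """Patch the Deployment's scale subresource (standard client call)."""
    config.load_incluster_config()  # use load_kube_config() outside a pod
    apps = client.AppsV1Api()
    apps.patch_namespaced_deployment_scale(
        name=deployment, namespace=namespace,
        body={"spec": {"replicas": replicas}})
```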
Results & Findings
| Metric | Pure Reactive (HPA) | Pure Proactive (ML) | Hybrid (Proactive + Reactive) |
|---|---|---|---|
| SLA violation rate | 23 % | 15 % | 6 % |
| Average pod count (resource usage) | 1.42 × baseline | 1.35 × baseline | 1.28 × baseline |
| Scaling latency (time to add pod) | 45 s (cold start) | 30 s (prediction lead) | 32 s (prediction + corrective) |
| Cloud‑burst events | 12 | 7 | 3 |
What it means: The hybrid algorithm anticipates demand early enough to keep latency under the SLA, yet it retains the safety net of a reactive controller to handle unexpected spikes. The net effect is fewer SLA breaches, lower cloud‑burst costs, and modestly tighter resource footprints.
Practical Implications
- For DevOps teams: Drop the “one‑size‑fits‑all” HPA config and adopt the hybrid controller to meet strict latency SLAs without manual tuning of thresholds.
- Cost savings: Fewer unnecessary cloud bursts translate directly into lower operational expenditure, especially for edge‑first deployments that pay per‑use for cloud overflow.
- Simplified scaling policies: SLA constraints are expressed once in a declarative policy object, removing the need for separate alerting pipelines; a hedged policy sketch follows this list.
- Portability: Because the solution lives as a Kubernetes extension, it works across any CNCF-certified Kubernetes distribution (EKS, GKE, K3s, etc.) and can be rolled out via Helm charts.
- Edge‑centric CI/CD: Teams can integrate the predictor training step into their pipeline (e.g., retrain nightly on recent telemetry) to keep forecasts accurate as usage patterns evolve.
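To illustrate the declarative-policy idea, the snippet below creates a policy instance with the kubernetes Python client's CustomObjectsApi. The API group, kind, and field names are hypothetical, since this summary does not spell out the paper's CRD schema; only the CustomObjectsApi call itself is standard client usage.

```python
# Hypothetical SLA policy object: the group, kind, and spec fields are
# illustrative; only the CustomObjectsApi call is standard client usage.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside the cluster

sla_policy = {
    "apiVersion": "autoscaling.example.com/v1alpha1",  # assumed group/version
    "kind": "SLAPolicy",                               # assumed kind
    "metadata": {"name": "video-analytics-sla"},
    "spec": {                                          # assumed schema
        "maxResponseTimeMs": 200,
        "maxErrorRate": 0.01,
        "minAvailability": 0.999,
        "fallback": "burst-to-cloud",
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="autoscaling.example.com",
    version="v1alpha1",
    namespace="default",
    plural="slapolicies",
    body=sla_policy,
)
```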
Limitations & Future Work
- Model simplicity: The current predictor uses a relatively simple time‑series model; more complex workloads (e.g., multimodal IoT bursts) may benefit from deep‑learning ensembles.
- Scaling granularity: The algorithm assumes pod‑level scaling; finer‑grained resource adjustments (e.g., CPU quotas) are not explored.
- Edge heterogeneity: Experiments were run on homogeneous Raspberry Pi nodes; future studies should evaluate performance on heterogeneous edge hardware (GPU‑enabled, ARM vs. x86).
- Security & multi‑tenant isolation: The paper does not address how the scaling controller behaves under malicious load spikes or in multi‑tenant edge clusters.
Bottom line: By marrying prediction with real‑time feedback, this hybrid auto‑scaler offers a pragmatic path for developers to keep edge services performant, cost‑effective, and SLA‑compliant—an essential step as the edge moves from experimental labs to production‑grade deployments.
Authors
- Suhrid Gupta
- Muhammed Tawfiqul Islam
- Rajkumar Buyya
Paper Information
- arXiv ID: 2512.14290v1
- Categories: cs.DC
- Published: December 16, 2025
- PDF: https://arxiv.org/pdf/2512.14290v1