[Paper] A Hybrid Reactive-Proactive Auto-scaling Algorithm for SLA-Constrained Edge Computing
Source: arXiv - 2512.14290v1
Overview
Edge computing is reshaping how latency-sensitive services, such as IoT health monitors and smart-farm sensors, are delivered. Gupta, Islam, and Buyya propose a hybrid reactive-proactive auto-scaling algorithm that keeps edge micro-services within strict Service Level Agreements (SLAs) while minimizing costly over-provisioning. Integrated directly into Kubernetes, the approach cuts SLA violations from roughly 23 % under a standard reactive baseline to about 6 % on a real-world edge testbed.
Key Contributions
- Hybrid scaling logic: merges a machine‑learning (ML) predictor (proactive) with a traditional utilization‑based controller (reactive).
- Kubernetes extension: packaged as a custom controller/Horizontal Pod Autoscaler (HPA) plug‑in, ready for production clusters.
- SLA‑aware decision making: scaling actions are filtered through explicit latency, reliability, and availability thresholds.
- Extensive empirical evaluation: runs on a realistic edge testbed (Raspberry Pi‑class nodes + cloud burst) using two open‑source micro‑service workloads (video analytics & IoT telemetry).
- Quantitative improvement: reduces SLA violation rate by ~75 % and improves resource utilization by ~12 % compared with pure reactive or pure predictive baselines.
Methodology
- Workload forecasting – A lightweight time-series model (e.g., an ARIMA-enhanced LSTM) ingests recent request rates and forecasts demand for the next scaling interval (30 s); a simplified sketch of this step follows the list.
- Proactive scaling – The forecast translates into a target replica count, which is submitted to the Kubernetes API as a desired state.
- Reactive guard-rail – In parallel, a classic HPA watches CPU/memory utilization and SLA latency metrics. If actual load deviates sharply from the forecast (e.g., a sudden traffic spike), the reactive component can immediately add or remove pods, overriding the proactive suggestion.
- SLA filter – Both components respect a policy object that encodes the maximum allowed response time, error rate, and availability. Scaling decisions that would still breach these limits are rejected, triggering a "burst-to-cloud" fallback (see the decision-loop sketch after this list).
- Implementation – The hybrid controller runs as a sidecar in the Kubernetes control plane and communicates via the standard Custom Resource Definition (CRD) mechanism, so no changes to core Kubernetes code are required.
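To make the forecasting step concrete, here is a minimal sketch of how a lightweight predictor could turn a window of recent request rates into a proactive replica target. It uses a plain ARIMA model from statsmodels as a simplified stand-in for the paper's ARIMA-enhanced LSTM; the per-pod capacity constant and the ARIMA order are illustrative assumptions, not values from the paper.

```python
# Minimal sketch: forecast next-interval demand and derive a replica target.
# ARIMA is a simplified stand-in for the paper's ARIMA-enhanced LSTM;
# PER_POD_RPS and the ARIMA order are assumptions, not values from the paper.
import math

from statsmodels.tsa.arima.model import ARIMA

PER_POD_RPS = 50.0  # assumed: requests/sec one pod can serve within the SLA


def forecast_replicas(request_rates: list[float]) -> int:
    """Fit a small ARIMA model on recent per-interval request rates
    (30 s buckets) and convert the one-step-ahead forecast into replicas."""
    model = ARIMA(request_rates, order=(2, 1, 2))  # assumed model order
    fitted = model.fit()
    predicted_rps = float(fitted.forecast(steps=1)[0])
    # Round up so forecasted load never exceeds provisioned capacity.
    return max(1, math.ceil(predicted_rps / PER_POD_RPS))


# Example: a rising traffic window should yield a growing replica target.
history = [80, 95, 110, 130, 155, 185, 220, 260, 300, 345, 390, 440]
print(forecast_replicas(history))
```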
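The overall decision loop can then be expressed in a few lines. The sketch below combines the proactive target with the reactive override and the SLA filter, and applies the result through the official kubernetes Python client. The deviation threshold, CPU bound, and burst_to_cloud hook are illustrative assumptions; the paper packages this logic as a custom controller, not a standalone script.

```python
# Sketch of the hybrid decision loop: proactive target, reactive override,
# SLA filter, then apply. Thresholds and the burst hook are assumptions.
import math
from dataclasses import dataclass

from kubernetes import client, config


@dataclass
class SLAPolicy:
    max_latency_ms: float  # maximum allowed response time
    max_error_rate: float  # e.g., 0.01 for 1 %
    max_replicas: int      # capacity ceiling of the edge cluster


def burst_to_cloud(excess_rps: float) -> None:
    # Hypothetical hook: hand overflow traffic to cloud instances.
    print(f"bursting ~{excess_rps:.0f} req/s to cloud")


def decide_replicas(proactive: int, observed_rps: float, per_pod_rps: float,
                    cpu_util: float, p95_latency_ms: float,
                    policy: SLAPolicy) -> int:
    """Return a replica count, preferring the forecast but letting the
    reactive path override it when reality diverges from the prediction."""
    reactive = max(1, math.ceil(observed_rps / per_pod_rps))
    # Reactive guard-rail: a sharp deviation (assumed 30 % threshold) or
    # CPU pressure trumps the forecast, mirroring the paper's override rule.
    if abs(reactive - proactive) / max(proactive, 1) > 0.3 or cpu_util > 0.85:
        target = max(reactive, proactive)
    else:
        target = proactive
    # SLA filter: if even full edge capacity cannot restore the SLA,
    # cap at the ceiling and burst the overflow to the cloud.
    if target > policy.max_replicas or p95_latency_ms > policy.max_latency_ms:
        target = policy.max_replicas
        burst_to_cloud(max(0.0, observed_rps - target * per_pod_rps))
    return target


def apply_replicas(deployment: str, namespace: str, replicas: int) -> None:
    """Patch the Deployment's scale subresource (standard client call)."""
    config.load_incluster_config()  # use load_kube_config() outside a pod
    apps = client.AppsV1Api()
    apps.patch_namespaced_deployment_scale(
        name=deployment, namespace=namespace,
        body={"spec": {"replicas": replicas}})
```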
Results & Findings
| Metric | Pure Reactive (HPA) | Pure Proactive (ML) | Hybrid (Proactive + Reactive) |
|---|---|---|---|
| SLA violation rate | 23 % | 15 % | 6 % |
| Average pod count (resource usage) | 1.42 × baseline | 1.35 × baseline | 1.28 × baseline |
| Scaling latency (time to add pod) | 45 s (cold start) | 30 s (prediction lead) | 32 s (prediction + corrective) |
| Cloud‑burst events | 12 | 7 | 3 |
What it means: The hybrid algorithm anticipates demand early enough to keep latency under the SLA, yet it retains the safety net of a reactive controller to handle unexpected spikes. The net effect is fewer SLA breaches, lower cloud‑burst costs, and modestly tighter resource footprints.
Practical Implications
- For DevOps teams: Drop the “one‑size‑fits‑all” HPA config and adopt the hybrid controller to meet strict latency SLAs without manual tuning of thresholds.
- Cost savings: Fewer unnecessary cloud bursts translate directly into lower operational expenditure, especially for edge‑first deployments that pay per‑use for cloud overflow.
- Simplified scaling policies: SLA constraints are expressed once in a declarative policy object, removing the need for separate alerting pipelines; a hedged policy sketch follows this list.
- Portability: Because the solution lives as a Kubernetes extension, it works across any CNCF-certified Kubernetes distribution (EKS, GKE, K3s, etc.) and can be rolled out via Helm charts.
- Edge‑centric CI/CD: Teams can integrate the predictor training step into their pipeline (e.g., retrain nightly on recent telemetry) to keep forecasts accurate as usage patterns evolve.
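To illustrate the declarative-policy idea, the snippet below creates a policy instance with the kubernetes Python client's CustomObjectsApi. The API group, kind, and field names are hypothetical, since this summary does not spell out the paper's CRD schema; only the CustomObjectsApi call itself is standard client usage.

```python
# Hypothetical SLA policy object: the group, kind, and spec fields are
# illustrative; only the CustomObjectsApi call is standard client usage.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside the cluster

sla_policy = {
    "apiVersion": "autoscaling.example.com/v1alpha1",  # assumed group/version
    "kind": "SLAPolicy",                               # assumed kind
    "metadata": {"name": "video-analytics-sla"},
    "spec": {                                          # assumed schema
        "maxResponseTimeMs": 200,
        "maxErrorRate": 0.01,
        "minAvailability": 0.999,
        "fallback": "burst-to-cloud",
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="autoscaling.example.com",
    version="v1alpha1",
    namespace="default",
    plural="slapolicies",
    body=sla_policy,
)
```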
Limitations & Future Work
- Model simplicity: The current predictor uses a relatively simple time‑series model; more complex workloads (e.g., multimodal IoT bursts) may benefit from deep‑learning ensembles.
- Scaling granularity: The algorithm assumes pod‑level scaling; finer‑grained resource adjustments (e.g., CPU quotas) are not explored.
- Edge heterogeneity: Experiments were run on homogeneous Raspberry Pi nodes; future studies should evaluate performance on heterogeneous edge hardware (GPU‑enabled, ARM vs. x86).
- Security & multi‑tenant isolation: The paper does not address how the scaling controller behaves under malicious load spikes or in multi‑tenant edge clusters.
Bottom line: By marrying prediction with real‑time feedback, this hybrid auto‑scaler offers a pragmatic path for developers to keep edge services performant, cost‑effective, and SLA‑compliant—an essential step as the edge moves from experimental labs to production‑grade deployments.
Authors
- Suhrid Gupta
- Muhammed Tawfiqul Islam
- Rajkumar Buyya
Paper Information
- arXiv ID: 2512.14290v1
- Categories: cs.DC
- Published: December 16, 2025
- PDF: https://arxiv.org/pdf/2512.14290v1