[Paper] QoS-Aware Load Balancing in the Computing Continuum via Multi-Player Bandits

Published: December 21, 2025 at 06:18 PM EST
4 min read
Source: arXiv - 2512.18915v1

Overview

The paper introduces QEdgeProxy, a decentralized load‑balancing layer designed for the emerging Computing Continuum—the seamless blend of cloud, edge, and device‑level compute. By treating each load‑balancer as a player in a multi‑armed bandit game, QEdgeProxy can dynamically steer IoT traffic to the service instance most likely to meet each client’s latency and reliability targets, even as workloads and network conditions shift.

Key Contributions

  • QoS‑centric formulation: Casts load balancing in the continuum as a Multi‑Player Multi‑Armed Bandit (MP‑MAB) problem with heterogeneous rewards that directly model per‑client QoS success probabilities (formalized just after this list).
  • Kernel Density Estimation (KDE) for reward modeling: Uses KDE to estimate the distribution of observed response times, enabling a smooth probability estimate of meeting a client’s QoS deadline.
  • Adaptive exploration strategy: Introduces a lightweight, context‑aware exploration mechanism that quickly reacts to non‑stationary conditions (e.g., sudden load spikes or instance failures).
  • Kubernetes‑native implementation: Provides an open‑source QEdgeProxy that runs as a sidecar/proxy in K3s clusters, requiring no changes to existing services.
  • Empirical validation: Demonstrates on a realistic edge‑AI workload that QEdgeProxy outperforms simple proximity‑based routing and a state‑of‑the‑art reinforcement‑learning balancer in per‑client QoS satisfaction and adaptability.
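
To make the bandit formulation concrete (the notation here is this summary's, not necessarily the paper's): if player p routes a request with deadline d_p to arm a and observes response time T, the binary reward and its KDE‑based estimate are

```latex
r_{p,a} = \mathbf{1}\{\, T \le d_p \,\}, \qquad
\hat{\mu}_{p,a} \;=\; \widehat{\Pr}(T \le d_p)
\;=\; \frac{1}{n}\sum_{i=1}^{n} \Phi\!\left(\frac{d_p - t_i}{h}\right),
```

where t_1, …, t_n are the response times observed on arm a, h is the kernel bandwidth, and Φ is the standard normal CDF (assuming a Gaussian kernel, an illustrative choice). Heterogeneity across players enters only through the per‑client deadline d_p.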

Methodology

  1. Problem Modeling – Each edge proxy (player) must pick one of several service instances (arms) for each incoming request. The “reward” is binary: 1 if the request meets the client’s QoS deadline, 0 otherwise. Because different clients have different latency targets, rewards are heterogeneous across players.
  2. Reward Estimation with KDE – Instead of simply counting successes, QEdgeProxy builds a kernel‑density estimate of the observed response‑time distribution for each arm. The area under the curve below the client’s deadline yields the estimated success probability (see the first sketch after this list).
  3. Decision Rule – Players use an Upper‑Confidence‑Bound (UCB) style rule that balances the estimated success probability with an exploration bonus. The bonus shrinks as the KDE becomes confident and expands when recent observations indicate a shift (also covered in the first sketch below).
  4. Adaptation to Non‑Stationarity – A sliding window discards stale samples, and the exploration bonus is inflated after a significant change in the estimated distribution is detected (e.g., via a KL‑divergence test; see the second sketch below).
  5. Implementation – QEdgeProxy is packaged as a lightweight Go service that intercepts HTTP/gRPC traffic, queries the local KDE tables, and forwards each request to the chosen instance. It integrates with Kubernetes via a Custom Resource Definition (CRD) that declares QoS targets per client (a hypothetical shape for such a spec is sketched last below).
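
The paper's source for these steps isn't reproduced here, but steps 2–3 can be sketched compactly in Go. The sketch below assumes a Gaussian kernel, a fixed per‑arm bandwidth, and a classic UCB1‑style bonus; the names (Arm, kdeSuccessProb, chooseArm) and the boost parameter are illustrative, not QEdgeProxy's actual API.

```go
package main

import (
	"fmt"
	"math"
)

// Arm holds the sliding window of response times observed for one
// service instance, plus the KDE bandwidth used for that arm.
type Arm struct {
	Samples   []float64 // recent response times, in ms
	Bandwidth float64   // kernel bandwidth, in ms
}

// kdeSuccessProb estimates P(responseTime <= deadline) with a Gaussian
// KDE: the KDE's CDF at the deadline is the mean of the per-sample
// Gaussian CDFs, giving a smooth success-probability estimate (step 2).
func kdeSuccessProb(samples []float64, deadline, bw float64) float64 {
	if len(samples) == 0 {
		return 0.5 // no data yet: uninformative estimate
	}
	sum := 0.0
	for _, x := range samples {
		z := (deadline - x) / (bw * math.Sqrt2)
		sum += 0.5 * (1 + math.Erf(z)) // Gaussian CDF via erf
	}
	return sum / float64(len(samples))
}

// chooseArm applies a UCB-style rule (step 3): estimated success
// probability plus an exploration bonus that shrinks as an arm
// accumulates samples. boost is normally 1 and is inflated after a
// detected distribution shift (see step 4).
func chooseArm(arms []Arm, deadline float64, totalPulls int, boost float64) int {
	best, bestScore := 0, math.Inf(-1)
	for i, a := range arms {
		score := math.Inf(1) // unexplored arms are tried first
		if n := len(a.Samples); n > 0 {
			p := kdeSuccessProb(a.Samples, deadline, a.Bandwidth)
			bonus := math.Sqrt(2 * math.Log(float64(totalPulls)) / float64(n))
			score = p + boost*bonus
		}
		if score > bestScore {
			best, bestScore = i, score
		}
	}
	return best
}

func main() {
	arms := []Arm{
		{Samples: []float64{22, 25, 31, 28}, Bandwidth: 3},
		{Samples: []float64{45, 52, 48}, Bandwidth: 3},
	}
	// With a 35 ms client deadline, arm 0 wins on success probability.
	fmt.Println("chosen arm:", chooseArm(arms, 35, 7, 1.0)) // chosen arm: 0
}
```

Averaging the per‑sample Gaussian CDFs is exactly the KDE's CDF at the deadline, which is why a handful of observations already yields a smooth success‑probability estimate rather than a coarse success count.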
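
Step 4's change detector can be sketched in the same spirit: histogram the newest tail of the sliding window, compare it against the older samples with a KL divergence, and inflate the exploration bonus when the divergence crosses a threshold. The bin layout, window split, and 0.5‑nat threshold below are assumptions for illustration, not values from the paper.

```go
package main

import (
	"fmt"
	"math"
)

// histogram bins samples into equal-width buckets over [lo, hi) and
// returns a smoothed probability vector; add-one smoothing keeps the
// KL divergence finite when a bin is empty.
func histogram(samples []float64, lo, hi float64, bins int) []float64 {
	counts := make([]float64, bins)
	for _, x := range samples {
		i := int(float64(bins) * (x - lo) / (hi - lo))
		if i < 0 {
			i = 0
		}
		if i >= bins {
			i = bins - 1
		}
		counts[i]++
	}
	p := make([]float64, bins)
	total := float64(len(samples) + bins)
	for i, c := range counts {
		p[i] = (c + 1) / total
	}
	return p
}

// klDivergence computes D_KL(p || q) in nats.
func klDivergence(p, q []float64) float64 {
	d := 0.0
	for i := range p {
		d += p[i] * math.Log(p[i]/q[i])
	}
	return d
}

// shiftDetected compares the newest `recent` observations in the
// sliding window against the older ones; a large divergence signals a
// distribution shift, after which the exploration bonus is inflated.
func shiftDetected(window []float64, recent int, threshold float64) bool {
	if len(window) <= recent {
		return false
	}
	old := window[:len(window)-recent]
	latest := window[len(window)-recent:]
	pOld := histogram(old, 0, 100, 10)
	pNew := histogram(latest, 0, 100, 10)
	return klDivergence(pNew, pOld) > threshold
}

func main() {
	// Stable latencies around 25 ms, then a surge past 70 ms.
	window := []float64{24, 26, 23, 27, 25, 24, 71, 72, 70, 73}
	fmt.Println("shift detected:", shiftDetected(window, 4, 0.5)) // true
}
```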
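
For step 5, this summary doesn't give the CRD's schema, so the following struct is a purely hypothetical guess at what a per‑client QoS target spec could look like in Go:

```go
// Package qos sketches CRD-backed types; the field names are invented
// for illustration and may not match the paper's actual CRD.
package qos

// QoSTargetSpec declares a per-client QoS contract that the proxy would
// read to parameterise its per-arm KDE estimates and routing decisions.
type QoSTargetSpec struct {
	ClientSelector map[string]string `json:"clientSelector"` // labels identifying the client workload
	Service        string            `json:"service"`        // Kubernetes service whose instances are the "arms"
	DeadlineMs     float64           `json:"deadlineMs"`     // per-request latency deadline, in milliseconds
	MinSuccessRate float64           `json:"minSuccessRate"` // target fraction of requests meeting the deadline
}
```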

Results & Findings

Baseline                   Avg. per‑client QoS satisfaction   Adaptation latency (to load surge)
Proximity‑based routing    71 %                               45 s
RL‑based balancer (DQN)    78 %                               30 s
QEdgeProxy                 92 %                               12 s

  • Higher QoS satisfaction: QEdgeProxy consistently kept latency‑sensitive requests under their deadlines, a 14‑percentage‑point gain over the RL baseline (92 % vs. 78 %).
  • Fast recovery: When a service instance failed or a sudden traffic burst hit the edge node, QEdgeProxy re‑allocated traffic within seconds, whereas the RL model required many more episodes to relearn.
  • Low overhead: The proxy added < 2 ms of processing latency per request, negligible compared to typical edge‑AI inference times (≈ 30 ms).

Practical Implications

  • Edge‑AI deployments: Developers can plug QEdgeProxy into existing K3s or MicroK8s clusters to guarantee inference latency for cameras, drones, or AR devices without redesigning their services.
  • SLA‑driven multi‑tenant platforms: Cloud‑edge providers can expose per‑tenant QoS contracts; QEdgeProxy enforces them automatically, reducing the need for manual traffic engineering.
  • Cost efficiency: By steering traffic to the most likely successful instance rather than the nearest one, operators can keep lower‑spec edge nodes online longer, saving on hardware and energy.
  • Zero‑touch scaling: The adaptive exploration eliminates the “cold‑start” problem common in RL‑based controllers, making QEdgeProxy suitable for highly dynamic IoT fleets where nodes join/leave frequently.

Limitations & Future Work

  • Assumes reliable QoS feedback: The approach needs accurate response‑time measurements; noisy timestamps (e.g., unsynchronized clocks) could degrade KDE estimates.
  • Scalability of KDE tables: While lightweight for a handful of instances, maintaining KDEs for hundreds of arms may increase memory usage; hierarchical or sketch‑based approximations are a possible remedy.
  • Limited to binary QoS success: Extending the model to multi‑dimensional SLAs (e.g., jitter, throughput) would broaden applicability.
  • Real‑world deployment study: The authors plan to evaluate QEdgeProxy on a production edge network (e.g., 5G MEC) to confirm robustness under real traffic patterns and heterogeneous hardware.

Authors

  • Ivan Čilić
  • Ivana Podnar Žarko
  • Pantelis Frangoudis
  • Schahram Dustdar

Paper Information

  • arXiv ID: 2512.18915v1
  • Categories: cs.NI, cs.DC
  • Published: December 21, 2025