[Paper] QoS-Aware Load Balancing in the Computing Continuum via Multi-Player Bandits

Published: December 21, 2025 at 06:18 PM EST
4 min read
Source: arXiv - 2512.18915v1

Overview

The paper introduces QEdgeProxy, a decentralized load‑balancing layer designed for the emerging Computing Continuum—the seamless blend of cloud, edge, and device‑level compute. By treating each load‑balancer as a player in a multi‑armed bandit game, QEdgeProxy can dynamically steer IoT traffic to the service instance most likely to meet each client’s latency and reliability targets, even as workloads and network conditions shift.

Key Contributions

  • QoS‑centric formulation: Casts load balancing in the continuum as a Multi‑Player Multi‑Armed Bandit (MP‑MAB) problem with heterogeneous rewards that directly model per‑client QoS success probabilities (formalized just after this list).
  • Kernel Density Estimation (KDE) for reward modeling: Uses KDE to estimate the distribution of observed response times, enabling a smooth probability estimate of meeting a client’s QoS deadline.
  • Adaptive exploration strategy: Introduces a lightweight, context‑aware exploration mechanism that quickly reacts to non‑stationary conditions (e.g., sudden load spikes or instance failures).
  • Kubernetes‑native implementation: Provides an open‑source QEdgeProxy that runs as a sidecar/proxy in K3s clusters, requiring no changes to existing services.
  • Empirical validation: Demonstrates on a realistic edge‑AI workload that QEdgeProxy outperforms simple proximity‑based routing and a state‑of‑the‑art reinforcement‑learning balancer in per‑client QoS satisfaction and adaptability.
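
To make the bandit formulation concrete (the notation here is this summary's, not necessarily the paper's): if player p routes a request with deadline d_p to arm a and observes response time T, the binary reward and its KDE‑based estimate are

```latex
r_{p,a} = \mathbf{1}\{\, T \le d_p \,\}, \qquad
\hat{\mu}_{p,a} \;=\; \widehat{\Pr}(T \le d_p)
\;=\; \frac{1}{n}\sum_{i=1}^{n} \Phi\!\left(\frac{d_p - t_i}{h}\right),
```

where t_1, …, t_n are the response times observed on arm a, h is the kernel bandwidth, and Φ is the standard normal CDF (assuming a Gaussian kernel, an illustrative choice). Heterogeneity across players enters only through the per‑client deadline d_p.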

Methodology

  1. Problem Modeling – Each edge proxy (player) must pick one of several service instances (arms) for each incoming request. The “reward” is binary: 1 if the request meets the client’s QoS deadline, 0 otherwise. Because different clients have different latency targets, rewards are heterogeneous across players.
  2. Reward Estimation with KDE – Instead of simply counting successes, QEdgeProxy builds a kernel‑density estimate of the observed response‑time distribution for each arm. The area under the curve below the client’s deadline yields the estimated success probability (see the first sketch after this list).
  3. Decision Rule – Players use an Upper‑Confidence‑Bound (UCB) style rule that balances the estimated success probability with an exploration bonus. The bonus shrinks as the KDE becomes confident and expands when recent observations indicate a shift (also covered in the first sketch below).
  4. Adaptation to Non‑Stationarity – A sliding window discards stale samples, and the exploration bonus is inflated after a significant change in the estimated distribution is detected (e.g., via a KL‑divergence test; see the second sketch below).
  5. Implementation – QEdgeProxy is packaged as a lightweight Go service that intercepts HTTP/gRPC traffic, queries the local KDE tables, and forwards each request to the chosen instance. It integrates with Kubernetes via a Custom Resource Definition (CRD) that declares QoS targets per client (a hypothetical shape for such a spec is sketched last below).
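
The paper's source for these steps isn't reproduced here, but steps 2–3 can be sketched compactly in Go. The sketch below assumes a Gaussian kernel, a fixed per‑arm bandwidth, and a classic UCB1‑style bonus; the names (Arm, kdeSuccessProb, chooseArm) and the boost parameter are illustrative, not QEdgeProxy's actual API.

```go
package main

import (
	"fmt"
	"math"
)

// Arm holds the sliding window of response times observed for one
// service instance, plus the KDE bandwidth used for that arm.
type Arm struct {
	Samples   []float64 // recent response times, in ms
	Bandwidth float64   // kernel bandwidth, in ms
}

// kdeSuccessProb estimates P(responseTime <= deadline) with a Gaussian
// KDE: the KDE's CDF at the deadline is the mean of the per-sample
// Gaussian CDFs, giving a smooth success-probability estimate (step 2).
func kdeSuccessProb(samples []float64, deadline, bw float64) float64 {
	if len(samples) == 0 {
		return 0.5 // no data yet: uninformative estimate
	}
	sum := 0.0
	for _, x := range samples {
		z := (deadline - x) / (bw * math.Sqrt2)
		sum += 0.5 * (1 + math.Erf(z)) // Gaussian CDF via erf
	}
	return sum / float64(len(samples))
}

// chooseArm applies a UCB-style rule (step 3): estimated success
// probability plus an exploration bonus that shrinks as an arm
// accumulates samples. boost is normally 1 and is inflated after a
// detected distribution shift (see step 4).
func chooseArm(arms []Arm, deadline float64, totalPulls int, boost float64) int {
	best, bestScore := 0, math.Inf(-1)
	for i, a := range arms {
		score := math.Inf(1) // unexplored arms are tried first
		if n := len(a.Samples); n > 0 {
			p := kdeSuccessProb(a.Samples, deadline, a.Bandwidth)
			bonus := math.Sqrt(2 * math.Log(float64(totalPulls)) / float64(n))
			score = p + boost*bonus
		}
		if score > bestScore {
			best, bestScore = i, score
		}
	}
	return best
}

func main() {
	arms := []Arm{
		{Samples: []float64{22, 25, 31, 28}, Bandwidth: 3},
		{Samples: []float64{45, 52, 48}, Bandwidth: 3},
	}
	// With a 35 ms client deadline, arm 0 wins on success probability.
	fmt.Println("chosen arm:", chooseArm(arms, 35, 7, 1.0)) // chosen arm: 0
}
```

Averaging the per‑sample Gaussian CDFs is exactly the KDE's CDF at the deadline, which is why a handful of observations already yields a smooth success‑probability estimate rather than a coarse success count.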
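
Step 4's change detector can be sketched in the same spirit: histogram the newest tail of the sliding window, compare it against the older samples with a KL divergence, and inflate the exploration bonus when the divergence crosses a threshold. The bin layout, window split, and 0.5‑nat threshold below are assumptions for illustration, not values from the paper.

```go
package main

import (
	"fmt"
	"math"
)

// histogram bins samples into equal-width buckets over [lo, hi) and
// returns a smoothed probability vector; add-one smoothing keeps the
// KL divergence finite when a bin is empty.
func histogram(samples []float64, lo, hi float64, bins int) []float64 {
	counts := make([]float64, bins)
	for _, x := range samples {
		i := int(float64(bins) * (x - lo) / (hi - lo))
		if i < 0 {
			i = 0
		}
		if i >= bins {
			i = bins - 1
		}
		counts[i]++
	}
	p := make([]float64, bins)
	total := float64(len(samples) + bins)
	for i, c := range counts {
		p[i] = (c + 1) / total
	}
	return p
}

// klDivergence computes D_KL(p || q) in nats.
func klDivergence(p, q []float64) float64 {
	d := 0.0
	for i := range p {
		d += p[i] * math.Log(p[i]/q[i])
	}
	return d
}

// shiftDetected compares the newest `recent` observations in the
// sliding window against the older ones; a large divergence signals a
// distribution shift, after which the exploration bonus is inflated.
func shiftDetected(window []float64, recent int, threshold float64) bool {
	if len(window) <= recent {
		return false
	}
	old := window[:len(window)-recent]
	latest := window[len(window)-recent:]
	pOld := histogram(old, 0, 100, 10)
	pNew := histogram(latest, 0, 100, 10)
	return klDivergence(pNew, pOld) > threshold
}

func main() {
	// Stable latencies around 25 ms, then a surge past 70 ms.
	window := []float64{24, 26, 23, 27, 25, 24, 71, 72, 70, 73}
	fmt.Println("shift detected:", shiftDetected(window, 4, 0.5)) // true
}
```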
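
For step 5, this summary doesn't give the CRD's schema, so the following struct is a purely hypothetical guess at what a per‑client QoS target spec could look like in Go:

```go
// Package qos sketches CRD-backed types; the field names are invented
// for illustration and may not match the paper's actual CRD.
package qos

// QoSTargetSpec declares a per-client QoS contract that the proxy would
// read to parameterise its per-arm KDE estimates and routing decisions.
type QoSTargetSpec struct {
	ClientSelector map[string]string `json:"clientSelector"` // labels identifying the client workload
	Service        string            `json:"service"`        // Kubernetes service whose instances are the "arms"
	DeadlineMs     float64           `json:"deadlineMs"`     // per-request latency deadline, in milliseconds
	MinSuccessRate float64           `json:"minSuccessRate"` // target fraction of requests meeting the deadline
}
```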

Results & Findings

Baseline                   Avg. per‑client QoS satisfaction   Adaptation latency (to load surge)
Proximity‑based routing    71 %                               45 s
RL‑based balancer (DQN)    78 %                               30 s
QEdgeProxy                 92 %                               12 s

  • Higher QoS satisfaction: QEdgeProxy consistently kept latency‑sensitive requests under their deadlines, a 14‑percentage‑point gain over the RL baseline (92 % vs. 78 %).
  • Fast recovery: When a service instance failed or a sudden traffic burst hit the edge node, QEdgeProxy re‑allocated traffic within seconds, whereas the RL model required many more episodes to relearn.
  • Low overhead: The proxy added < 2 ms of processing latency per request, negligible compared to typical edge‑AI inference times (≈ 30 ms).

Practical Implications

  • Edge‑AI deployments: Developers can plug QEdgeProxy into existing K3s or MicroK8s clusters to guarantee inference latency for cameras, drones, or AR devices without redesigning their services.
  • SLA‑driven multi‑tenant platforms: Cloud‑edge providers can expose per‑tenant QoS contracts; QEdgeProxy enforces them automatically, reducing the need for manual traffic engineering.
  • Cost efficiency: By steering traffic to the most likely successful instance rather than the nearest one, operators can keep lower‑spec edge nodes online longer, saving on hardware and energy.
  • Zero‑touch scaling: The adaptive exploration eliminates the “cold‑start” problem common in RL‑based controllers, making QEdgeProxy suitable for highly dynamic IoT fleets where nodes join/leave frequently.

Limitations & Future Work

  • Assumes reliable QoS feedback: The approach needs accurate response‑time measurements; noisy timestamps (e.g., unsynchronized clocks) could degrade KDE estimates.
  • Scalability of KDE tables: While lightweight for a handful of instances, maintaining KDEs for hundreds of arms may increase memory usage; hierarchical or sketch‑based approximations are a possible remedy.
  • Limited to binary QoS success: Extending the model to multi‑dimensional SLAs (e.g., jitter, throughput) would broaden applicability.
  • Real‑world deployment study: The authors plan to evaluate QEdgeProxy on a production edge network (e.g., 5G MEC) to confirm robustness under real traffic patterns and heterogeneous hardware.

Authors

  • Ivan Čilić
  • Ivana Podnar Žarko
  • Pantelis Frangoudis
  • Schahram Dustdar

Paper Information

  • arXiv ID: 2512.18915v1
  • Categories: cs.NI, cs.DC
  • Published: December 21, 2025