[Paper] Adaptive Requesting in Decentralized Edge Networks via Non-Stationary Bandits
Source: arXiv - 2601.08760v1
Overview
The paper tackles a core challenge in edge computing: how time‑sensitive devices (think IoT sensors, AR headsets, or autonomous drones) can keep their data fresh when they must request content through a set of edge access nodes (ANs) that they cannot directly monitor. By framing the problem as a non‑stationary multi‑armed bandit, the authors devise a strategy that lets each client adaptively pick the best AN despite changing network conditions and without any central coordination.
Key Contributions
- Decentralized request model: Formalizes a realistic scenario where clients independently select ANs without observing their states or other clients’ actions.
- Non‑stationary bandit formulation: Captures both abrupt (e.g., node failures) and gradual (e.g., load shifts) changes in the expected “age‑of‑information” (AoI) reduction reward.
- Aging Bandit with Adaptive Reset (ABAR): A novel algorithm that combines sliding‑window reward estimation with periodic "reset" checks to quickly detect and adapt to distribution shifts.
- Theoretical guarantees: Proves near‑optimal regret bounds (i.e., the cumulative AoI loss compared to an oracle) even under the coupled, history‑dependent reward process.
- Empirical validation: Simulation results on synthetic and realistic edge‑network traces demonstrate that ABAR outperforms classic stationary bandit methods and recent non‑stationary baselines.
Methodology
System abstraction
- Clients generate time‑sensitive requests.
- Access Nodes (ANs) act as gateways to the cloud/server; each AN has an unknown, time‑varying ability to reduce the request’s AoI.
- Reward = AoI reduction achieved when a client selects a particular AN.
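As a concrete sketch of this reward signal, the snippet below tracks a single client's AoI over discrete time slots: the age grows by one each slot, a served request resets it to the freshness of the delivered sample, and the per-slot reward is the resulting AoI reduction. The function names and exact dynamics are illustrative assumptions, not the paper's formal model.

```python
def aoi_step(aoi: float, served: bool, service_age: float = 0.0) -> float:
    """Advance AoI by one slot; a successful delivery resets the AoI
    to the age of the freshest sample the AN could provide."""
    return service_age + 1.0 if served else aoi + 1.0

aoi = 0.0
reward_total = 0.0
for served in [False, False, True, False, True]:
    new_aoi = aoi_step(aoi, served)
    # Reward = AoI reduction relative to not being served this slot.
    reward_total += max(aoi + 1.0 - new_aoi, 0.0)
    aoi = new_aoi
# reward_total is now 4.0 (two successful deliveries, each saving 2 slots of age)
```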
Problem modeling
- Each client faces a multi‑armed bandit where each arm corresponds to an AN.
- The reward distribution for each arm is non‑stationary: it can jump (e.g., a node goes offline) or drift (e.g., congestion builds up).
- Because clients cannot see each other’s selections, the reward process becomes history‑dependent (the load on an AN affects its future performance).
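The two kinds of non-stationarity above can be made concrete with a toy reward generator: one arm drifts downward as load builds, another jumps abruptly at a change point when the node fails. All constants here are illustrative, not taken from the paper.

```python
import random

def mean_reward(arm: int, t: int, change_point: int = 500) -> float:
    """Time-varying expected AoI-reduction reward for each AN (toy model)."""
    if arm == 0:                       # gradual drift: congestion slowly builds
        return max(0.9 - 0.001 * t, 0.1)
    if arm == 1:                       # abrupt change: node goes offline
        return 0.6 if t < change_point else 0.05
    return 0.4                         # a stationary baseline arm

def pull(arm: int, t: int, rng: random.Random) -> float:
    """Sample a noisy reward around the current mean."""
    return rng.gauss(mean_reward(arm, t), 0.05)
```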
Algorithm design – ABAR
- Adaptive windowing: For each arm, maintain a sliding window of recent rewards; the window size automatically shrinks when a change is suspected, allowing rapid re‑learning.
- Periodic reset monitoring: At regular intervals, the algorithm checks whether the estimated mean reward has deviated beyond a confidence threshold; if so, it resets the statistics for that arm.
- Upper Confidence Bound (UCB) selection: Within the current window, compute a UCB score that balances exploration (trying less‑used ANs) and exploitation (using the currently best‑estimated AN).
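The three mechanisms above can be sketched as a compact learner: a sliding window per arm, UCB selection over the window, and a reset that discards stale statistics when the recent mean drifts far from the window mean. This is a generic illustration in the spirit of ABAR; the window size, check cadence, and reset threshold are assumptions, not the paper's exact pseudocode.

```python
import math
from collections import deque

class SlidingWindowUCB:
    """Sliding-window UCB with an adaptive reset check (illustrative)."""

    def __init__(self, n_arms: int, window: int = 100, reset_gap: float = 0.3):
        self.windows = [deque(maxlen=window) for _ in range(n_arms)]
        self.reset_gap = reset_gap
        self.t = 0

    def select(self) -> int:
        """Pick the AN with the highest UCB score over its current window."""
        self.t += 1
        for arm, w in enumerate(self.windows):
            if not w:                              # try each arm once first
                return arm

        def ucb(w: deque) -> float:
            mean = sum(w) / len(w)
            return mean + math.sqrt(2.0 * math.log(self.t) / len(w))

        return max(range(len(self.windows)), key=lambda a: ucb(self.windows[a]))

    def update(self, arm: int, reward: float) -> None:
        """Record a reward; reset the arm's statistics when the mean over the
        most recent quarter of the window deviates sharply from the
        full-window mean (a distribution change is suspected)."""
        w = self.windows[arm]
        w.append(reward)
        if len(w) >= 8:
            recent = list(w)[-(len(w) // 4):]
            if abs(sum(recent) / len(recent) - sum(w) / len(w)) > self.reset_gap:
                self.windows[arm] = deque(recent, maxlen=w.maxlen)
```

Keeping only the recent samples after a reset (rather than emptying the window) lets the learner restart from the post-change observations instead of from scratch.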
Analysis
- The authors derive a regret bound that scales with the number of change points and the magnitude of changes, showing that ABAR’s regret is only a logarithmic factor away from the optimal offline policy that knows all change times in advance.
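For intuition, bounds of this type in the non-stationary bandit literature typically take the following shape, where $K$ is the number of ANs, $\Gamma_T$ the number of change points up to horizon $T$, $\mu^*_t$ the best achievable mean reward at time $t$, and $a_t$ the arm chosen. This is an illustrative form, not the paper's exact theorem statement.

```latex
% Illustrative sliding-window-style regret bound (not the paper's exact theorem):
R_T \;=\; \mathbb{E}\!\left[\sum_{t=1}^{T}\bigl(\mu^*_t - \mu_{a_t,t}\bigr)\right]
\;=\; \mathcal{O}\!\left(\sqrt{K\,\Gamma_T\,T\log T}\right)
```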
Results & Findings
| Metric | Stationary UCB | Sliding‑Window UCB | EXP3.S (non‑stationary) | ABAR (proposed) |
|---|---|---|---|---|
| Average AoI reduction (synthetic) | 12 % | 18 % | 22 % | 31 % |
| Regret (cumulative AoI loss) | O(T) | O(T^0.75) | O(T^0.6) | O(T^0.5) |
| Adaptation latency after abrupt change | > 200 steps | ~120 steps | ~80 steps | ≈30 steps |
- Abrupt changes (e.g., an AN crashes) are detected within a few dozen request rounds, after which ABAR quickly re‑allocates clients to healthier ANs.
- Gradual drifts (e.g., slowly increasing load) are tracked by the adaptive window, preventing the algorithm from over‑reacting to noise.
- Simulations on a real‑world edge trace (mobile edge computing testbed) confirm a 15‑20 % improvement in AoI freshness over the best competing non‑stationary bandit methods.
Practical Implications
- Edge‑native SDKs: Developers can embed ABAR‑style request logic into client libraries, letting devices autonomously pick the best gateway without a central controller.
- Reduced back‑haul traffic: By keeping data fresher at the edge, fewer retransmissions or upstream queries are needed, saving bandwidth and latency.
- Robustness to network dynamics: The algorithm’s quick adaptation makes it suitable for highly mobile scenarios (vehicular networks, drones) where connectivity patterns shift constantly.
- Scalable to massive IoT deployments: Since each client runs a lightweight local bandit learner, the solution scales linearly with the number of devices, avoiding the bottleneck of a centralized scheduler.
- Potential integration with 5G/6G edge orchestration: Network operators could expose AN performance hints (e.g., via a lightweight API) that ABAR can consume, further accelerating convergence.
Limitations & Future Work
- Partial observability: The model assumes clients receive only the AoI reduction reward; richer feedback (e.g., latency, packet loss) could improve learning but is not explored.
- Coupled reward dynamics: While the analysis accounts for history‑dependent rewards, the theoretical bounds become looser as the number of clients grows large; tighter multi‑agent regret analyses are an open question.
- Real‑world deployment: The paper validates the approach via simulations and trace replay; a live field trial on an operational edge platform would be needed to assess overheads and integration challenges.
- Extension to hierarchical edge: Future work could consider multi‑tier edge hierarchies (edge‑cloud‑core) and how adaptive bandits can coordinate across layers.
Bottom line: The ABAR algorithm offers a practical, theoretically sound tool for developers building latency‑critical edge applications, enabling devices to stay fresh in the face of ever‑changing network conditions—without relying on heavyweight orchestration or perfect state information.
Authors
- Yi Zhuang
- Kun Yang
- Xingran Chen
Paper Information
- arXiv ID: 2601.08760v1
- Categories: cs.LG, cs.MA
- Published: January 13, 2026