[Paper] Adaptive Requesting in Decentralized Edge Networks via Non-Stationary Bandits
Source: arXiv - 2601.08760v1
Overview
The paper tackles a core challenge in edge computing: how time‑sensitive devices (think IoT sensors, AR headsets, or autonomous drones) can keep their data fresh when they must request content through a set of edge access nodes (ANs) that they cannot directly monitor. By framing the problem as a non‑stationary multi‑armed bandit, the authors devise a strategy that lets each client adaptively pick the best AN despite changing network conditions and without any central coordination.
Key Contributions
- Decentralized request model: Formalizes a realistic scenario where clients independently select ANs without observing their states or other clients’ actions.
- Non‑stationary bandit formulation: Captures both abrupt (e.g., node failures) and gradual (e.g., load shifts) changes in the expected “age‑of‑information” (AoI) reduction reward.
- Aging Bandit with Adaptive Reset (ABAR): A novel algorithm that combines sliding‑window reward estimation with periodic "reset" checks to quickly detect and adapt to distribution shifts.
- Theoretical guarantees: Proves near‑optimal regret bounds (i.e., the cumulative AoI loss compared to an oracle) even under the coupled, history‑dependent reward process.
- Empirical validation: Simulation results on synthetic and realistic edge‑network traces demonstrate that ABAR outperforms classic stationary bandit methods and recent non‑stationary baselines.
Methodology
System abstraction
- Clients generate time‑sensitive requests.
- Access Nodes (ANs) act as gateways to the cloud/server; each AN has an unknown, time‑varying ability to reduce the request’s AoI.
- Reward = AoI reduction achieved when a client selects a particular AN.
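As a concrete sketch of this reward signal, the snippet below tracks a single client's AoI over discrete time slots: the age grows by one each slot, a served request resets it to the freshness of the delivered sample, and the per-slot reward is the resulting AoI reduction. The function names and exact dynamics are illustrative assumptions, not the paper's formal model.

```python
def aoi_step(aoi: float, served: bool, service_age: float = 0.0) -> float:
    """Advance AoI by one slot; a successful delivery resets the AoI
    to the age of the freshest sample the AN could provide."""
    return service_age + 1.0 if served else aoi + 1.0

aoi = 0.0
reward_total = 0.0
for served in [False, False, True, False, True]:
    new_aoi = aoi_step(aoi, served)
    # Reward = AoI reduction relative to not being served this slot.
    reward_total += max(aoi + 1.0 - new_aoi, 0.0)
    aoi = new_aoi
# reward_total is now 4.0 (two successful deliveries, each saving 2 slots of age)
```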
Problem modeling
- Each client faces a multi‑armed bandit where each arm corresponds to an AN.
- The reward distribution for each arm is non‑stationary: it can jump (e.g., a node goes offline) or drift (e.g., congestion builds up).
- Because clients cannot see each other’s selections, the reward process becomes history‑dependent (the load on an AN affects its future performance).
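The two kinds of non-stationarity above can be made concrete with a toy reward generator: one arm drifts downward as load builds, another jumps abruptly at a change point when the node fails. All constants here are illustrative, not taken from the paper.

```python
import random

def mean_reward(arm: int, t: int, change_point: int = 500) -> float:
    """Time-varying expected AoI-reduction reward for each AN (toy model)."""
    if arm == 0:                       # gradual drift: congestion slowly builds
        return max(0.9 - 0.001 * t, 0.1)
    if arm == 1:                       # abrupt change: node goes offline
        return 0.6 if t < change_point else 0.05
    return 0.4                         # a stationary baseline arm

def pull(arm: int, t: int, rng: random.Random) -> float:
    """Sample a noisy reward around the current mean."""
    return rng.gauss(mean_reward(arm, t), 0.05)
```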
Algorithm design – ABAR
- Adaptive windowing: For each arm, maintain a sliding window of recent rewards; the window size automatically shrinks when a change is suspected, allowing rapid re‑learning.
- Periodic reset monitoring: At regular intervals, the algorithm checks whether the estimated mean reward has deviated beyond a confidence threshold; if so, it resets the statistics for that arm.
- Upper Confidence Bound (UCB) selection: Within the current window, compute a UCB score that balances exploration (trying less‑used ANs) and exploitation (using the currently best‑estimated AN).
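The three mechanisms above can be sketched as a compact learner: a sliding window per arm, UCB selection over the window, and a reset that discards stale statistics when the recent mean drifts far from the window mean. This is a generic illustration in the spirit of ABAR; the window size, check cadence, and reset threshold are assumptions, not the paper's exact pseudocode.

```python
import math
from collections import deque

class SlidingWindowUCB:
    """Sliding-window UCB with an adaptive reset check (illustrative)."""

    def __init__(self, n_arms: int, window: int = 100, reset_gap: float = 0.3):
        self.windows = [deque(maxlen=window) for _ in range(n_arms)]
        self.reset_gap = reset_gap
        self.t = 0

    def select(self) -> int:
        """Pick the AN with the highest UCB score over its current window."""
        self.t += 1
        for arm, w in enumerate(self.windows):
            if not w:                              # try each arm once first
                return arm

        def ucb(w: deque) -> float:
            mean = sum(w) / len(w)
            return mean + math.sqrt(2.0 * math.log(self.t) / len(w))

        return max(range(len(self.windows)), key=lambda a: ucb(self.windows[a]))

    def update(self, arm: int, reward: float) -> None:
        """Record a reward; reset the arm's statistics when the mean over the
        most recent quarter of the window deviates sharply from the
        full-window mean (a distribution change is suspected)."""
        w = self.windows[arm]
        w.append(reward)
        if len(w) >= 8:
            recent = list(w)[-(len(w) // 4):]
            if abs(sum(recent) / len(recent) - sum(w) / len(w)) > self.reset_gap:
                self.windows[arm] = deque(recent, maxlen=w.maxlen)
```

Keeping only the recent samples after a reset (rather than emptying the window) lets the learner restart from the post-change observations instead of from scratch.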
Analysis
- The authors derive a regret bound that scales with the number of change points and the magnitude of changes, showing that ABAR’s regret is only a logarithmic factor away from the optimal offline policy that knows all change times in advance.
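For intuition, bounds of this type in the non-stationary bandit literature typically take the following shape, where $K$ is the number of ANs, $\Gamma_T$ the number of change points up to horizon $T$, $\mu^*_t$ the best achievable mean reward at time $t$, and $a_t$ the arm chosen. This is an illustrative form, not the paper's exact theorem statement.

```latex
% Illustrative sliding-window-style regret bound (not the paper's exact theorem):
R_T \;=\; \mathbb{E}\!\left[\sum_{t=1}^{T}\bigl(\mu^*_t - \mu_{a_t,t}\bigr)\right]
\;=\; \mathcal{O}\!\left(\sqrt{K\,\Gamma_T\,T\log T}\right)
```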
Results & Findings
| Metric | Stationary UCB | Sliding‑Window UCB | EXP3.S (non‑stationary) | ABAR (proposed) |
|---|---|---|---|---|
| Average AoI reduction (synthetic) | 12 % | 18 % | 22 % | 31 % |
| Regret (cumulative AoI loss) | O(T) | O(T^0.75) | O(T^0.6) | O(T^0.5) |
| Adaptation latency after abrupt change | > 200 steps | ~120 steps | ~80 steps | ≈30 steps |
- Abrupt changes (e.g., an AN crashes) are detected within a few dozen request rounds, after which ABAR quickly re‑allocates clients to healthier ANs.
- Gradual drifts (e.g., slowly increasing load) are tracked by the adaptive window, preventing the algorithm from over‑reacting to noise.
- Simulations on a real‑world edge trace (mobile edge computing testbed) confirm a 15‑20 % improvement in AoI freshness over the best competing non‑stationary bandit methods.
Practical Implications
- Edge‑native SDKs: Developers can embed ABAR‑style request logic into client libraries, letting devices autonomously pick the best gateway without a central controller.
- Reduced back‑haul traffic: By keeping data fresher at the edge, fewer retransmissions or upstream queries are needed, saving bandwidth and latency.
- Robustness to network dynamics: The algorithm’s quick adaptation makes it suitable for highly mobile scenarios (vehicular networks, drones) where connectivity patterns shift constantly.
- Scalable to massive IoT deployments: Since each client runs a lightweight local bandit learner, the solution scales linearly with the number of devices, avoiding the bottleneck of a centralized scheduler.
- Potential integration with 5G/6G edge orchestration: Network operators could expose AN performance hints (e.g., via a lightweight API) that ABAR can consume, further accelerating convergence.
Limitations & Future Work
- Partial observability: The model assumes clients receive only the AoI reduction reward; richer feedback (e.g., latency, packet loss) could improve learning but is not explored.
- Coupled reward dynamics: While the analysis accounts for history‑dependent rewards, the theoretical bounds become looser as the number of clients grows large; tighter multi‑agent regret analyses are an open question.
- Real‑world deployment: The paper validates the approach via simulations and trace replay; a live field trial on an operational edge platform would be needed to assess overheads and integration challenges.
- Extension to hierarchical edge: Future work could consider multi‑tier edge hierarchies (edge‑cloud‑core) and how adaptive bandits can coordinate across layers.
Bottom line: The ABAR algorithm offers a practical, theoretically sound tool for developers building latency‑critical edge applications, enabling devices to stay fresh in the face of ever‑changing network conditions—without relying on heavyweight orchestration or perfect state information.
Authors
- Yi Zhuang
- Kun Yang
- Xingran Chen
Paper Information
- arXiv ID: 2601.08760v1
- Categories: cs.LG, cs.MA
- Published: January 13, 2026