[Paper] Deep Reinforcement Learning-driven Edge Offloading for Latency-constrained XR pipelines
Source: arXiv - 2603.16823v1
Overview
Extended reality (XR) applications (AR glasses, VR headsets, mixed-reality collaboration tools) must render frames within a strict motion-to-photon budget of roughly 20 ms to avoid motion sickness, all while running on battery-limited wearables. This paper proposes a battery-aware edge-offloading framework that decides, in real time, whether an XR workload should be processed locally or sent to a nearby edge server. A lightweight deep-reinforcement-learning (DRL) controller continuously balances latency constraints against battery consumption, delivering smoother user experiences without draining the device.
Key Contributions
- Joint latency‑energy model that captures motion‑to‑photon (MTP) latency, workload quality, and battery dynamics in a single decision‑making objective.
- Online DRL policy (≈ 0.5 ms inference cost) that adapts execution placement on‑the‑fly under varying network bandwidth and device power states.
- Battery‑life extension of up to 163 % compared with a latency‑optimal local‑only baseline, while keeping ≥ 90 % of frames within the MTP latency budget in stable networks.
- Robustness to network degradation: compliance stays above 80 % even when bandwidth is severely limited.
- Extensive experimental validation on a prototype XR pipeline (camera capture → SLAM → rendering) using commodity edge hardware and off‑the‑shelf headsets.
Methodology
- System Model – The XR pipeline is split into three stages: sensor capture, compute‑heavy perception (e.g., SLAM, AI‑based object detection), and rendering. Each stage can run locally or be offloaded to an edge node.
- Latency‑Energy Objective – The authors formulate a cost function that penalizes missed MTP deadlines and battery drain, weighted by user‑defined preferences (e.g., “favor battery” vs. “favor latency”).
- State Representation – The DRL agent observes a compact state vector: current battery level, recent MTP latency, estimated network throughput, and workload size.
- Action Space – Two actions: Local (process everything on device) or Offload (send compute‑intensive stages to edge).
- Learning Algorithm – A lightweight Deep Q‑Network (DQN) with a few fully‑connected layers is trained online using experience replay. The reward reflects the objective function, encouraging actions that keep latency under the 20 ms MTP threshold while preserving battery.
- Implementation – The policy runs on the XR device’s CPU (≈ 2 % utilization) and communicates with an edge server over Wi‑Fi or 5G. The edge node executes the offloaded workload in a containerized environment to keep latency predictable.
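One plausible form of the per-frame cost implied by the Latency-Energy Objective above, written with our own symbols (the paper does not publish its exact formula), is:

$$
C_t = w_\ell \cdot \max(0,\; \ell_t - \ell_{\text{MTP}}) \;+\; w_e \cdot \Delta E_t
$$

where $\ell_t$ is the measured motion-to-photon latency of frame $t$, $\ell_{\text{MTP}} = 20$ ms is the MTP deadline, $\Delta E_t$ is the battery energy drained during the frame, and $w_\ell, w_e$ are the user-defined preference weights ("favor latency" vs. "favor battery"). The DRL reward would then be $r_t = -C_t$, so the agent is rewarded for keeping frames under deadline at low energy cost.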
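The decision loop described in the Methodology bullets can be sketched compactly. The following is a minimal illustration, not the paper's implementation: it uses a toy linear Q-function in place of the paper's small fully-connected DQN (and omits experience replay), and the weights, state values, and energy units are assumed for illustration. Only the 20 ms MTP threshold, the four-element state vector, and the binary Local/Offload action space come from the paper.

```python
import random

# Assumed constants -- only MTP_BUDGET_MS is stated in the paper;
# the preference weights are user-defined and illustrative here.
MTP_BUDGET_MS = 20.0   # motion-to-photon deadline from the paper
W_LATENCY = 1.0        # assumed penalty weight for missed deadlines
W_ENERGY = 0.5         # assumed penalty weight for energy drain

LOCAL, OFFLOAD = 0, 1  # the paper's binary action space

def reward(latency_ms, energy_cost):
    """Negative cost: penalize missed MTP deadlines and battery drain."""
    deadline_miss = max(0.0, latency_ms - MTP_BUDGET_MS)
    return -(W_LATENCY * deadline_miss + W_ENERGY * energy_cost)

class LinearQ:
    """Toy linear Q-function standing in for the paper's small DQN."""
    def __init__(self, n_features=4, n_actions=2, lr=0.01):
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.lr = lr

    def q(self, state, action):
        return sum(wi * si for wi, si in zip(self.w[action], state))

    def act(self, state, epsilon=0.1):
        if random.random() < epsilon:                      # explore
            return random.randrange(len(self.w))
        qs = [self.q(state, a) for a in range(len(self.w))]
        return qs.index(max(qs))                           # exploit

    def update(self, state, action, r, next_state, gamma=0.9):
        # One-step TD update toward r + gamma * max_a' Q(s', a').
        target = r + gamma * max(self.q(next_state, a)
                                 for a in range(len(self.w)))
        td_error = target - self.q(state, action)
        self.w[action] = [wi + self.lr * td_error * si
                          for wi, si in zip(self.w[action], state)]

# State vector from the paper: battery level, recent MTP latency (ms),
# estimated throughput (Mbps), workload size -- values are illustrative.
state = [0.8, 15.0, 25.0, 1.2]
policy = LinearQ()
action = policy.act(state, epsilon=0.0)   # greedy: LOCAL or OFFLOAD
policy.update(state, action, reward(18.0, 4.0), state)
```

In the real system this loop would run once per scheduling decision (the paper reports < 0.5 ms inference cost), with the observed latency and energy of the executed frame fed back as the reward.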
Results & Findings
| Scenario | Battery Lifetime (hrs) | MTP Compliance (%) |
|---|---|---|
| Local‑only (latency‑optimal) | 1.0 (baseline) | 95 |
| Proposed DRL‑offload (stable Wi‑Fi) | 2.63 (+163 %) | 92 |
| Proposed DRL‑offload (5 Mbps limit) | 2.1 | 84 |
| Heuristic offload (static rule) | 1.7 | 78 |
- Latency: The DRL policy keeps average MTP latency under 20 ms in > 90 % of frames when bandwidth ≥ 10 Mbps; degradation is graceful as bandwidth drops.
- Overhead: Policy inference adds < 0.5 ms per decision, negligible compared to the XR frame budget.
- Adaptivity: When the battery dips below 20 %, the agent automatically shifts to more local processing to avoid sudden shutdowns, demonstrating closed‑loop energy awareness.
Practical Implications
- Longer Field Sessions: AR/VR developers can ship devices that stay operational for 2–3 × longer without sacrificing interactive smoothness—critical for enterprise training, remote assistance, or gaming marathons.
- Network‑Aware Apps: The DRL controller can be embedded in SDKs (e.g., Unity, Unreal) to let apps automatically adapt to Wi‑Fi/5G fluctuations, reducing the need for manual QoS tuning.
- Edge‑First Architecture: Service providers can design lightweight edge functions (SLAM, AI inference) knowing that a smart offloading layer will keep latency guarantees, making edge compute a viable alternative to on‑device accelerators.
- Battery‑Centric UX Metrics: Product managers now have a concrete metric (battery‑latency trade‑off) to benchmark XR experiences, moving beyond “average FPS” or “peak power” alone.
Limitations & Future Work
- Simplified Action Space – The current binary decision (local vs. offload) does not explore partial offloading (e.g., offload only SLAM but render locally).
- Network Model – Experiments focus on Wi‑Fi and a single 5G slice; more heterogeneous networks (cellular handover, congested edge) could affect stability.
- Generalization – The DRL policy is trained on a specific XR pipeline; transferring to drastically different workloads (e.g., volumetric video) may require retraining or meta‑learning techniques.
- Security & Privacy – Offloading raw sensor data raises privacy concerns that the paper does not address; future work could integrate encrypted inference or on‑device preprocessing.
Overall, the paper demonstrates that a modest DRL‑based offloading engine can dramatically stretch battery life while keeping XR latency within human‑perceptible bounds, paving the way for more immersive, untethered experiences.
Authors
- Sourya Saha
- Saptarshi Debroy
Paper Information
- arXiv ID: 2603.16823v1
- Categories: cs.CV
- Published: March 17, 2026