[Paper] Hierarchical Online-Scheduling for Energy-Efficient Split Inference with Progressive Transmission

Published: January 12, 2026 at 08:56 PM EST
4 min read
Source: arXiv - 2601.08135v1

Overview

The paper introduces ENACHI, a hierarchical online‑scheduling framework that lets edge devices and cloud/edge servers collaborate on deep‑neural‑network (DNN) inference while keeping energy use low and latency tight. By coordinating decisions at both the task level (where to split the model, how much bandwidth to reserve) and the packet level (how to transmit data over a noisy, time‑varying channel), ENACHI achieves higher accuracy than prior methods without draining the device’s battery or missing deadlines.

Key Contributions

  • Two‑tier Lyapunov optimization that simultaneously handles long‑term energy‑accuracy trade‑offs (outer loop) and short‑term channel fluctuations (inner loop); a generic form of the underlying drift‑plus‑penalty objective appears after this list.
  • Progressive transmission mechanism that adaptively sends only the most informative parts of a feature map, reducing unnecessary data transfer.
  • Reference‑tracking power control that dynamically adjusts transmit power per slot to meet a pre‑computed energy budget while reacting to real‑time channel conditions.
  • Comprehensive evaluation on ImageNet showing up to 43 % higher accuracy and 62 % lower energy under tight latency constraints, plus stable performance in multi‑user, congested scenarios.
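
For context, the outer loop's objective follows the classic drift‑plus‑penalty pattern from Lyapunov optimization. A generic form is shown below; the notation is ours, and the paper's exact formulation may differ. Here Q(t) is a virtual energy‑deficit queue, E(t) the energy spent on task t, E_ref the per‑task budget, and V ≥ 0 the accuracy‑versus‑energy weight:

```latex
% Generic drift-plus-penalty objective (standard Lyapunov-optimization form;
% notation is illustrative, not necessarily the paper's).
% Q(t): virtual energy-deficit queue   E(t): energy spent on task t
% E_ref: per-task energy budget        V >= 0: accuracy-vs-energy weight
\[
  Q(t+1) = \max\bigl\{\, Q(t) + E(t) - E_{\mathrm{ref}},\; 0 \,\bigr\}
\]
\[
  \min_{\text{split, bandwidth}} \quad
  \underbrace{Q(t)\,\bigl(E(t) - E_{\mathrm{ref}}\bigr)}_{\text{drift: energy tracking}}
  \;-\; V \cdot \underbrace{\mathrm{Acc}(t)}_{\text{penalty: accuracy reward}}
\]
```

Minimizing this score slot by slot keeps Q(t) bounded, so average energy stays near E_ref, while a larger V pushes harder for accuracy.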

Methodology

  1. Task‑level scheduling (outer loop)

    • At each inference request, ENACHI decides where to split the DNN (device vs. edge) and how much bandwidth to allocate.
    • It uses a drift‑plus‑penalty formulation: the “drift” keeps the long‑term energy consumption close to a reference budget, while the “penalty” rewards higher inference accuracy (both loops are sketched in code after this list).
  2. Packet‑level scheduling (inner loop)

    • Once a split point is chosen, the intermediate feature tensor is transmitted over a wireless link.
    • ENACHI applies uncertainty‑aware progressive transmission: the feature map is partitioned into packets ordered by their contribution to prediction confidence; packets are sent until the edge can make a confident decision.
    • A reference‑tracking controller continuously adjusts the transmit power for each packet, ensuring the instantaneous power stays near the budget set by the outer loop while coping with channel fading.
  3. Lyapunov‑based stability guarantee

    • The two loops are coupled through a Lyapunov function whose drift analysis guarantees the system stays stable (the virtual energy‑deficit queue never diverges, so the long‑term energy budget is respected) while driving the accuracy metric upward.
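
To make the two loops concrete, the sketch below shows how a drift‑plus‑penalty outer loop and a progressive‑transmission inner loop with reference‑tracking power control might fit together. It is a minimal illustration under stated assumptions: the candidate split points, per‑split profiles, packet confidence gains, and channel model are all toy values of ours, not the paper's algorithm or numbers.

```python
import random

# Illustrative sketch of ENACHI-style two-tier scheduling. All constants,
# profiles, and channel models below are toy assumptions for exposition.

E_REF = 0.5               # per-task energy budget (J), assumed
V = 10.0                  # drift-plus-penalty trade-off weight, assumed
SPLIT_POINTS = [2, 5, 8]  # hypothetical candidate split layers

def outer_loop(Q, channel_rate, profiles):
    """Task-level decision: pick the split point that minimizes the
    drift-plus-penalty score given the virtual energy queue Q."""
    best_score, best_split, best_energy = None, None, None
    for split in SPLIT_POINTS:
        p = profiles[split]
        tx_energy = p["tx_power"] * p["data_bits"] / channel_rate  # W * s = J
        energy = p["compute_energy"] + tx_energy
        # Drift term tracks the energy budget; -V * accuracy rewards accuracy.
        score = Q * (energy - E_REF) - V * p["accuracy"]
        if best_score is None or score < best_score:
            best_score, best_split, best_energy = score, split, energy
    return best_split, best_energy

def inner_loop(packets, power_ref, confidence_target=0.9):
    """Packet-level loop: progressive transmission with reference-tracking
    power control. `packets` are (confidence_gain, base_energy) pairs,
    pre-sorted by informativeness (an assumption of this sketch)."""
    confidence, spent, power = 0.0, 0.0, power_ref
    for gain, base_energy in packets:
        fade = random.uniform(0.5, 1.5)               # toy channel fading
        power = 0.8 * power + 0.2 * power_ref / fade  # track the reference
        spent += base_energy * power / power_ref
        confidence += gain
        if confidence >= confidence_target:           # early stop when confident
            break
    return confidence, spent

def update_queue(Q, energy_used):
    """Virtual energy-deficit queue; bounding it keeps the long-term
    average energy near the reference budget."""
    return max(Q + energy_used - E_REF, 0.0)

if __name__ == "__main__":
    profiles = {  # toy per-split trade-offs: deeper split = more on-device work
        2: {"compute_energy": 0.10, "data_bits": 4e6, "tx_power": 0.2, "accuracy": 0.70},
        5: {"compute_energy": 0.20, "data_bits": 1e6, "tx_power": 0.2, "accuracy": 0.74},
        8: {"compute_energy": 0.35, "data_bits": 2e5, "tx_power": 0.2, "accuracy": 0.76},
    }
    Q = 0.0
    for t in range(5):
        rate = random.uniform(1e6, 4e6)  # bits/s, toy time-varying channel
        split, energy = outer_loop(Q, rate, profiles)
        packets = [(0.4, 0.05), (0.3, 0.05), (0.2, 0.05), (0.1, 0.05)]
        conf, _ = inner_loop(packets, power_ref=0.2)
        Q = update_queue(Q, energy)
        print(f"t={t}: split@{split} conf={conf:.2f} energy={energy:.2f}J Q={Q:.2f}")
```

In a real deployment the per‑split profiles would come from offline model profiling and the fading term from live channel estimates; the sketch only shows how the virtual queue couples the two decision levels.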

Results & Findings

| Scenario | Accuracy ↑ vs. baselines | Energy ↓ vs. baselines | Latency impact |
| --- | --- | --- | --- |
| Tight deadline (≤ 30 ms) & limited bandwidth (2 Mbps) | +43.12 % | −62.13 % | Meets deadline consistently |
| Moderate deadline (≤ 100 ms) | +21 % | −35 % | Slight headroom for extra tasks |
| Multi‑user (10 concurrent devices) | Stable (≤ 2 % variance) | Energy per device unchanged | No extra queuing delay |

Key takeaways

  • Progressive transmission cuts the transmitted data volume by up to 70 % for “easy” samples, while still sending extra packets for harder inputs.
  • The reference‑tracking power policy keeps the device’s average power within 5 % of the target budget, even when the channel SNR swings by >10 dB.
  • ENACHI scales gracefully: adding more users does not increase per‑device energy consumption, thanks to the shared bandwidth allocation logic in the outer loop.

Practical Implications

  • Edge AI developers can integrate ENACHI as a middleware layer that automatically decides the optimal split point and transmission schedule, freeing them from hand‑tuning for each model or network condition (a hypothetical integration interface is sketched after this list).
  • Mobile and IoT manufacturers gain a concrete method to extend battery life while still offering real‑time AI services (e.g., AR, voice assistants) under variable Wi‑Fi/5G conditions.
  • Network operators can expose a lightweight API that reports current bandwidth and channel statistics; ENACHI consumes this information to allocate resources fairly among competing devices.
  • The progressive transmission concept can be repurposed for other bandwidth‑heavy tasks such as video analytics or federated learning, where early‑exit decisions reduce communication overhead.
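
To illustrate the first point, a middleware wrapper might expose an interface along these lines. This is a purely hypothetical API of ours; none of the names below come from the paper:

```python
# Hypothetical middleware surface for ENACHI-style scheduling; the class,
# method names, and parameters are illustrative assumptions, not an API
# from the paper.
from dataclasses import dataclass

@dataclass
class ChannelReport:
    bandwidth_hz: float  # e.g., from an operator-exposed statistics API
    snr_db: float

class EnachiScheduler:
    """Wraps a model and hides split-point and transmission decisions."""

    def __init__(self, model, energy_budget_j: float, deadline_ms: float):
        self.model = model
        self.energy_budget_j = energy_budget_j
        self.deadline_ms = deadline_ms

    def infer(self, sample, channel: ChannelReport):
        """Would pick a split point, transmit features progressively, and
        return the edge-side prediction; left abstract in this sketch."""
        raise NotImplementedError

# How a developer might call it (hypothetical):
# scheduler = EnachiScheduler(model, energy_budget_j=0.5, deadline_ms=30.0)
# prediction = scheduler.infer(frame, ChannelReport(bandwidth_hz=2e6, snr_db=15.0))
```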

Limitations & Future Work

  • The current design assumes perfect knowledge of channel statistics for the Lyapunov drift term; abrupt, non‑stationary interference could degrade performance.
  • ENACHI focuses on single‑model split inference; extending the framework to multi‑model pipelines (e.g., cascaded detectors) remains an open challenge.
  • Real‑world deployment would need hardware‑level integration (e.g., on‑chip power controllers) to fully exploit the reference‑tracking policy—future work could prototype this on edge ASICs or smartphones.

Bottom line: ENACHI demonstrates that a carefully orchestrated, hierarchical scheduling strategy can dramatically improve the energy‑efficiency and accuracy of device‑edge collaborative inference, paving the way for more responsive and battery‑friendly AI applications at the edge.

Authors

  • Zengzipeng Tang
  • Yuxuan Sun
  • Wei Chen
  • Jianwen Ding
  • Bo Ai
  • Yulin Shao

Paper Information

  • arXiv ID: 2601.08135v1
  • Categories: cs.NI, cs.DC, cs.LG
  • Published: January 13, 2026