[Paper] HiDVFS: A Hierarchical Multi-Agent DVFS Scheduler for OpenMP DAG Workloads

Published: January 9, 2026 at 11:42 PM EST
Source: arXiv - 2601.06425v1

Overview

The paper introduces HiDVFS, a hierarchical multi‑agent scheduler that adjusts voltage and frequency per core at runtime for OpenMP programs represented as directed acyclic graphs (DAGs). By combining offline profiling, temperature sensing, and reinforcement‑learning‑style rewards, HiDVFS reduces both execution time and energy on an embedded multicore platform (the NVIDIA Jetson TX2), which makes it relevant to developers building performance‑critical, power‑constrained applications.

Key Contributions

  • Hierarchical multi‑agent architecture: three cooperating agents (core‑frequency selector, temperature manager, and task‑priority arbiter) coordinate to keep cores cool while maximizing throughput.
  • Makespan‑first reward function: a reinforcement‑learning‑inspired objective that prioritizes overall execution time but includes regularizers for energy and temperature, improving sample efficiency.
  • Profiling‑driven task allocation: uses lightweight offline profiling data to predict irregular execution patterns of OpenMP DAG workloads, avoiding naïve static core assignments.
  • Per‑core DVFS control: unlike many heuristics that set a single frequency for the whole chip, HiDVFS continuously monitors each core’s temperature and adjusts its voltage/frequency independently.
  • Empirical validation on real hardware: extensive experiments on the NVIDIA Jetson TX2 with the BOTS benchmark suite show up to 3.95× speedup and ≈47 % energy reduction versus the best prior DVFS scheduler (GearDVFS).
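As a rough illustration of the per-core control idea in the list above, the following Python sketch steps each core's frequency independently based on its own temperature. It is not the authors' implementation: the frequency ladder, the 80 °C / 65 °C thresholds, the step-up-when-cool rule, and the names `FREQ_LADDER_KHZ`, `next_frequency`, and `adjust_all` are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical frequency ladder in kHz (illustrative values, not from the paper).
FREQ_LADDER_KHZ: List[int] = [400_000, 800_000, 1_200_000, 1_600_000, 2_000_000]


def next_frequency(current_khz: int, temp_c: float,
                   hot_c: float = 80.0, cool_c: float = 65.0) -> int:
    """Return the next frequency for one core given its own temperature.

    Assumed policy: step down one level when the core is hot, step up one
    level when it is cool, and hold otherwise (simple hysteresis band).
    """
    idx = FREQ_LADDER_KHZ.index(current_khz)
    if temp_c >= hot_c and idx > 0:
        return FREQ_LADDER_KHZ[idx - 1]
    if temp_c <= cool_c and idx < len(FREQ_LADDER_KHZ) - 1:
        return FREQ_LADDER_KHZ[idx + 1]
    return current_khz


def adjust_all(freqs_khz: Dict[int, int], temps_c: Dict[int, float]) -> Dict[int, int]:
    """Apply the per-core rule to every core independently."""
    return {core: next_frequency(f, temps_c[core]) for core, f in freqs_khz.items()}
```

The sketch deliberately leaves out hardware access. On a Linux target, the readings and settings would typically come from the kernel's thermal and cpufreq interfaces, and whether frequencies can be set per core or only per cluster depends on the platform.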

Methodology

  1. Workload Model – The authors focus on OpenMP programs that can be expressed as DAGs, where nodes are compute tasks and edges encode dependencies.

  2. Profiling Phase – Before runtime, each benchmark is executed once to collect per‑task execution‑time statistics on each core frequency. This lightweight profile feeds the scheduler’s decision engine.

  3. Agent Design

    • Agent 1 (Core‑Frequency Selector): queries the profiler to pick the most suitable core‑frequency pair for the next ready task.
    • Agent 2 (Temperature Manager): reads on‑chip thermal sensors; if a core’s temperature exceeds a threshold, it nudges the frequency down or migrates tasks to cooler cores.
    • Agent 3 (Priority Arbiter): when multiple tasks compete for the same core, it assigns priorities based on estimated impact on the overall makespan.
  4. Reward Function – The scheduler receives a scalar reward after each scheduling decision:

    \[ R = -\text{makespan} + \lambda_1 \times \text{energy\_regularizer} + \lambda_2 \times \text{temp\_regularizer} \]

    The makespan term dominates, ensuring performance‑first behavior, while the regularizers gently penalize high energy use or overheating.

  5. Learning Loop – Using a simple Q‑learning update (or a policy‑gradient variant), the agents iteratively improve their policy across multiple runs (random seeds 42, 123, and 456) and converge on a schedule that balances the three objectives.
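To make steps 4 and 5 more concrete, here is a minimal Python sketch of a makespan-first reward combined with a tabular Q-learning update. It is an assumption-laden illustration rather than the paper's code: the regularizers are modelled as non-positive penalty terms so that the plus signs in the formula still penalize energy and overheating, and the weights, the temperature limit, the state and action encodings, and the names `reward` and `QAgent` are hypothetical.

```python
import random
from collections import defaultdict


def reward(makespan_s: float, energy_j: float, peak_temp_c: float,
           lam_energy: float = 0.01, lam_temp: float = 0.01,
           temp_limit_c: float = 80.0) -> float:
    """R = -makespan + lam_energy * energy_reg + lam_temp * temp_reg.

    Assumption: both regularizers are <= 0, so they act as penalties.
    """
    energy_reg = -energy_j                              # more energy, lower reward
    temp_reg = -max(0.0, peak_temp_c - temp_limit_c)    # only overshoot is penalized
    return -makespan_s + lam_energy * energy_reg + lam_temp * temp_reg


class QAgent:
    """Tabular epsilon-greedy Q-learning agent (standard textbook update)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=42):
        self.q = defaultdict(float)          # (state, action) -> estimated value
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)       # seeded for reproducible runs

    def act(self, state):
        """Explore with probability epsilon, otherwise pick the best-known action."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, r, next_state):
        """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = r + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
```

With small weights such as the defaults here, the makespan term dominates the reward, which mirrors the performance-first behavior described in step 4. How the three agents share or split such a table is not specified in this summary, so the sketch uses a single agent.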

Results & Findings

| Metric | HiDVFS (average) | GearDVFS (baseline) | Speedup | Energy Reduction |
|---|---|---|---|---|
| Makespan (s) | 4.16 ± 0.58 (L10) | 14.32 ± 2.61 | 3.44× | |
| Total Energy (kJ) | 63.7 | 128.4 | | ≈50 % |
| Across 9 BOTS benchmarks | | | 3.95× | 47.1 % |
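
The speedup and energy-reduction figures in the first two rows follow directly from the reported makespan and energy values. The short derivation below only uses numbers from the table; the ratio definitions are the usual ones and are assumed to match the paper's.

```latex
\[
\text{speedup} = \frac{T_{\text{GearDVFS}}}{T_{\text{HiDVFS}}}
               = \frac{14.32\ \text{s}}{4.16\ \text{s}} \approx 3.44
\]
\[
\text{energy reduction} = 1 - \frac{E_{\text{HiDVFS}}}{E_{\text{GearDVFS}}}
                        = 1 - \frac{63.7\ \text{kJ}}{128.4\ \text{kJ}}
                        \approx 0.504 \approx 50\,\%
\]
```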

Key take‑aways

  • Per‑core DVFS combined with temperature awareness prevents thermal throttling that would otherwise elongate the critical path.
  • The profiling‑guided allocation captures irregular task runtimes, avoiding the “one‑size‑fits‑all” pitfall of static heuristics.
  • The makespan‑first reward quickly converges, requiring far fewer training episodes than generic RL approaches, which is crucial for embedded systems with limited offline time.
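
The profiling-guided allocation mentioned above can be pictured with a small Python sketch that looks up profiled runtimes and picks the core and frequency with the earliest predicted finish time. This greedy rule is a stand-in chosen for clarity, not the learned policy from the paper; the profile layout, the `allowed_freqs` thermal cap, and the function name `pick_core_and_freq` are assumptions.

```python
from typing import Dict, List, Tuple


def pick_core_and_freq(task: str,
                       profile: Dict[str, Dict[int, float]],
                       core_free_at: Dict[int, float],
                       allowed_freqs: Dict[int, List[int]]) -> Tuple[int, int, float]:
    """Choose (core, freq_khz, predicted_finish_s) with the earliest finish time.

    profile[task][freq_khz] is the profiled runtime in seconds (hypothetical layout).
    core_free_at[core] is the time at which that core becomes idle.
    allowed_freqs[core] lists frequencies currently permitted on that core,
    for example after a temperature manager has capped a hot core.
    """
    best = None
    for core, free_at in core_free_at.items():
        for freq in allowed_freqs[core]:
            finish = free_at + profile[task][freq]
            if best is None or finish < best[2]:
                best = (core, freq, finish)
    if best is None:
        raise ValueError("no core/frequency pair available")
    return best
```

For example, a busy core that may run at a high frequency can still beat an idle core that is thermally capped to a low frequency, which is the kind of trade-off a static core assignment cannot express.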

Practical Implications

  • Embedded AI & Edge Computing – Devices such as the Jetson TX2, the Raspberry Pi 4, or other ARM‑based SoCs could integrate HiDVFS to get more inference throughput from the same silicon without overheating.
  • Real‑time Systems – By shortening makespan while keeping temperature in check, HiDVFS could be used in robotics, autonomous drones, or automotive ECUs where latency and thermal budgets are strict.
  • Developer Tooling – The profiling step can be automated via a simple wrapper around OpenMP runs, making it feasible to embed HiDVFS into CI pipelines for performance regression testing.
  • Energy‑aware Scheduling APIs – The hierarchical agent design maps cleanly onto existing runtime libraries (e.g., OpenMP runtime, Intel TBB) that already expose task‑graph information, enabling incremental adoption without rewriting application code.
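
A profiling wrapper of the kind mentioned under Developer Tooling might look like the Python sketch below. It only times whole-program runs for a given `OMP_NUM_THREADS` setting; it does not set core frequencies (which usually needs elevated privileges) and does not collect the per-task statistics the paper relies on. The function name `profile_command` and the stand-in command are hypothetical.

```python
import os
import statistics
import subprocess
import sys
import time

# Stand-in command so the sketch runs anywhere; replace with a real
# OpenMP benchmark binary and its arguments.
STAND_IN_CMD = [sys.executable, "-c", "pass"]


def profile_command(cmd, threads: int, repeats: int = 3) -> dict:
    """Run `cmd` several times with OMP_NUM_THREADS set and record wall time."""
    env = dict(os.environ, OMP_NUM_THREADS=str(threads))
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the run fails.
        subprocess.run(cmd, env=env, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return {
        "threads": threads,
        "runs": repeats,
        "mean_s": statistics.mean(samples),
        "min_s": min(samples),
    }
```

Calling `profile_command(STAND_IN_CMD, threads=2)` returns a small dictionary of timings that a CI job could store and compare against earlier runs.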

Limitations & Future Work

  • Profiling Overhead – The approach assumes a representative offline profiling run; workloads with highly data‑dependent variability may need repeated profiling.
  • Hardware Specificity – Experiments are limited to the Jetson TX2; extending to heterogeneous platforms (CPU + GPU + NPU) will require additional coordination mechanisms.
  • Scalability of Agents – With many‑core systems (≥64 cores), the three‑agent hierarchy may become a bottleneck; the authors suggest exploring decentralized or hierarchical‑RL extensions.
  • Security & Isolation – Dynamic frequency changes could affect timing‑side‑channel characteristics; future work could investigate safe DVFS policies for security‑sensitive contexts.

Overall, HiDVFS demonstrates that a carefully engineered, multi‑agent DVFS scheduler can deliver substantial performance and energy gains for modern OpenMP DAG workloads, offering a practical pathway for developers to harness fine‑grained power management on today’s multicore embedded platforms.

Authors

  • Mohammad Pivezhandi
  • Abusayeed Saifullah
  • Ali Jannesari

Paper Information

  • arXiv ID: 2601.06425v1
  • Categories: cs.DC, cs.AI
  • Published: January 10, 2026