[Paper] Studying the Effect of Schedule Preemption on Dynamic Task Graph Scheduling

Published: February 2, 2026 at 11:08 PM EST
4 min read
Source: arXiv - 2602.03081v1

Overview

The paper investigates schedule preemption for dynamic task‑graph workloads: situations where a stream of inter‑dependent tasks must be assigned to a pool of compute resources on the fly. Instead of always sticking with the original allocation or constantly re‑optimizing everything, the authors propose a middle ground in which only the most recently submitted tasks may be reshuffled. Their experiments show that this Last‑K Preemption model captures most of the performance benefit of full re‑scheduling while keeping fairness and overhead in check.

Key Contributions

  • Last‑K Preemption model: a novel, tunable policy that limits re‑scheduling to the K most recently submitted sub‑graphs.
  • Comprehensive empirical study across four workload families (synthetic, RIoTBench, WF‑Commons, adversarial) comparing three strategies:
    1. Non‑preemptive (no reshuffling)
    2. Full preemptive (re‑schedule everything each round)
    3. Partial preemptive (Last‑K)
  • Multi‑metric evaluation: makespan, fairness (per‑task slowdown), resource utilization, and scheduler runtime overhead.
  • Design guidelines for choosing K based on workload volatility and fairness requirements.
  • Open‑source implementation and benchmark suite for reproducibility.

Methodology

  1. Task‑graph model – Each incoming job is a directed acyclic graph (DAG) of tasks with CPU/IO requirements and precedence constraints.
  2. Scheduling rounds – The system periodically collects all ready tasks and runs a heuristic (e.g., list‑scheduling) to map them onto a fixed set of homogeneous workers.
  3. Preemption policies
    • Non‑preemptive: once a task is placed, its assignment never changes.
    • Full preemptive: before each round, the scheduler discards all previous placements and recomputes a fresh schedule.
    • Last‑K: only the tasks belonging to the most recent K DAGs are eligible for relocation; older allocations are frozen.
  4. Workload generators
    • Synthetic: parametrized DAGs with controllable depth/width.
    • RIoTBench: realistic IoT stream processing graphs.
    • WF‑Commons: scientific workflow traces (e.g., Montage, Epigenomics).
    • Adversarial: crafted to stress fairness (e.g., many small, latency‑sensitive DAGs mixed with large batch jobs).
  5. Metrics
    • Makespan (total elapsed time)
    • Fairness index (Jain’s fairness over per‑job slowdown)
    • Utilization (fraction of CPU cycles spent on useful work)
    • Scheduler overhead (wall‑clock time spent in the scheduling algorithm)
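As a concrete illustration of the fairness metric, Jain's index over per‑job slowdowns can be computed in a few lines. The helper name and sample values below are illustrative, not taken from the paper:

```python
def jain_fairness(slowdowns):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    Here x_i is the per-job slowdown (observed completion time divided
    by ideal completion time). The index ranges from 1/n (one job gets
    all the benefit) up to 1.0 (perfectly equal slowdowns).
    """
    n = len(slowdowns)
    total = sum(slowdowns)
    total_sq = sum(x * x for x in slowdowns)
    return (total * total) / (n * total_sq)

# Equal slowdowns are perfectly fair:
print(jain_fairness([2.0, 2.0, 2.0, 2.0]))  # 1.0
```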

All experiments run on a 16‑node cluster (each node 8 cores) with the same baseline heuristic; only the preemption policy varies.
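The round structure above can be sketched roughly as follows. This is a simplified stand‑in, not the paper's implementation: list scheduling is reduced to a greedy least‑loaded assignment, and the `Task` record, worker ids, and policy names are assumptions made for illustration:

```python
from collections import namedtuple
import heapq

# Hypothetical task record: which DAG it belongs to and its CPU cost.
Task = namedtuple("Task", ["name", "dag_id", "cost"])

def run_round(ready_tasks, workers, policy, frozen, last_k_dags=None):
    """One scheduling round: map each eligible task onto the least-loaded
    worker (a greedy stand-in for the list-scheduling heuristic).

    policy: "none"   -> placements from earlier rounds never move;
            "full"   -> every task is movable each round;
            "last_k" -> only tasks whose DAG is in last_k_dags may move.
    frozen: dict task -> worker, placements made in earlier rounds.
    """
    schedule = {}
    loads = [(0.0, w) for w in workers]   # (accumulated load, worker id)
    heapq.heapify(loads)
    for task in ready_tasks:
        movable = (
            policy == "full"
            or task not in frozen          # new tasks must be placed
            or (policy == "last_k" and task.dag_id in last_k_dags)
        )
        if not movable:
            schedule[task] = frozen[task]  # pass through unchanged
            continue
        load, w = heapq.heappop(loads)     # pick least-loaded worker
        schedule[task] = w
        heapq.heappush(loads, (load + task.cost, w))
    return schedule

tasks = [Task("a", 1, 4.0), Task("b", 2, 1.0), Task("c", 2, 1.0)]
frozen = {tasks[0]: "w1"}                  # placed in an earlier round
sched = run_round(tasks, ["w0", "w1"], "last_k", frozen, last_k_dags={2})
# Task "a" belongs to an old DAG (id 1), so its placement is frozen.
```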

Results & Findings

| Metric | Non‑preemptive | Full preemptive | Last‑K (K ≈ 10 % of active DAGs) |
| --- | --- | --- | --- |
| Makespan reduction | Baseline | −12 % to −25 % | −10 % to −22 % |
| Fairness (Jain index) | 0.96 | 0.78 (significant slowdown for small jobs) | 0.93 |
| Utilization | 71 % | 84 % | 82 % |
| Scheduler runtime per round | < 0.5 s | 2.8 s (≈5× overhead) | 0.9 s |

Key takeaways

  • Partial preemption captures > 80 % of the makespan gain of full preemption while keeping fairness almost as high as the non‑preemptive baseline.
  • Overhead scales roughly linearly with the number of tasks considered for reshuffling; limiting to the last K dramatically reduces scheduling time.
  • Workload volatility matters: for highly dynamic streams (RIoTBench), a larger K (≈15 % of active DAGs) yields the best trade‑off; for stable scientific workflows, even K = 5 % suffices.
  • Adversarial mixes expose the fairness pitfall of full preemption—small, latency‑critical jobs get repeatedly displaced—while Last‑K protects them by “locking in” earlier allocations.
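The K‑selection guideline from these takeaways can be captured in a small helper. The function name and the hard‑coded 15 %/5 % thresholds are one illustrative reading of the paper's guidance, not an API it provides:

```python
def choose_k(n_active_dags, volatile, min_k=1):
    """Pick K as a fraction of currently active DAGs, following the
    paper's rough guideline: ~15% for highly dynamic streams (e.g.
    RIoTBench-like), ~5% for stable scientific workflows.
    """
    fraction = 0.15 if volatile else 0.05
    # Always allow at least min_k DAGs to be reshuffled.
    return max(min_k, round(n_active_dags * fraction))

print(choose_k(200, volatile=True))   # 30
print(choose_k(200, volatile=False))  # 10
```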

Practical Implications

  • Cloud‑native batch/stream processing platforms (e.g., Apache Flink, Spark Structured Streaming) can adopt a Last‑K style “partial rebalancing” hook to improve throughput without sacrificing latency guarantees for short‑lived jobs.
  • Edge‑computing orchestrators that run heterogeneous IoT pipelines can limit the reshuffling window to the most recent sensor bursts, achieving higher CPU utilization on constrained devices.
  • Scheduler designers gain a concrete knob (K) to balance three competing goals: speed (makespan), fairness, and control‑plane overhead. The paper’s guidelines help set K based on observed job inter‑arrival patterns.
  • Service‑level agreements (SLAs) that penalize jitter can benefit: by freezing older allocations, the system reduces the risk of “task migration storms” that would otherwise cause temporary resource starvation.

Implementing Last‑K is straightforward: maintain a FIFO queue of active DAG identifiers, and when a new scheduling round starts, only extract tasks from the tail of the queue for the heuristic; the rest are passed through unchanged.
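That description could be sketched roughly as below; the class and method names are hypothetical, not taken from the paper's released code:

```python
from collections import deque

class LastKWindow:
    """Tracks active DAG ids in arrival order; the K most recently
    admitted DAGs form the preemptible window."""

    def __init__(self, k):
        self.k = k
        self.active = deque()          # FIFO queue of active DAG ids

    def admit(self, dag_id):
        self.active.append(dag_id)     # new DAG joins the tail

    def retire(self, dag_id):
        self.active.remove(dag_id)     # DAG finished; drop it

    def preemptible(self):
        """DAG ids eligible for reshuffling: the tail K of the queue."""
        return set(list(self.active)[-self.k:])

win = LastKWindow(k=2)
for d in ["d1", "d2", "d3", "d4"]:
    win.admit(d)
# Only the two newest DAGs may be reshuffled; d1 and d2 stay frozen.
print(win.preemptible())
```

At the start of each round, the scheduler would pass `win.preemptible()` to the heuristic as the set of movable DAGs and copy all other placements through unchanged.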

Limitations & Future Work

  • Homogeneous resources: Experiments assume identical workers; heterogeneous CPU/GPU or memory‑constrained nodes may affect the optimal K.
  • Single heuristic: The study uses a classic list‑scheduling heuristic; exploring more sophisticated solvers (ILP, reinforcement‑learning) under the Last‑K constraint is an open direction.
  • Static K: The paper selects a fixed K per workload; adaptive schemes that tune K at runtime based on observed volatility could further improve the trade‑off.
  • Network‑aware scheduling: Data transfer costs are abstracted away; integrating bandwidth considerations could change the fairness/utilization balance.

Overall, the research offers a pragmatic middle path for dynamic DAG schedulers, showing that controlled, limited preemption can deliver near‑optimal performance with manageable complexity—an insight that many production systems can start leveraging today.

Authors

  • Mohammadali Khodabandehlou
  • Jared Coleman
  • Niranjan Suri
  • Bhaskar Krishnamachari

Paper Information

  • arXiv ID: 2602.03081v1
  • Categories: cs.DC
  • Published: February 3, 2026