[Paper] Studying the Effect of Schedule Preemption on Dynamic Task Graph Scheduling

Published: February 2, 2026 at 11:08 PM EST
4 min read
Source: arXiv - 2602.03081v1

Overview

The paper investigates schedule preemption for dynamic task‑graph workloads: situations where a stream of inter‑dependent tasks must be assigned to a pool of compute resources on the fly. Instead of always sticking with the original allocation or constantly re‑optimizing everything, the authors propose a middle ground in which only the most recently submitted tasks may be reshuffled. Their experiments show that this Last‑K Preemption model captures most of the performance benefit of full re‑scheduling while keeping fairness and overhead in check.

Key Contributions

  • Last‑K Preemption model: a novel, tunable policy that limits re‑scheduling to the K most recently submitted sub‑graphs.
  • Comprehensive empirical study across four workload families (synthetic, RIoTBench, WF‑Commons, adversarial) comparing three strategies:
    1. Non‑preemptive (no reshuffling)
    2. Full preemptive (re‑schedule everything each round)
    3. Partial preemptive (Last‑K)
  • Multi‑metric evaluation: makespan, fairness (per‑task slowdown), resource utilization, and scheduler runtime overhead.
  • Design guidelines for choosing K based on workload volatility and fairness requirements.
  • Open‑source implementation and benchmark suite for reproducibility.

Methodology

  1. Task‑graph model – Each incoming job is a directed acyclic graph (DAG) of tasks with CPU/IO requirements and precedence constraints.
  2. Scheduling rounds – The system periodically collects all ready tasks and runs a heuristic (e.g., list‑scheduling) to map them onto a fixed set of homogeneous workers.
  3. Preemption policies
    • Non‑preemptive: once a task is placed, its assignment never changes.
    • Full preemptive: before each round, the scheduler discards all previous placements and recomputes a fresh schedule.
    • Last‑K: only the tasks belonging to the most recent K DAGs are eligible for relocation; older allocations are frozen.
  4. Workload generators
    • Synthetic: parametrized DAGs with controllable depth/width.
    • RIoTBench: realistic IoT stream processing graphs.
    • WF‑Commons: scientific workflow traces (e.g., Montage, Epigenomics).
    • Adversarial: crafted to stress fairness (e.g., many small, latency‑sensitive DAGs mixed with large batch jobs).
  5. Metrics
    • Makespan (total elapsed time)
    • Fairness index (Jain’s fairness over per‑job slowdown)
    • Utilization (fraction of CPU cycles spent on useful work)
    • Scheduler overhead (wall‑clock time spent in the scheduling algorithm)
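As a concrete illustration of the fairness metric, Jain's index over per‑job slowdowns can be computed in a few lines. The helper name and sample values below are illustrative, not taken from the paper:

```python
def jain_fairness(slowdowns):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    Here x_i is the per-job slowdown (observed completion time divided
    by ideal completion time). The index ranges from 1/n (one job gets
    all the benefit) up to 1.0 (perfectly equal slowdowns).
    """
    n = len(slowdowns)
    total = sum(slowdowns)
    total_sq = sum(x * x for x in slowdowns)
    return (total * total) / (n * total_sq)

# Equal slowdowns are perfectly fair:
print(jain_fairness([2.0, 2.0, 2.0, 2.0]))  # 1.0
```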

All experiments run on a 16‑node cluster (each node 8 cores) with the same baseline heuristic; only the preemption policy varies.
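The round structure above can be sketched roughly as follows. This is a simplified stand‑in, not the paper's implementation: list scheduling is reduced to a greedy least‑loaded assignment, and the `Task` record, worker ids, and policy names are assumptions made for illustration:

```python
from collections import namedtuple
import heapq

# Hypothetical task record: which DAG it belongs to and its CPU cost.
Task = namedtuple("Task", ["name", "dag_id", "cost"])

def run_round(ready_tasks, workers, policy, frozen, last_k_dags=None):
    """One scheduling round: map each eligible task onto the least-loaded
    worker (a greedy stand-in for the list-scheduling heuristic).

    policy: "none"   -> placements from earlier rounds never move;
            "full"   -> every task is movable each round;
            "last_k" -> only tasks whose DAG is in last_k_dags may move.
    frozen: dict task -> worker, placements made in earlier rounds.
    """
    schedule = {}
    loads = [(0.0, w) for w in workers]   # (accumulated load, worker id)
    heapq.heapify(loads)
    for task in ready_tasks:
        movable = (
            policy == "full"
            or task not in frozen          # new tasks must be placed
            or (policy == "last_k" and task.dag_id in last_k_dags)
        )
        if not movable:
            schedule[task] = frozen[task]  # pass through unchanged
            continue
        load, w = heapq.heappop(loads)     # pick least-loaded worker
        schedule[task] = w
        heapq.heappush(loads, (load + task.cost, w))
    return schedule

tasks = [Task("a", 1, 4.0), Task("b", 2, 1.0), Task("c", 2, 1.0)]
frozen = {tasks[0]: "w1"}                  # placed in an earlier round
sched = run_round(tasks, ["w0", "w1"], "last_k", frozen, last_k_dags={2})
# Task "a" belongs to an old DAG (id 1), so its placement is frozen.
```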

Results & Findings

| Metric | Non‑preemptive | Full preemptive | Last‑K (K ≈ 10 % of active DAGs) |
| --- | --- | --- | --- |
| Makespan reduction | Baseline | −12 % to −25 % | −10 % to −22 % |
| Fairness (Jain index) | 0.96 | 0.78 (significant slowdown for small jobs) | 0.93 |
| Utilization | 71 % | 84 % | 82 % |
| Scheduler runtime per round | < 0.5 s | 2.8 s (≈5× overhead) | 0.9 s |

Key takeaways

  • Partial preemption captures > 80 % of the makespan gain of full preemption while keeping fairness almost as high as the non‑preemptive baseline.
  • Overhead scales roughly linearly with the number of tasks considered for reshuffling; limiting to the last K dramatically reduces scheduling time.
  • Workload volatility matters: for highly dynamic streams (RIoTBench), a larger K (≈15 % of active DAGs) yields the best trade‑off; for stable scientific workflows, even K = 5 % suffices.
  • Adversarial mixes expose the fairness pitfall of full preemption—small, latency‑critical jobs get repeatedly displaced—while Last‑K protects them by “locking in” earlier allocations.
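The K‑selection guideline from these takeaways can be captured in a small helper. The function name and the hard‑coded 15 %/5 % thresholds are one illustrative reading of the paper's guidance, not an API it provides:

```python
def choose_k(n_active_dags, volatile, min_k=1):
    """Pick K as a fraction of currently active DAGs, following the
    paper's rough guideline: ~15% for highly dynamic streams (e.g.
    RIoTBench-like), ~5% for stable scientific workflows.
    """
    fraction = 0.15 if volatile else 0.05
    # Always allow at least min_k DAGs to be reshuffled.
    return max(min_k, round(n_active_dags * fraction))

print(choose_k(200, volatile=True))   # 30
print(choose_k(200, volatile=False))  # 10
```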

Practical Implications

  • Cloud‑native batch/stream processing platforms (e.g., Apache Flink, Spark Structured Streaming) can adopt a Last‑K style “partial rebalancing” hook to improve throughput without sacrificing latency guarantees for short‑lived jobs.
  • Edge‑computing orchestrators that run heterogeneous IoT pipelines can limit the reshuffling window to the most recent sensor bursts, achieving higher CPU utilization on constrained devices.
  • Scheduler designers gain a concrete knob (K) to balance three competing goals: speed (makespan), fairness, and control‑plane overhead. The paper’s guidelines help set K based on observed job inter‑arrival patterns.
  • Service‑level agreements (SLAs) that penalize jitter can benefit: by freezing older allocations, the system reduces the risk of “task migration storms” that would otherwise cause temporary resource starvation.

Implementing Last‑K is straightforward: maintain a FIFO queue of active DAG identifiers, and when a new scheduling round starts, only extract tasks from the tail of the queue for the heuristic; the rest are passed through unchanged.
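That description could be sketched roughly as below; the class and method names are hypothetical, not taken from the paper's released code:

```python
from collections import deque

class LastKWindow:
    """Tracks active DAG ids in arrival order; the K most recently
    admitted DAGs form the preemptible window."""

    def __init__(self, k):
        self.k = k
        self.active = deque()          # FIFO queue of active DAG ids

    def admit(self, dag_id):
        self.active.append(dag_id)     # new DAG joins the tail

    def retire(self, dag_id):
        self.active.remove(dag_id)     # DAG finished; drop it

    def preemptible(self):
        """DAG ids eligible for reshuffling: the tail K of the queue."""
        return set(list(self.active)[-self.k:])

win = LastKWindow(k=2)
for d in ["d1", "d2", "d3", "d4"]:
    win.admit(d)
# Only the two newest DAGs may be reshuffled; d1 and d2 stay frozen.
print(win.preemptible())
```

At the start of each round, the scheduler would pass `win.preemptible()` to the heuristic as the set of movable DAGs and copy all other placements through unchanged.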

Limitations & Future Work

  • Homogeneous resources: Experiments assume identical workers; heterogeneous CPU/GPU or memory‑constrained nodes may affect the optimal K.
  • Single heuristic: The study uses a classic list‑scheduling heuristic; exploring more sophisticated solvers (ILP, reinforcement‑learning) under the Last‑K constraint is an open direction.
  • Static K: The paper selects a fixed K per workload; adaptive schemes that tune K at runtime based on observed volatility could further improve the trade‑off.
  • Network‑aware scheduling: Data transfer costs are abstracted away; integrating bandwidth considerations could change the fairness/utilization balance.

Overall, the research offers a pragmatic middle path for dynamic DAG schedulers, showing that controlled, limited preemption can deliver near‑optimal performance with manageable complexity—an insight that many production systems can start leveraging today.

Authors

  • Mohammadali Khodabandehlou
  • Jared Coleman
  • Niranjan Suri
  • Bhaskar Krishnamachari

Paper Information

  • arXiv ID: 2602.03081v1
  • Categories: cs.DC
  • Published: February 3, 2026