[Paper] Quantifying the Carbon Reduction of DAG Workloads: A Job Shop Scheduling Perspective
Source: arXiv - 2512.07799v1
Overview
The paper Quantifying the Carbon Reduction of DAG Workloads: A Job Shop Scheduling Perspective examines how data-center schedulers can cut carbon emissions by exploiting the internal structure of batch jobs (e.g., video encoding pipelines) rather than treating each job as a single, indivisible task. By modeling these jobs as directed acyclic graphs (DAGs) and applying job-shop scheduling techniques, the authors show that carbon-aware scheduling can cut emissions by up to 25 % on average (depending on cluster configuration) without sacrificing overall job completion time.
Key Contributions
- Dependency‑aware carbon scheduling: Introduces a formal model that captures task‑level dependencies and resource requirements within batch workloads.
- Flexible job‑shop formulation: Casts the carbon‑aware scheduling problem as a variant of the classic flexible job‑shop problem, enabling the use of powerful offline solvers to compute optimal (or near‑optimal) schedules.
- Upper‑bound analysis: Provides quantitative upper bounds on carbon and energy savings achievable by a dependency‑aware approach, using realistic workload traces and server heterogeneity.
- Trade‑off characterization: Demonstrates the three‑way tension among carbon emissions, total energy consumption, and makespan, showing how relaxing the makespan constraint can nearly double carbon savings.
- Insightful sensitivity study: Identifies key factors (job DAG topology, number of servers, heterogeneity) that drive the magnitude of carbon reductions.
Methodology
- Workload modeling: Each batch job is represented as a DAG where nodes are individual tasks (e.g., decode, filter, encode) and edges encode precedence constraints. Tasks have known resource profiles (CPU, GPU, memory) and execution times.
- Scheduling objective: The primary goal is to minimize the carbon footprint of the entire workload, defined as the integral of the instantaneous carbon intensity of the power grid multiplied by the power draw of the active servers. A secondary constraint is the makespan—the total time to finish all jobs.
- Flexible job‑shop formulation: The problem is mapped to a flexible job‑shop scheduling instance: each task can be processed on any compatible machine (server) from a pool, and the solver decides both when and where to run each task.
- Offline optimal solver: Using an exact mixed-integer linear programming (MILP) solver, the authors compute the optimal schedule for a given workload snapshot, yielding an upper bound on what any practical online scheduler could achieve (a minimal illustrative formulation is sketched after this list).
- Baselines for comparison:
  - Makespan-only baseline: Schedules tasks solely to minimize makespan, ignoring carbon intensity.
  - Energy-optimal baseline: Minimizes total energy consumption without regard to carbon intensity.
- Experimental setup: Realistic traces from video‑encoding pipelines and offline inference jobs are run on both homogeneous and heterogeneous server clusters, with carbon intensity profiles taken from public grid data (e.g., CAISO, EU markets).
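To make the formulation concrete, the sketch below ties these pieces together: the DAG supplies precedence among a job's tasks, the objective charges each task the grid carbon intensity over the slots it occupies, and a time-indexed MILP chooses both the machine and the start time of every task. In discretized form the objective is roughly

$$\text{Carbon} \;=\; \int_{0}^{T} g(t)\,P(t)\,dt \;\approx\; \sum_{t} g_t \Big(\sum_{m \in \mathcal{A}(t)} p_m\Big)\,\Delta t,$$

where $g_t$ is the grid carbon intensity in slot $t$, $\mathcal{A}(t)$ is the set of machines active in that slot, and $p_m$ is the power draw of machine $m$ (notation ours, not the paper's). The code below is a minimal, illustrative instance using PuLP with made-up numbers; it is not the authors' exact model, and a time-indexed formulation is only one of several ways to cast the flexible job-shop problem as a MILP.

```python
# Minimal, illustrative sketch (not the paper's exact model): a time-indexed MILP
# for carbon-aware flexible job-shop scheduling of one DAG job, using PuLP.
# All durations, power figures, and carbon intensities are made-up placeholders.
import pulp

# --- Toy instance -----------------------------------------------------------
dur = {                      # dur[task][machine] = processing time in slots
    "decode": {"cpu": 2, "gpu": 1},
    "filter": {"cpu": 3, "gpu": 2},
    "encode": {"cpu": 4, "gpu": 2},
}
edges = [("decode", "filter"), ("filter", "encode")]   # DAG precedence edges
power = {"cpu": 200, "gpu": 350}                       # average draw per machine (W)
carbon = [450, 430, 300, 180, 160, 200, 380, 460]      # forecast gCO2/kWh per slot
T = len(carbon)              # horizon; doubles as the makespan cap

tasks, machines, slots = list(dur), list(power), range(T)

# x[j, m, t] = 1  <=>  task j starts on machine m at slot t (and finishes by T)
x = {
    (j, m, t): pulp.LpVariable(f"x_{j}_{m}_{t}", cat="Binary")
    for j in tasks for m in machines for t in slots
    if t + dur[j][m] <= T
}

prob = pulp.LpProblem("carbon_aware_fjsp", pulp.LpMinimize)

# Objective: power draw times grid carbon intensity, summed over each task's active slots.
prob += pulp.lpSum(
    var * power[m] * sum(carbon[u] for u in range(t, t + dur[j][m]))
    for (j, m, t), var in x.items()
)

# Each task starts exactly once, on exactly one machine.
for j in tasks:
    prob += pulp.lpSum(var for (jj, m, t), var in x.items() if jj == j) == 1

# Precedence: a successor cannot start before its predecessor finishes.
for (i, j) in edges:
    end_i = pulp.lpSum((t + dur[i][m]) * var for (ii, m, t), var in x.items() if ii == i)
    start_j = pulp.lpSum(t * var for (jj, m, t), var in x.items() if jj == j)
    prob += start_j >= end_i

# Each machine processes at most one task per slot.
for m in machines:
    for t in slots:
        prob += pulp.lpSum(
            var for (j, mm, s), var in x.items()
            if mm == m and s <= t < s + dur[j][m]
        ) <= 1

prob.solve(pulp.PULP_CBC_CMD(msg=0))
for (j, m, t), var in sorted(x.items(), key=lambda kv: kv[0][2]):
    if var.value() and var.value() > 0.5:
        print(f"{j:>6} -> {m}, slots {t}..{t + dur[j][m] - 1}")
```

Here the length of the `carbon` forecast doubles as the makespan cap, so extending it with extra slots is the "makespan slack" knob that the paper shows can nearly double the carbon savings.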
Results & Findings
| Scenario | Carbon Reduction vs. Makespan‑Only | Energy Change vs. Energy‑Optimal | Makespan Impact |
|---|---|---|---|
| Homogeneous 8‑server cluster | ≈ 25 % average reduction | Slightly higher energy (≈ 3 %) | No increase (optimal makespan preserved) |
| Heterogeneous 12‑server cluster | ≈ 18 % reduction | Up to 5 % more energy | Same makespan |
| Relaxed makespan (2× optimal) | ≈ 45 % carbon cut | Energy rises modestly (≈ 6 %) | Makespan doubled (by design) |
Key takeaways
- Dependency awareness matters: By reordering tasks within a job (e.g., so that the most power‑hungry phases land in low‑carbon‑intensity windows), the scheduler can shift more work into greener periods without delaying overall completion.
- Makespan slack is powerful: Allowing even modest flexibility (e.g., 20 % longer makespan) yields noticeable carbon gains; doubling the makespan roughly doubles the carbon savings.
- Server heterogeneity introduces trade‑offs: In mixed CPU/GPU environments, carbon‑optimal schedules may select less energy‑efficient machines to exploit low‑carbon periods, leading to a small energy penalty.
- Job structure drives benefit: DAGs with many short, independent tasks (high parallelism) benefit more than long, linear pipelines.
Practical Implications
- Carbon‑aware batch orchestration: Cloud platforms (AWS Batch, Google Cloud Batch, Azure Batch) can integrate DAG‑level awareness into their scheduling APIs, offering developers a “green mode” that automatically reshapes task execution order.
- CI/CD pipelines for ML/Media: Teams running nightly video transcoding or large‑scale offline inference can achieve measurable emissions cuts by exposing task dependencies to the scheduler (e.g., via Airflow DAGs) and allowing slight deadline flexibility.
- Edge‑to‑cloud coordination: For workloads that span edge devices and central data centers, the model suggests moving compute‑intensive sub‑tasks to the cloud during low‑carbon periods while keeping latency‑critical steps local.
- SLA design: Service‑level agreements can incorporate carbon‑budget clauses alongside latency guarantees, giving operators a quantitative lever to negotiate trade‑offs.
- Tooling: The authors’ MILP formulation can be distilled into heuristic plugins for existing schedulers (Kubernetes, Slurm) that approximate the optimal schedule with low overhead, making the approach viable in production; a toy greedy heuristic in this spirit is sketched after this list.
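As a concrete illustration of that last point, the sketch below is a toy, dependency-aware greedy heuristic (our own construction, not the authors' algorithm): it walks the DAG in dependency order and places each ready task on the (machine, start-slot) pair with the lowest forecast carbon cost that respects precedence and machine availability, with the forecast horizon acting as an implicit deadline. All names and numbers are illustrative.

```python
# Toy, dependency-aware greedy heuristic (illustrative sketch, not the authors' method):
# place each ready task on the (machine, start-slot) pair with the lowest forecast
# carbon cost, subject to precedence and machine availability. The length of the
# carbon forecast acts as an implicit deadline.
from collections import defaultdict

def greedy_carbon_schedule(dur, edges, power, carbon):
    """dur[task][machine] = slots, edges = DAG precedence pairs, power[m] = watts,
    carbon[t] = forecast grid intensity per slot. Returns {task: (machine, start_slot)}."""
    T = len(carbon)
    preds = defaultdict(set)
    for u, v in edges:
        preds[v].add(u)

    finish = {}                                   # task -> finish slot
    busy = {m: [False] * T for m in power}        # per-slot machine occupancy
    schedule, remaining = {}, set(dur)

    while remaining:
        # Tasks whose predecessors have all been scheduled are "ready".
        ready = [j for j in remaining if all(p in finish for p in preds[j])]
        if not ready:
            raise ValueError("precedence graph is not a DAG")
        for j in sorted(ready):
            earliest = max((finish[p] for p in preds[j]), default=0)
            best = None                           # (cost, machine, start)
            for m, d in dur[j].items():
                for t in range(earliest, T - d + 1):
                    if any(busy[m][t:t + d]):
                        continue                  # machine already occupied
                    cost = power[m] * sum(carbon[t:t + d])
                    if best is None or cost < best[0]:
                        best = (cost, m, t)
            if best is None:
                raise RuntimeError(f"no feasible slot for task {j} within the horizon")
            _, m, t = best
            for s in range(t, t + dur[j][m]):
                busy[m][s] = True
            finish[j] = t + dur[j][m]
            schedule[j] = (m, t)
            remaining.remove(j)
    return schedule

# Example usage with made-up numbers:
if __name__ == "__main__":
    dur = {"decode": {"cpu": 2, "gpu": 1}, "encode": {"cpu": 4, "gpu": 2}}
    edges = [("decode", "encode")]
    power = {"cpu": 200, "gpu": 350}
    carbon = [450, 430, 300, 180, 160, 200, 380, 460]
    print(greedy_carbon_schedule(dur, edges, power, carbon))
```

Being greedy and myopic, such a heuristic can paint itself into a corner on tight horizons; the offline MILP bound described in the Methodology section is the yardstick such plugins would be measured against.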
Limitations & Future Work
- Offline optimality vs. online reality: The study relies on an offline MILP solver with full knowledge of future carbon intensity and workload characteristics; real‑time schedulers must make decisions with partial forecasts.
- Scalability of exact solvers: MILP solves become computationally expensive for very large clusters (> 100 servers) or thousands of jobs, necessitating heuristic or learning‑based approximations.
- Energy vs. carbon trade‑off: In heterogeneous environments, carbon‑optimal schedules may increase total energy consumption; future work should explore multi‑objective formulations that balance both metrics.
- Dynamic workloads: The current evaluation uses static job batches; extending the model to handle continuous job arrivals and preemptions is an open challenge.
- Broader carbon sources: The paper assumes grid carbon intensity as the sole factor; incorporating renewable‑energy availability, on‑site generation, or carbon offsets could refine the scheduling decisions.
Authors
- Roozbeh Bostandoost
- Adam Lechowicz
- Walid A. Hanafy
- Prashant Shenoy
- Mohammad Hajiesmaili
Paper Information
- arXiv ID: 2512.07799v1
- Categories: cs.DC
- Published: December 8, 2025
- PDF: https://arxiv.org/pdf/2512.07799v1