[Paper] Deadline-Aware Online Scheduling for LLM Fine-Tuning with Spot Market Predictions
Source: arXiv - 2512.20967v1
Overview
Fine‑tuning large foundation models (LLMs) is becoming a major budget item for many teams. This paper shows how to blend cheap but volatile GPU spot instances with reliable on‑demand VMs while respecting a user‑specified deadline. By forecasting spot‑market prices and availability, the authors devise an online scheduler that cuts costs substantially, achieving up to ≈ 55 % better utility than existing heuristics without sacrificing timeliness.
Key Contributions
- Spot‑market predictability analysis: Empirical study demonstrating that short‑term spot prices and availability exhibit enough regularity to be forecasted with useful accuracy.
- Mixed‑instance integer programming model: Formalizes the trade‑off between cost, deadline, and the stochastic nature of spot resources.
- Prediction‑driven online algorithm (Commitment‑Level Control): Uses a “commitment level” to lock in a partial execution plan, adapting as new price/availability information arrives.
- Robust fallback algorithm: A prediction‑free online scheduler that guarantees reasonable performance when forecasts are poor.
- Meta‑policy selector: An online learning component that automatically picks the best algorithm from a pool of parameterized policies, achieving an $\mathcal{O}(\sqrt{T})$ regret bound.
- Extensive evaluation: Real‑world spot‑price traces from major cloud providers and realistic LLM fine‑tuning workloads, showing up to 54.8 % utility improvement over strong baselines.
Methodology
- Data collection & forecasting – The authors scrape spot‑price and instance‑availability logs (e.g., AWS, GCP) and train lightweight time‑series models (ARIMA, exponential smoothing) to predict the next $k$ hours (see the forecasting sketch after this list).
- Mathematical formulation – An integer program (a schematic version follows this list) captures:
- Number of GPU hours needed for the fine‑tuning job,
- Deadline constraint,
- Cost = spot‑price × spot‑hours + on‑demand‑price × on‑demand‑hours,
- Availability constraints (spot instances may be reclaimed).
- Online allocation with commitment level – At each decision epoch the scheduler (see the decision‑epoch sketch after this list):
- Takes the current forecast,
- Solves a relaxed version of the integer program to obtain a partial schedule,
- Commits to the first segment (the “commitment level”) while leaving later decisions flexible.
- Prediction‑free fallback – When forecast error exceeds a threshold, the system switches to a greedy, deadline‑aware heuristic that uses on‑demand resources only when spot capacity is insufficient (also covered in the decision‑epoch sketch below).
- Policy selection via bandit learning – A multi‑armed bandit framework evaluates a portfolio of policies (different commitment levels, forecast horizons, fallback thresholds) and converges to the best‑performing one as the job progresses (a toy selector appears after this list).
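The forecasting step can be pictured with a minimal Python sketch. The synthetic hourly trace, the ARIMA order, and the horizon $k$ below are illustrative assumptions, not the paper's configuration:

```python
# Minimal forecasting sketch (illustrative; the paper's exact model
# configuration is not reproduced here). Fits an ARIMA model to a
# synthetic hourly spot-price series and predicts the next k hours.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
# Stand-in for a scraped hourly spot-price trace (USD per GPU-hour).
prices = 1.0 + 0.1 * np.cumsum(rng.normal(0, 0.05, size=500))

k = 6                                    # forecast horizon in hours
model = ARIMA(prices, order=(2, 1, 1))   # (p, d, q) chosen for illustration
fitted = model.fit()
print("next", k, "hours:", np.round(fitted.forecast(steps=k), 3))
```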
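A schematic version of the integer program, in notation of our choosing (the paper's exact formulation may differ):

$$
\begin{aligned}
\min_{s_t,\, o_t \in \mathbb{Z}_{\ge 0}} \quad & \sum_{t=1}^{T} \left( p^{\mathrm{spot}}_t s_t + p^{\mathrm{od}} o_t \right) \\
\text{s.t.} \quad & \sum_{t=1}^{T} \left( s_t + o_t \right) \ge H, \\
& s_t \le a_t \quad \forall t,
\end{aligned}
$$

where $s_t$ and $o_t$ are the spot and on‑demand GPU‑hours bought in epoch $t$, $H$ is the total GPU‑hours the job needs, $T$ is the number of epochs before the deadline, $p^{\mathrm{spot}}_t$ and $p^{\mathrm{od}}$ are the respective prices, and $a_t$ is the (stochastic) spot availability.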
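The commitment‑level step and the fallback switch can be combined in one hypothetical decision‑epoch sketch. The function name, the error threshold, and the single‑epoch commitment are our simplifications; the paper's algorithm is more general:

```python
# Hypothetical sketch of one scheduling epoch (names and structure are
# ours). With a trusted forecast, solve the LP relaxation over the
# horizon and commit only to the first epoch of the plan; with a poor
# forecast, switch to a prediction-free, deadline-aware greedy rule.
import numpy as np
from scipy.optimize import linprog

def decide_epoch(hours_left, epochs_left, od_price,
                 spot_price_fcst, avail_fcst, spot_capacity_now,
                 forecast_error, error_threshold=0.2):
    """Return (spot_hours, od_hours) to buy in the current epoch."""
    T = epochs_left
    if forecast_error > error_threshold:
        # Fallback: take observed spot capacity, and add on-demand only
        # if spot alone cannot keep pace with the deadline.
        pace = hours_left / T                  # GPU-hours needed per epoch
        spot = min(spot_capacity_now, hours_left)
        return spot, max(0.0, pace - spot)
    # Variables x = [s_1..s_T, o_1..o_T]; minimize forecast cost.
    c = np.concatenate([spot_price_fcst[:T], np.full(T, od_price)])
    # Cover all remaining work by the deadline: sum(s) + sum(o) >= hours_left.
    A_ub, b_ub = -np.ones((1, 2 * T)), np.array([-hours_left])
    # Spot purchases capped by forecast availability; on-demand unbounded.
    bounds = [(0, avail_fcst[t]) for t in range(T)] + [(0, None)] * T
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    s, o = res.x[:T], res.x[T:]
    # Commitment level = 1: lock in only the first epoch of the plan.
    return s[0], o[0]
```

In this sketch the LP naturally fills cheap spot capacity first and schedules on‑demand hours only to the extent the deadline requires; a larger commitment level would simply lock in more of the plan's prefix.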
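A toy EXP3‑style selector conveys the bandit idea; the choice of EXP3 is ours, and the paper's meta‑policy and its $\mathcal{O}(\sqrt{T})$ regret analysis are more involved:

```python
# Toy EXP3-style policy selector (our algorithm choice, shown only to
# illustrate bandit-based selection over a policy portfolio). Each arm
# is one parameterized policy, e.g., a (commitment level, forecast
# horizon, fallback threshold) tuple; rewards are utilities in [0, 1].
import numpy as np

class Exp3Selector:
    def __init__(self, n_policies, gamma=0.1, seed=0):
        self.w = np.ones(n_policies)   # one weight per candidate policy
        self.gamma = gamma             # exploration rate
        self.rng = np.random.default_rng(seed)

    def probs(self):
        p = (1 - self.gamma) * self.w / self.w.sum()
        return p + self.gamma / len(self.w)

    def select(self):
        """Sample a policy index from the current mixture distribution."""
        p = self.probs()
        return int(self.rng.choice(len(self.w), p=p)), p

    def update(self, arm, reward, p):
        """Importance-weighted exponential update for the chosen arm."""
        est = reward / p[arm]          # unbiased estimate of the arm's reward
        self.w[arm] *= np.exp(self.gamma * est / len(self.w))

# One round: pick a policy, run it for an epoch, feed back its utility.
selector = Exp3Selector(n_policies=6)
arm, p = selector.select()
selector.update(arm, reward=0.7, p=p)  # 0.7 = stand-in normalized utility
```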
All steps run in seconds on a single CPU, making the approach practical for real‑time cloud orchestration.
Results & Findings
| Metric | Baseline (pure on‑demand) | Spot‑only heuristic | Proposed online framework |
|---|---|---|---|
| Total cost (normalized to on‑demand) | 1.00× (reference) | 0.68× | 0.45× |
| Deadline miss rate | 0% (by design) | 12% | < 1% |
| Utility gain (cost‑vs‑deadline) | — | +22% | +54.8% |
| Sensitivity to forecast error | N/A | Degrades sharply | Graceful degradation; fallback kicks in |
Key takeaways:
- Even modestly accurate spot forecasts (MAE ≈ 5 %) let the scheduler lock in cheap resources early, cutting costs by more than 30 %.
- The commitment‑level mechanism prevents “over‑committing” to spot instances that later disappear, keeping deadline violations near zero.
- The meta‑policy selector automatically adapts to market regimes (e.g., price spikes, frequent preemptions) without manual tuning.
Practical Implications
- Cost‑effective fine‑tuning pipelines: Teams can integrate the scheduler into existing ML orchestration tools (Kubeflow, Airflow) to automatically decide when to spin up spot GPUs versus on‑demand ones.
- Budget‑constrained research labs: By guaranteeing deadlines, labs can run large‑scale experiments on a predictable budget, freeing up funds for additional research.
- Cloud‑provider tooling: The methodology could be packaged as a SaaS offering or a plug‑in for cloud marketplaces, giving customers a “deadline‑aware spot optimizer” out‑of‑the‑box.
- Generalizable to other workloads: Any GPU‑intensive, deadline‑sensitive job (e.g., video rendering, scientific simulations) can benefit from the same mixed‑instance, prediction‑driven approach.
Limitations & Future Work
- Forecast horizon limited to a few hours: Longer‑term predictions become noisy; extending to multi‑day horizons may require richer models (e.g., LSTM or transformer‑based time‑series models).
- Spot‑market heterogeneity: The study focuses on a handful of major cloud providers; newer markets (e.g., pre‑emptible TPU, edge‑node spot pools) need separate validation.
- Assumes static job size: Dynamic workloads where the required GPU hours evolve during training are not explicitly handled.
- Potential regulatory/compliance constraints: Some enterprises restrict spot usage for security or data‑privacy reasons; integrating policy‑aware constraints is an open direction.
Future research could explore deep‑learning‑based price predictors, incorporate multi‑cloud arbitrage, and extend the integer program to handle elastic job graphs (e.g., pipeline stages with differing resource needs).
Authors
- Linggao Kong
- Yuedong Xu
- Lei Jiao
- Chuan Xu
Paper Information
- arXiv ID: 2512.20967v1
- Categories: cs.DC, cs.LG
- Published: December 24, 2025
- PDF: https://arxiv.org/pdf/2512.20967v1