[Paper] Deadline-Aware Online Scheduling for LLM Fine-Tuning with Spot Market Predictions
Source: arXiv - 2512.20967v1
Overview
Fine‑tuning large foundation models (LLMs) is becoming a major budget item for many teams. This paper shows how to blend cheap but volatile GPU spot instances with reliable on‑demand VMs while respecting a user‑specified deadline. By forecasting spot‑market prices and availability, the authors devise an online scheduler that cuts costs substantially, achieving up to ≈ 55 % better utility than existing heuristics without sacrificing timeliness.
Key Contributions
- Spot‑market predictability analysis: Empirical study demonstrating that short‑term spot prices and availability exhibit enough regularity to be forecasted with useful accuracy.
- Mixed‑instance integer programming model: Formalizes the trade‑off between cost, deadline, and the stochastic nature of spot resources.
- Prediction‑driven online algorithm (Commitment‑Level Control): Uses a “commitment level” to lock in a partial execution plan, adapting as new price/availability information arrives.
- Robust fallback algorithm: A prediction‑free online scheduler that guarantees reasonable performance when forecasts are poor.
- Meta‑policy selector: An online learning component that automatically picks the best algorithm from a pool of parameterized policies, achieving an $\mathcal{O}(\sqrt{T})$ regret bound.
- Extensive evaluation: Real‑world spot‑price traces from major cloud providers and realistic LLM fine‑tuning workloads, showing up to 54.8 % utility improvement over strong baselines.
Methodology
- Data collection & forecasting – The authors scrape spot‑price and instance‑availability logs (e.g., AWS, GCP) and train lightweight time‑series models (ARIMA, exponential smoothing) to predict the next $k$ hours (see the forecasting sketch after this list).
- Mathematical formulation – An integer program (a schematic version follows this list) captures:
- Number of GPU hours needed for the fine‑tuning job,
- Deadline constraint,
- Cost = spot‑price × spot‑hours + on‑demand‑price × on‑demand‑hours,
- Availability constraints (spot instances may be reclaimed).
- Online allocation with commitment level – At each decision epoch the scheduler (see the decision‑epoch sketch after this list):
- Takes the current forecast,
- Solves a relaxed version of the integer program to obtain a partial schedule,
- Commits to the first segment (the “commitment level”) while leaving later decisions flexible.
- Prediction‑free fallback – When forecast error exceeds a threshold, the system switches to a greedy, deadline‑aware heuristic that uses on‑demand resources only when spot capacity is insufficient (also covered in the decision‑epoch sketch below).
- Policy selection via bandit learning – A multi‑armed bandit framework evaluates a portfolio of policies (different commitment levels, forecast horizons, fallback thresholds) and converges to the best‑performing one as the job progresses (a toy selector appears after this list).
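The forecasting step can be pictured with a minimal Python sketch. The synthetic hourly trace, the ARIMA order, and the horizon $k$ below are illustrative assumptions, not the paper's configuration:

```python
# Minimal forecasting sketch (illustrative; the paper's exact model
# configuration is not reproduced here). Fits an ARIMA model to a
# synthetic hourly spot-price series and predicts the next k hours.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
# Stand-in for a scraped hourly spot-price trace (USD per GPU-hour).
prices = 1.0 + 0.1 * np.cumsum(rng.normal(0, 0.05, size=500))

k = 6                                    # forecast horizon in hours
model = ARIMA(prices, order=(2, 1, 1))   # (p, d, q) chosen for illustration
fitted = model.fit()
print("next", k, "hours:", np.round(fitted.forecast(steps=k), 3))
```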
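A schematic version of the integer program, in notation of our choosing (the paper's exact formulation may differ):

$$
\begin{aligned}
\min_{s_t,\, o_t \in \mathbb{Z}_{\ge 0}} \quad & \sum_{t=1}^{T} \left( p^{\mathrm{spot}}_t s_t + p^{\mathrm{od}} o_t \right) \\
\text{s.t.} \quad & \sum_{t=1}^{T} \left( s_t + o_t \right) \ge H, \\
& s_t \le a_t \quad \forall t,
\end{aligned}
$$

where $s_t$ and $o_t$ are the spot and on‑demand GPU‑hours bought in epoch $t$, $H$ is the total GPU‑hours the job needs, $T$ is the number of epochs before the deadline, $p^{\mathrm{spot}}_t$ and $p^{\mathrm{od}}$ are the respective prices, and $a_t$ is the (stochastic) spot availability.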
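The commitment‑level step and the fallback switch can be combined in one hypothetical decision‑epoch sketch. The function name, the error threshold, and the single‑epoch commitment are our simplifications; the paper's algorithm is more general:

```python
# Hypothetical sketch of one scheduling epoch (names and structure are
# ours). With a trusted forecast, solve the LP relaxation over the
# horizon and commit only to the first epoch of the plan; with a poor
# forecast, switch to a prediction-free, deadline-aware greedy rule.
import numpy as np
from scipy.optimize import linprog

def decide_epoch(hours_left, epochs_left, od_price,
                 spot_price_fcst, avail_fcst, spot_capacity_now,
                 forecast_error, error_threshold=0.2):
    """Return (spot_hours, od_hours) to buy in the current epoch."""
    T = epochs_left
    if forecast_error > error_threshold:
        # Fallback: take observed spot capacity, and add on-demand only
        # if spot alone cannot keep pace with the deadline.
        pace = hours_left / T                  # GPU-hours needed per epoch
        spot = min(spot_capacity_now, hours_left)
        return spot, max(0.0, pace - spot)
    # Variables x = [s_1..s_T, o_1..o_T]; minimize forecast cost.
    c = np.concatenate([spot_price_fcst[:T], np.full(T, od_price)])
    # Cover all remaining work by the deadline: sum(s) + sum(o) >= hours_left.
    A_ub, b_ub = -np.ones((1, 2 * T)), np.array([-hours_left])
    # Spot purchases capped by forecast availability; on-demand unbounded.
    bounds = [(0, avail_fcst[t]) for t in range(T)] + [(0, None)] * T
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    s, o = res.x[:T], res.x[T:]
    # Commitment level = 1: lock in only the first epoch of the plan.
    return s[0], o[0]
```

In this sketch the LP naturally fills cheap spot capacity first and schedules on‑demand hours only to the extent the deadline requires; a larger commitment level would simply lock in more of the plan's prefix.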
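A toy EXP3‑style selector conveys the bandit idea; the choice of EXP3 is ours, and the paper's meta‑policy and its $\mathcal{O}(\sqrt{T})$ regret analysis are more involved:

```python
# Toy EXP3-style policy selector (our algorithm choice, shown only to
# illustrate bandit-based selection over a policy portfolio). Each arm
# is one parameterized policy, e.g., a (commitment level, forecast
# horizon, fallback threshold) tuple; rewards are utilities in [0, 1].
import numpy as np

class Exp3Selector:
    def __init__(self, n_policies, gamma=0.1, seed=0):
        self.w = np.ones(n_policies)   # one weight per candidate policy
        self.gamma = gamma             # exploration rate
        self.rng = np.random.default_rng(seed)

    def probs(self):
        p = (1 - self.gamma) * self.w / self.w.sum()
        return p + self.gamma / len(self.w)

    def select(self):
        """Sample a policy index from the current mixture distribution."""
        p = self.probs()
        return int(self.rng.choice(len(self.w), p=p)), p

    def update(self, arm, reward, p):
        """Importance-weighted exponential update for the chosen arm."""
        est = reward / p[arm]          # unbiased estimate of the arm's reward
        self.w[arm] *= np.exp(self.gamma * est / len(self.w))

# One round: pick a policy, run it for an epoch, feed back its utility.
selector = Exp3Selector(n_policies=6)
arm, p = selector.select()
selector.update(arm, reward=0.7, p=p)  # 0.7 = stand-in normalized utility
```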
All steps run in seconds on a single CPU, making the approach practical for real‑time cloud orchestration.
Results & Findings
| Metric | Baseline (pure on‑demand) | Spot‑only heuristic | Proposed online framework |
|---|---|---|---|
| Total cost (normalized to on‑demand) | 1.00× (reference) | 0.68× | 0.45× |
| Deadline miss rate | 0% (by design) | 12% | < 1% |
| Utility gain (cost‑vs‑deadline) | — | +22% | +54.8% |
| Sensitivity to forecast error | N/A | Degrades sharply | Graceful degradation; fallback kicks in |
Key takeaways:
- Even modestly accurate spot forecasts (MAE ≈ 5 %) let the scheduler lock in cheap resources early, cutting costs by more than 30 %.
- The commitment‑level mechanism prevents “over‑committing” to spot instances that later disappear, keeping deadline violations near zero.
- The meta‑policy selector automatically adapts to market regimes (e.g., price spikes, frequent preemptions) without manual tuning.
Practical Implications
- Cost‑effective fine‑tuning pipelines: Teams can integrate the scheduler into existing ML orchestration tools (Kubeflow, Airflow) to automatically decide when to spin up spot GPUs versus on‑demand ones.
- Budget‑constrained research labs: By guaranteeing deadlines, labs can run large‑scale experiments on a predictable budget, freeing up funds for additional research.
- Cloud‑provider tooling: The methodology could be packaged as a SaaS offering or a plug‑in for cloud marketplaces, giving customers a “deadline‑aware spot optimizer” out‑of‑the‑box.
- Generalizable to other workloads: Any GPU‑intensive, deadline‑sensitive job (e.g., video rendering, scientific simulations) can benefit from the same mixed‑instance, prediction‑driven approach.
Limitations & Future Work
- Forecast horizon limited to a few hours: Longer‑term predictions become noisy; extending to multi‑day horizons may require richer models (e.g., LSTM or transformer‑based time‑series models).
- Spot‑market heterogeneity: The study focuses on a handful of major cloud providers; newer markets (e.g., pre‑emptible TPU, edge‑node spot pools) need separate validation.
- Assumes static job size: Dynamic workloads where the required GPU hours evolve during training are not explicitly handled.
- Potential regulatory/compliance constraints: Some enterprises restrict spot usage for security or data‑privacy reasons; integrating policy‑aware constraints is an open direction.
Future research could explore deep‑learning‑based price predictors, incorporate multi‑cloud arbitrage, and extend the integer program to handle elastic job graphs (e.g., pipeline stages with differing resource needs).
Authors
- Linggao Kong
- Yuedong Xu
- Lei Jiao
- Chuan Xu
Paper Information
- arXiv ID: 2512.20967v1
- Categories: cs.DC, cs.LG
- Published: December 24, 2025
- PDF: https://arxiv.org/pdf/2512.20967v1