[Paper] Distributed LLM Pretraining During Renewable Curtailment Windows: A Feasibility Study
Source: arXiv - 2602.22760v1
Overview
The paper explores a way to cut both the carbon footprint and the cost of pretraining large language models (LLMs) by aligning compute jobs with renewable curtailment windows, i.e., periods when surplus clean generation would otherwise be wasted. By running GPU‑intensive training only when excess clean energy is available, the authors show that a 561 M‑parameter transformer can be trained across multiple data‑center sites while cutting operational emissions to 5‑12 % of a conventional single‑site baseline.
Key Contributions
- Curtailment‑aware scheduling framework that dynamically switches between single‑site and federated multi‑site training based on real‑time renewable excess.
- Prototype implementation using the Flower federated‑learning library to coordinate three geographically distributed GPU clusters.
- Empirical evaluation showing that training quality (perplexity, loss convergence) is preserved while operational emissions drop to 5‑12 % of a traditional single‑site run.
- Open‑source data pipeline that ingests public marginal carbon‑intensity traces to predict curtailment windows for multiple regions.
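The cross‑site coordination in the prototype is handled by Flower's secure aggregation; the underlying operation is a weighted average of per‑site weight updates (FedAvg‑style). A minimal sketch of that merge step, using plain Python lists instead of real model tensors, is shown below; the two‑site example data is an illustrative assumption, not a figure from the paper.

```python
# FedAvg-style merge of per-site weight vectors, weighted by the number
# of local training samples each site contributed. In the paper's
# prototype this exchange is performed by Flower's secure aggregation;
# here we average plain Python lists purely for illustration.

def fedavg(site_weights, site_samples):
    """Weighted average of weight vectors from the active sites.

    site_weights: list of weight vectors (one per active site)
    site_samples: local training samples seen by each site
    """
    total = sum(site_samples)
    merged = [0.0] * len(site_weights[0])
    for weights, n in zip(site_weights, site_samples):
        for i, w in enumerate(weights):
            merged[i] += w * n  # accumulate sample-weighted sums
    return [m / total for m in merged]

# Example: two sites, the second saw twice as much data.
print(fedavg([[1.0, 2.0], [4.0, 8.0]], [100, 200]))  # -> [3.0, 6.0]
```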
Methodology
- Data‑driven curtailment detection – The authors pull hourly marginal carbon intensity data (e.g., from ENTSO‑E, CAISO) and flag periods where the intensity falls below a renewable‑dominant threshold, indicating surplus clean power.
- Elastic training orchestration – A central scheduler monitors the curtailment signals for each site. When a site enters a curtailment window, it is added to the training pool; when the window closes, the site is gracefully removed.
- Federated synchronization – While multiple sites are active, each runs a local copy of the model on its GPU cluster. After a configurable number of local steps, the sites exchange weight updates via Flower’s secure aggregation, effectively performing a distributed SGD step.
- Fallback to single‑site mode – If only one site has excess power, the system continues training locally, avoiding idle time.
- Evaluation metrics – Model convergence (loss, perplexity) is compared against a baseline that trains continuously on a single data‑center. Energy consumption and carbon emissions are estimated using the same marginal intensity data.
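The detection and orchestration steps above can be sketched in a few lines. The threshold value and site names below are illustrative assumptions, not figures from the paper; a real deployment would feed in hourly marginal carbon‑intensity traces from sources like ENTSO‑E or CAISO.

```python
# Hedged sketch of curtailment-aware site pooling: flag sites whose
# marginal carbon intensity is below a renewable-dominant threshold,
# then pick a training mode from the size of the resulting pool.
# The threshold and readings are assumed values for illustration.

RENEWABLE_THRESHOLD = 50.0  # gCO2-eq/kWh; below this we treat power as surplus clean (assumed)

def active_sites(carbon_intensity):
    """Return the sites currently inside a curtailment window.

    carbon_intensity: dict of site name -> marginal intensity (gCO2-eq/kWh)
    """
    return sorted(site for site, ci in carbon_intensity.items()
                  if ci < RENEWABLE_THRESHOLD)

def training_mode(pool):
    """Federated multi-site, single-site fallback, or paused."""
    if len(pool) >= 2:
        return "federated"
    if len(pool) == 1:
        return "single-site"
    return "paused"

readings = {"site-a": 30.0, "site-b": 120.0, "site-c": 45.0}
pool = active_sites(readings)
print(pool, training_mode(pool))  # -> ['site-a', 'site-c'] federated
```

When the pool shrinks to one site, the sketch degrades to the single‑site fallback the authors describe, so GPUs at the remaining site keep training rather than idling.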
Results & Findings
| Metric | Baseline (single‑site) | Curtailment‑aware (3‑site) |
|---|---|---|
| Final validation loss | 1.84 | 1.86 |
| Perplexity (test) | 12.3 | 12.5 |
| Total GPU‑hours | 4,800 | 4,950 (≈ 3 % overhead) |
| CO₂‑equivalent emissions | 1.0 × (baseline) | 0.05‑0.12 × (5‑12 %) |
| Average training wall‑time | 7 days | 7.3 days |
Key takeaways:
- Model quality remains essentially unchanged despite the intermittent, distributed nature of the training.
- Energy savings are dramatic because the system only consumes power when it is already being generated cleanly and at low marginal cost.
- The communication overhead of federated synchronization is modest (≈ 3 % extra GPU‑hours).
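The emissions comparison follows from a simple identity: emissions equal energy drawn times the marginal carbon intensity during the hours the job actually ran. The sketch below reproduces a ratio inside the reported 5‑12 % range using assumed per‑GPU power draw and intensity values; these are illustrative numbers, not measurements from the paper.

```python
# Back-of-the-envelope emissions estimate: energy (kWh) times marginal
# carbon intensity (gCO2-eq/kWh). All constants here are illustrative
# assumptions, not measurements from the paper.

GPU_POWER_KW = 0.3  # assumed average draw per GPU, kW

def emissions_kg(gpu_hours, intensity_g_per_kwh):
    """CO2-equivalent emissions in kg for a training run."""
    energy_kwh = gpu_hours * GPU_POWER_KW
    return energy_kwh * intensity_g_per_kwh / 1000.0

baseline = emissions_kg(4800, 400.0)   # grid-average intensity (assumed)
curtailed = emissions_kg(4950, 30.0)   # surplus-renewable intensity (assumed)
print(round(curtailed / baseline, 3))  # -> 0.077, inside the reported 5-12 % range
```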
Practical Implications
- Cost reduction for AI teams – Many cloud providers already offer lower prices when excess renewable generation is available; aligning training jobs with those windows can cut electricity bills substantially.
- Sustainability certifications – Companies can claim “curtailment‑powered training” as a concrete, measurable ESG initiative, which is increasingly important for investors and customers.
- Edge‑to‑cloud training pipelines – The elastic federated approach can be repurposed for scenarios where compute resources are sporadic (e.g., volunteer GPU networks, edge devices with solar panels).
- Policy alignment – Grid operators seeking to reduce curtailment penalties could incentivize AI workloads, creating a win‑win market for clean‑energy utilization.
Limitations & Future Work
- Dependence on accurate curtailment forecasts – Mis‑predicted windows can lead to idle GPUs or missed training steps; integrating more sophisticated weather and market models is a next step.
- Scalability to multi‑billion‑parameter models – The study stops at 561 M parameters; larger models will stress network bandwidth and may need hierarchical aggregation strategies.
- Geographic and regulatory constraints – Not all regions expose granular marginal carbon data, limiting the approach’s global applicability.
- Security & privacy – While Flower provides secure aggregation, real‑world deployments will need hardened protocols to protect model IP during cross‑site weight exchanges.
Future research will explore adaptive learning‑rate schedules that react to the irregular training cadence, tighter integration with renewable‑energy market APIs, and extending the framework to support mixed‑precision training for even larger models.
Authors
- Philipp Wiesner
- Soeren Becker
- Brett Cornick
- Dominik Scheinert
- Alexander Acker
- Odej Kao
Paper Information
- arXiv ID: 2602.22760v1
- Categories: cs.DC, cs.AI
- Published: February 26, 2026