[Paper] Green or Fast? Learning to Balance Cold Starts and Idle Carbon in Serverless Computing

Published: February 27, 2026 at 06:35 AM EST
4 min read
Source: arXiv

Overview

The paper tackles a core tension in serverless platforms: keeping function instances “warm” to avoid costly cold‑starts versus releasing them to cut idle power use and associated carbon emissions. By treating the keep‑alive decision as a sequential learning problem, the authors introduce LACE‑RL, a reinforcement‑learning controller that dynamically adjusts keep‑alive durations based on real‑time workload and grid carbon intensity. Their results show substantial reductions in both cold‑start latency and carbon waste, making serverless more sustainable without sacrificing performance.

Key Contributions

  • LACE‑RL framework: a deep reinforcement‑learning (RL) controller that jointly optimizes latency (cold‑start avoidance) and carbon impact (idle emissions).
  • Latency‑aware carbon model: integrates per‑function cold‑start probabilities, function‑specific latency penalties, and real‑time grid carbon intensity into a single reward signal.
  • Dynamic keep‑alive policy: replaces static, one‑size‑fits‑all keep‑alive timers with per‑function, time‑varying decisions learned online.
  • Extensive evaluation: uses the Huawei Public Cloud trace (real‑world workload + regional carbon data) to benchmark against Huawei’s static policy, heuristic baselines, and single‑objective RL approaches.
  • Near‑optimal trade‑off: achieves performance close to an oracle that knows future workload, while cutting cold‑starts by ≈ 52 % and idle carbon by ≈ 77 % relative to the static baseline.
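The per‑function, time‑varying keep‑alive decision can be sketched as greedy selection over a small discrete set of candidate durations, as in a value‑based RL agent. This is a minimal illustration, not the paper's implementation: the action set, state features, and `toy_q` value function below are all assumptions chosen for demonstration.

```python
ACTIONS = (0, 30, 60, 120, 300)  # candidate keep-alive durations in seconds (illustrative)

def keep_alive_seconds(state, q_fn, actions=ACTIONS):
    """Greedily pick a keep-alive duration for one function instance.

    `state` bundles features such as recent request rate and grid carbon
    intensity; `q_fn(state)` returns one learned value estimate per action.
    Names and shapes are hypothetical."""
    qs = q_fn(state)
    best = max(range(len(actions)), key=lambda i: qs[i])
    return actions[best]

def toy_q(state):
    """Stand-in value function: longer timers look better under high demand,
    worse under high carbon intensity."""
    request_rate, carbon_intensity = state
    return [request_rate * a - carbon_intensity * a * 0.01 for a in ACTIONS]
```

With this toy value function, a busy function on a clean grid keeps its pod warm for the full 300 s, while an idle function on a dirty grid releases it immediately — the qualitative behavior the paper's learned policy exhibits.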

Methodology

  1. Problem formulation – The decision of how long to keep a function pod alive after it finishes is modeled as a Markov Decision Process (MDP). The state captures recent request rates, the current carbon intensity of the power grid, and the function’s historical cold‑start latency.
  2. Reward design – The reward penalizes both latency (cold‑start delay weighted by a service‑level objective) and carbon emissions (kWh × grid carbon intensity) incurred while a pod stays idle. A tunable coefficient lets operators prioritize one objective over the other.
  3. Deep RL agent – A dueling DQN (Deep Q‑Network) with experience replay learns a policy that maps states to a keep‑alive duration (discrete action space). The network is trained online using the live trace, allowing it to adapt to diurnal workload and carbon‑intensity patterns.
  4. Baseline comparisons – The authors implement Huawei’s static keep‑alive timer, a heuristic that scales the timer with request rate, and single‑objective RL agents (latency‑only, carbon‑only).
  5. Evaluation metrics – Cold‑start frequency, average request latency, idle‑time carbon emissions, and a combined latency‑carbon trade‑off score are reported.
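The reward described in step 2 can be sketched as a weighted sum of a normalized SLO latency penalty and an idle‑carbon cost (energy × grid carbon intensity). The exact functional form, units, and default constants here are assumptions for illustration; only the ingredients and the tunable trade‑off coefficient come from the paper.

```python
def reward(cold_start_delay_ms, idle_energy_kwh, carbon_intensity_g_per_kwh,
           slo_ms=100.0, beta=0.5):
    """Combined latency-carbon reward (a negative cost to maximize).

    beta is the operator-tunable trade-off: beta=1 penalizes latency only,
    beta=0 penalizes idle carbon only. All constants are illustrative."""
    latency_cost = cold_start_delay_ms / slo_ms                   # SLO-normalized delay
    carbon_cost = idle_energy_kwh * carbon_intensity_g_per_kwh    # grams of CO2 while idle
    return -(beta * latency_cost + (1.0 - beta) * carbon_cost)
```

Sweeping `beta` from 0 to 1 recovers the carbon‑only and latency‑only RL baselines from the evaluation as special cases of the same objective.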

Results & Findings

| Metric | Huawei static policy | Heuristic | Latency‑only RL | Carbon‑only RL | LACE‑RL |
|---|---|---|---|---|---|
| Cold‑starts (↓) | 100 % (baseline) | −22 % | −41 % | −35 % | −52 % |
| Idle carbon (kWh) | 100 % (baseline) | −45 % | −60 % | −71 % | −77 % |
| Avg. latency (ms) | 120 | 115 | 108 | 112 | 104 |
| Combined score (higher is better) | 0.0 | 0.12 | 0.18 | 0.16 | 0.23 |
  • Cold‑starts drop by more than half, directly improving user‑perceived latency.
  • Idle carbon falls by about three‑quarters, showing that the RL agent aggressively releases pods when the grid is dirty or workload is low.
  • The latency‑carbon trade‑off curve of LACE‑RL dominates all baselines and sits within 5 % of an oracle that knows future requests, confirming near‑optimal decision making.

Practical Implications

  • Serverless providers can embed LACE‑RL (or a similar RL controller) into their orchestration layer to automatically adapt keep‑alive timers per function, reducing operational costs and carbon footprints without manual tuning.
  • DevOps teams gain a knob (the latency‑vs‑carbon weight) to align platform behavior with corporate sustainability goals or SLA requirements.
  • Edge and hybrid cloud deployments—where carbon intensity can swing dramatically—benefit especially from dynamic policies that react to real‑time grid data.
  • Cost modeling: lower idle power translates to measurable savings on cloud bills, while fewer cold‑starts improve end‑user experience, potentially increasing adoption of serverless architectures.

Limitations & Future Work

  • Training data dependency: LACE‑RL relies on historical request traces and accurate, timely grid carbon intensity feeds; noisy or delayed data could degrade performance.
  • Scalability of the RL agent: The current implementation uses a single global model; scaling to thousands of functions with heterogeneous characteristics may require hierarchical or federated learning approaches.
  • Policy interpretability: Deep RL policies are opaque, making it hard for operators to audit decisions—future work could explore explainable RL or rule‑extraction techniques.
  • Generalization across clouds: Experiments are limited to Huawei’s public cloud trace; validating on other providers (AWS, Azure, GCP) and on multi‑region workloads is an open step.

Overall, the paper demonstrates that intelligent, data‑driven keep‑alive management can reconcile the competing goals of low latency and low carbon, paving the way for greener serverless computing.

Authors

  • Bowen Sun
  • Christos D. Antonopoulos
  • Evgenia Smirni
  • Bin Ren
  • Nikolaos Bellas
  • Spyros Lalis

Paper Information

  • arXiv ID: 2602.23935v1
  • Categories: cs.DC, cs.AI, cs.PF
  • Published: February 27, 2026