[Paper] Metronome: Differentiated Delay Scheduling for Serverless Functions

Published: December 5, 2025 at 08:30 AM EST
4 min read
Source: arXiv - 2512.05703v1

Overview

Serverless platforms (FaaS) promise “pay‑as‑you‑go” compute without the headache of provisioning servers, but the underlying scheduler still struggles to place functions efficiently. The paper Metronome: Differentiated Delay Scheduling for Serverless Functions investigates why classic delay‑scheduling tricks from Hadoop‑style clusters don’t translate directly to serverless workloads, and introduces a machine‑learning‑driven scheduler that adapts the delay per‑function to improve latency while respecting SLAs.

Key Contributions

  • Empirical study of delay scheduling in serverless – systematic evaluation of existing techniques reveals three non‑obvious properties of serverless workloads (input‑dependent locality benefits, dual data + infrastructure locality, and heterogeneous execution times).
  • Differentiated delay concept – instead of a one‑size‑fits‑all delay threshold, Metronome decides per‑function how long to wait for a “better” node.
  • Online Random Forest regression model – predicts a function’s execution time on each candidate node using lightweight runtime features, enabling the scheduler to pick the node that minimizes overall latency.
  • SLA‑aware delay control – the model's prediction uncertainty feeds a safety margin that keeps the chosen delay from ever pushing the function past its deadline.
  • Prototype on OpenLambda – real‑world experiments show mean execution‑time reductions of 64.9 %–95.8 % versus vanilla OpenLambda and other baselines, even under high concurrency.
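The online predictor at the heart of these contributions can be illustrated without the full Random Forest machinery. The snippet below is a minimal stand-in, not the paper's implementation: all names (`OnlineRuntimePredictor`, `record`, `predict`) are hypothetical, and a per-node sliding window with a nearest-samples average replaces the real system's online Random Forest over richer features (node load, cached data, language runtime).

```python
from collections import defaultdict, deque


class OnlineRuntimePredictor:
    """Stand-in for the paper's online Random Forest: keep a sliding
    window of recent (input_size, runtime) samples per node and predict
    by averaging the runtimes of the k closest recorded input sizes."""

    def __init__(self, window: int = 256, k: int = 5):
        self.samples = defaultdict(lambda: deque(maxlen=window))
        self.k = k

    def record(self, node: str, input_size: float, runtime: float) -> None:
        # Telemetry arrives continuously, so the window keeps the model fresh.
        self.samples[node].append((input_size, runtime))

    def predict(self, node: str, input_size: float, default: float = 1.0) -> float:
        hist = self.samples[node]
        if not hist:
            return default  # no telemetry yet for this node: fall back to a prior
        nearest = sorted(hist, key=lambda s: abs(s[0] - input_size))[: self.k]
        return sum(r for _, r in nearest) / len(nearest)
```

The sliding window also captures the "model freshness" trade-off the authors discuss: a shorter window adapts faster to workload shifts at the cost of noisier predictions.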

Methodology

  1. Workload Characterization – the authors run a diverse set of benchmark functions (CPU‑bound, I/O‑bound, data‑intensive) on OpenLambda and record where latency improvements appear when a function waits for a “local” node.
  2. Observation Extraction – three patterns emerge:
    • (a) locality gains depend on input size,
    • (b) “local” can mean data stored on the same storage node or the same physical host (infrastructure locality), and
    • (c) execution times vary wildly, making static delay thresholds ineffective.
  3. Predictive Scheduler Design – Metronome maintains an online Random Forest model that ingests:
    • Function metadata (size of input, language runtime, memory request)
    • Node state (CPU load, network bandwidth, cached data presence)
    • Historical execution times on that node
      The model outputs an estimated runtime for each candidate node.
  4. Delay Decision Logic – for a newly arrived function, Metronome:
    • Computes the expected finish time on the currently available node.
    • Checks whether waiting for a “better” node (according to the model) would finish earlier and stay within the SLA.
    • If yes, it delays the dispatch; otherwise it schedules immediately.
  5. Implementation & Evaluation – integrated into OpenLambda’s dispatcher, the system is tested with varying concurrency levels (up to 500 simultaneous invocations) and compared against:
    • No‑delay (immediate dispatch)
    • Fixed‑threshold delay scheduling (classic approach)
    • Random placement baseline
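The delay decision in step 4 can be sketched as follows. This is an illustrative reconstruction, not the paper's code: `Node` and `choose_dispatch` are hypothetical names, and a real implementation would obtain `est_runtime` from the learned model rather than a stored field.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_at: float      # time at which the node becomes available
    est_runtime: float  # model-predicted runtime of this function on the node


def choose_dispatch(nodes, now, sla_deadline):
    """Pick the node with the earliest expected finish time that still
    meets the SLA, counting any wait for the node to free up. A non-zero
    wait means the function's dispatch is deliberately delayed."""
    best = None
    for node in nodes:
        wait = max(0.0, node.free_at - now)
        finish = now + wait + node.est_runtime
        if finish <= sla_deadline and (best is None or finish < best[0]):
            best = (finish, node, wait)
    if best is None:
        # No node can meet the SLA: fall back to the earliest finisher.
        node = min(nodes, key=lambda n: max(0.0, n.free_at - now) + n.est_runtime)
        wait = max(0.0, node.free_at - now)
        best = (now + wait + node.est_runtime, node, wait)
    return best
```

For example, if a "local" node frees up in 1 s but runs the function in 2 s while a remote node is free now but needs 5 s, the sketch waits the extra second because 3 s beats 5 s, exactly the differentiated-delay behavior described above.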

Results & Findings

| Metric | Metronome | Fixed‑Delay | Immediate | Random |
| --- | --- | --- | --- | --- |
| Mean execution‑time reduction (vs. Immediate) | 64.9 %–95.8 % | 12 %–28 % | — (baseline) | 5 %–10 % |
| SLA violation rate | 0 % (within 95 % confidence) | 3 %–7 % | 0 % (but higher latency) | 0 % (but much slower) |
| Throughput under high concurrency (500 req/s) | +22 % over Fixed‑Delay | — | — | — |
| Overhead (model inference + bookkeeping) | ~2 ms per request | negligible | negligible | negligible |

Key takeaways

  • Predictive delay beats static thresholds because it tailors the wait time to each function’s characteristics and the current cluster state.
  • Dual locality matters – caching data and co‑locating on the same physical host can cut network latency dramatically for data‑heavy functions.
  • SLA safety is preserved – the model’s confidence interval is used to bound the maximum allowed delay, preventing missed deadlines.
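The SLA-safety bound in the last bullet reduces to simple arithmetic: subtract a pessimistic runtime estimate (prediction plus confidence margin) from the time remaining until the deadline. A minimal sketch, with hypothetical names:

```python
def max_safe_delay(now: float, deadline: float,
                   predicted_runtime: float, conf_margin: float) -> float:
    """Largest dispatch delay that still meets the deadline even if the
    function runs as slowly as the model's confidence interval allows."""
    worst_case_runtime = predicted_runtime + conf_margin
    return max(0.0, deadline - now - worst_case_runtime)
```

A returned value of zero means there is no room to wait, so the scheduler would dispatch immediately rather than risk a deadline miss.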

Practical Implications

  • Serverless providers can embed a lightweight predictor in their dispatchers to squeeze out latency gains without adding hardware.
  • Developers can benefit from lower cold‑start times for data‑intensive functions simply by enabling the platform’s “locality‑aware” mode; no code changes are required.
  • Edge‑oriented FaaS (e.g., Cloudflare Workers, AWS Lambda@Edge) can adopt the differentiated delay idea to decide whether to run a function on a nearby edge node or fall back to a central data center, balancing latency vs. resource availability.
  • Cost optimization – shorter execution times translate directly into lower billable compute seconds, especially for high‑frequency micro‑services.
  • Observability tooling – the same runtime features used for prediction (input size, cache hit/miss) are valuable signals for debugging and capacity planning.

Limitations & Future Work

  • Model freshness – Random Forests are retrained periodically; rapid workload shifts could temporarily degrade prediction accuracy.
  • Feature set is platform‑specific; porting Metronome to other FaaS stacks (AWS Lambda, Azure Functions) will require re‑engineering the telemetry collection.
  • Scalability of the predictor – while inference is cheap, maintaining per‑node models at massive scale (tens of thousands of nodes) may need hierarchical or federated learning approaches.
  • Security & multi‑tenant isolation – exposing node‑level load information to the scheduler could raise isolation concerns; future designs must ensure no cross‑tenant leakage.
  • The authors suggest exploring reinforcement‑learning schedulers that can directly optimize a composite objective (latency + cost + SLA) and investigating heterogeneous hardware (GPU/FPGA) where locality decisions become even more nuanced.

Authors

  • Zhuangbin Chen
  • Juzheng Zheng
  • Zibin Zheng

Paper Information

  • arXiv ID: 2512.05703v1
  • Categories: cs.SE, cs.DC
  • Published: December 5, 2025