[Paper] Metronome: Differentiated Delay Scheduling for Serverless Functions
Source: arXiv - 2512.05703v1
Overview
Serverless platforms (FaaS) promise “pay‑as‑you‑go” compute without the headache of provisioning servers, but the underlying scheduler still struggles to place functions efficiently. The paper Metronome: Differentiated Delay Scheduling for Serverless Functions investigates why classic delay‑scheduling techniques from Hadoop‑style clusters don’t translate directly to serverless workloads, and introduces a machine‑learning‑driven scheduler that adapts the delay on a per‑function basis to improve latency while respecting SLAs.
Key Contributions
- Empirical study of delay scheduling in serverless – systematic evaluation of existing techniques reveals three non‑obvious properties of serverless workloads (input‑dependent locality benefits, dual data + infrastructure locality, and heterogeneous execution times).
- Differentiated delay concept – instead of a one‑size‑fits‑all delay threshold, Metronome decides per‑function how long to wait for a “better” node.
- Online Random Forest regression model – predicts a function’s execution time on each candidate node using lightweight runtime features, enabling the scheduler to pick the node that minimizes overall latency.
- SLA‑aware delay control – the model’s prediction uncertainty feeds a safety margin that keeps the chosen delay from pushing the function past its deadline.
- Prototype on OpenLambda – real‑world experiments show mean execution‑time reductions of 64.9 %–95.8 % versus vanilla OpenLambda and other baselines, even under high concurrency.
Methodology
- Workload Characterization – the authors run a diverse set of benchmark functions (CPU‑bound, I/O‑bound, data‑intensive) on OpenLambda and record where latency improvements appear when a function waits for a “local” node.
- Observation Extraction – three patterns emerge:
- (a) locality gains depend on input size,
- (b) “local” can mean data stored on the same storage node or the same physical host (infrastructure locality), and
- (c) execution times vary wildly, making static delay thresholds ineffective.
- Predictive Scheduler Design – Metronome maintains an online Random Forest model that ingests:
- Function metadata (size of input, language runtime, memory request)
- Node state (CPU load, network bandwidth, cached data presence)
- Historical execution times on that node
The model outputs an estimated runtime for each candidate node.
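The paper’s model is an online Random Forest regressor; its exact features and update rule aren’t reproduced here. As a minimal stand‑in that preserves the interface (online updates in, per‑node runtime estimates out), the sketch below keeps a sliding window of observed runtimes per (function, node) pair and predicts with the window mean — all names are illustrative, not the authors’ API.

```python
from collections import defaultdict, deque

class RuntimePredictor:
    """Toy stand-in for Metronome's online Random Forest regressor.

    Keeps a sliding window of observed execution times per
    (function, node) pair and predicts with the window mean.
    """

    def __init__(self, window=50):
        self.history = defaultdict(lambda: deque(maxlen=window))

    def observe(self, func_id, node_id, runtime_s):
        # Online update: record a completed execution's runtime.
        self.history[(func_id, node_id)].append(runtime_s)

    def predict(self, func_id, node_id, default=1.0):
        # Estimated runtime on this node; fall back to a prior
        # when no history exists for the pair yet.
        h = self.history[(func_id, node_id)]
        return sum(h) / len(h) if h else default

pred = RuntimePredictor()
for t in (0.8, 1.2, 1.0):
    pred.observe("resize-img", "node-a", t)
print(pred.predict("resize-img", "node-a"))  # 1.0 (mean of the window)
```

A real deployment would replace the mean with the Random Forest’s regression output over the metadata and node‑state features listed above; the scheduler only needs the `predict` interface.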
- Delay Decision Logic – for a newly arrived function, Metronome:
- Computes the expected finish time on the currently available node.
- Checks whether waiting for a “better” node (according to the model) would finish earlier and stay within the SLA.
- If yes, it delays the dispatch; otherwise it schedules immediately.
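The decision steps above can be sketched as a single comparison of predicted finish times. This is a simplified rendering of the rule the summary describes, with assumed parameter names; the paper’s actual implementation sits inside OpenLambda’s dispatcher.

```python
def decide(now_runtime, best_runtime, expected_wait, sla_deadline, elapsed=0.0):
    """Illustrative version of Metronome's delay decision (names assumed).

    now_runtime   - predicted runtime on the currently available node (s)
    best_runtime  - predicted runtime on the preferred "better" node (s)
    expected_wait - estimated time until the preferred node is available (s)
    sla_deadline  - time budget for this invocation (s)
    elapsed       - time the request has already spent queued (s)
    Returns "delay" or "dispatch".
    """
    finish_now = elapsed + now_runtime
    finish_wait = elapsed + expected_wait + best_runtime
    # Delay only if waiting finishes earlier AND still meets the SLA.
    if finish_wait < finish_now and finish_wait <= sla_deadline:
        return "delay"
    return "dispatch"

print(decide(now_runtime=5.0, best_runtime=1.0,
             expected_wait=0.5, sla_deadline=10.0))  # delay
print(decide(now_runtime=5.0, best_runtime=1.0,
             expected_wait=6.0, sla_deadline=10.0))  # dispatch
```

The second call shows the differentiation: the same function is dispatched immediately once the expected wait erodes the locality benefit.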
- Implementation & Evaluation – integrated into OpenLambda’s dispatcher, the system is tested with varying concurrency levels (up to 500 simultaneous invocations) and compared against:
- No‑delay (immediate dispatch)
- Fixed‑threshold delay scheduling (classic approach)
- Random placement baseline
Results & Findings
| Metric | Metronome | Fixed‑Delay | Immediate | Random |
|---|---|---|---|---|
| Mean execution time reduction (vs. Immediate) | 64.9 % – 95.8 % | 12 % – 28 % | – | 5 % – 10 % |
| SLA violation rate | 0 % (within 95 % confidence) | 3 % – 7 % | 0 % (but higher latency) | 0 % (but much slower) |
| Throughput under high concurrency (500 req/s) | +22 % over Fixed‑Delay | – | – | – |
| Overhead (model inference + bookkeeping) | ~2 ms per request | negligible | – | – |
Key takeaways
- Predictive delay beats static thresholds because it tailors the wait time to each function’s characteristics and the current cluster state.
- Dual locality matters – caching data and co‑locating on the same physical host can cut network latency dramatically for data‑heavy functions.
- SLA safety is preserved – the model’s confidence interval is used to bound the maximum allowed delay, preventing missed deadlines.
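The last takeaway can be made concrete with a small helper. The summary says the model’s confidence interval bounds the maximum allowed delay; the exact interval construction isn’t given, so the sketch below assumes a mean-plus-k-standard-deviations upper bound as one plausible form.

```python
def max_allowed_delay(pred_runtime, pred_std, sla_deadline, elapsed=0.0, k=2.0):
    """Cap the delay so that even a pessimistic runtime estimate
    (mean + k * std, an assumed interval form) still meets the SLA.
    Returns 0.0 when no slack remains."""
    pessimistic = pred_runtime + k * pred_std
    return max(0.0, sla_deadline - elapsed - pessimistic)

print(max_allowed_delay(pred_runtime=2.0, pred_std=0.5, sla_deadline=10.0))  # 7.0
```

Any candidate delay above this cap is rejected, which is how the scheduler trades locality for deadline safety.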
Practical Implications
- Serverless providers can embed a lightweight predictor in their dispatchers to squeeze out latency gains without adding hardware.
- Developers can benefit from lower cold‑start times for data‑intensive functions simply by enabling the platform’s “locality‑aware” mode; no code changes are required.
- Edge‑oriented FaaS (e.g., Cloudflare Workers, AWS Lambda@Edge) can adopt the differentiated delay idea to decide whether to run a function on a nearby edge node or fall back to a central data center, balancing latency vs. resource availability.
- Cost optimization – shorter execution times translate directly into lower billable compute seconds, especially for high‑frequency micro‑services.
- Observability tooling – the same runtime features used for prediction (input size, cache hit/miss) are valuable signals for debugging and capacity planning.
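The edge‑oriented implication above reduces to the same predicted‑finish‑time comparison with network transfer folded in. The following is a hypothetical sketch (not from the paper) of how a differentiated‑delay dispatcher might choose between a nearby edge node and a central data center:

```python
def place(edge_runtime, edge_queue_wait, central_runtime, network_rtt):
    """Hypothetical edge-vs-central placement: run at the edge only
    when queueing there still beats shipping the request to the core."""
    edge_finish = edge_queue_wait + edge_runtime
    central_finish = network_rtt + central_runtime
    return "edge" if edge_finish <= central_finish else "central"

print(place(edge_runtime=0.05, edge_queue_wait=0.2,
            central_runtime=0.05, network_rtt=0.08))  # central
```

Here a congested edge node loses to the data center despite identical per‑request runtimes, mirroring the paper’s point that the right delay depends on current cluster state, not a fixed policy.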
Limitations & Future Work
- Model freshness – Random Forests are retrained periodically; rapid workload shifts could temporarily degrade prediction accuracy.
- Feature set is platform‑specific; porting Metronome to other FaaS stacks (AWS Lambda, Azure Functions) will require re‑engineering the telemetry collection.
- Scalability of the predictor – while inference is cheap, maintaining per‑node models at massive scale (tens of thousands of nodes) may need hierarchical or federated learning approaches.
- Security & multi‑tenant isolation – exposing node‑level load information to the scheduler could raise isolation concerns; future designs must ensure no cross‑tenant leakage.
- The authors suggest exploring reinforcement‑learning schedulers that can directly optimize a composite objective (latency + cost + SLA) and investigating heterogeneous hardware (GPU/FPGA) where locality decisions become even more nuanced.
Authors
- Zhuangbin Chen
- Juzheng Zheng
- Zibin Zheng
Paper Information
- arXiv ID: 2512.05703v1
- Categories: cs.SE, cs.DC
- Published: December 5, 2025