[Paper] New Kids: An Architecture and Performance Investigation of Second-Generation Serverless Platforms

Published: April 17, 2026
Source: arXiv - 2604.15916v1

Overview

Serverless computing has become a staple for building scalable applications, yet the inner workings of the platforms that run our functions are often a black box. This paper uncovers a “second generation” of serverless platforms that replace heavyweight containers with ultra‑lightweight isolates and push execution closer to the edge. The authors dissect seven publicly documented platforms and run more than 38 million function invocations to quantify the performance shift: median warm‑request latency drops from roughly 40 ms to roughly 10 ms, and cold‑start penalties all but disappear.

Key Contributions

  • Taxonomy of serverless generations – Defines first‑generation (container‑centric, centralized) vs. second‑generation (isolate‑based, edge‑distributed) architectures.
  • Architectural deep‑dive – Reconstructs the design of seven real‑world platforms (e.g., AWS Lambda, Cloudflare Workers, Fastly Compute@Edge) using only publicly available documentation and reverse‑engineering.
  • Large‑scale microbenchmark suite – Executes >38 M function calls across all platforms, measuring warm‑request latency, cold‑start time, throughput, and resource isolation overhead.
  • Performance model – Quantifies how isolate size, edge placement, and runtime language affect latency, providing a predictive framework for developers.
  • Open data set – Releases the benchmark scripts and raw results for reproducibility and future research.
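The paper’s predictive framework is not reproduced in this summary. Purely as a hypothetical illustration of what such a framework could look like, the sketch below uses an additive latency model whose coefficients are invented placeholders, not numbers from the paper:

```javascript
// Hypothetical additive latency model. Every coefficient below is an
// invented placeholder, NOT a value from the paper; a real model would
// fit these terms against the benchmark measurements.
const COEFFS = {
  baseMs: 2,              // fixed per-request platform overhead
  perMiBIsolateMs: 0.125, // cost attributed to isolate memory footprint
  edgeHopMs: 8,           // extra RTT when the request is not served at the edge
  runtimePenaltyMs: { js: 0, wasm: 1 }, // relative runtime overhead
};

function predictLatencyMs({ isolateMiB, servedAtEdge, runtime }) {
  return (
    COEFFS.baseMs +
    COEFFS.perMiBIsolateMs * isolateMiB +
    (servedAtEdge ? 0 : COEFFS.edgeHopMs) +
    COEFFS.runtimePenaltyMs[runtime]
  );
}

// A 16 MiB JS isolate served at the edge: 2 + 2 + 0 + 0 → 4 ms.
console.log(predictLatencyMs({ isolateMiB: 16, servedAtEdge: true, runtime: "js" }));
```

The additive form is only one plausible shape; the point is that isolate size, placement, and runtime enter as separable, measurable terms.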

Methodology

  1. Platform selection & reconstruction – The authors chose seven widely used serverless services spanning the two generations. Since source code isn’t public, they gathered data from API docs, whitepapers, community blogs, and network traces to infer the execution stack (e.g., V8 isolates, WebAssembly runtimes, sandboxing mechanisms).
  2. Benchmark design – A lightweight “hello‑world” function (returning a static string) and a small CPU‑bound loop were deployed on each platform. The tests varied payload size (0 B–1 KB) and concurrency (1–100 parallel invocations).
  3. Instrumentation – High‑resolution timestamps were collected at the client, platform edge, and within the function (when possible) to separate network latency from platform overhead.
  4. Execution campaign – Over a two‑week period, the authors triggered >38 M calls from multiple geographic regions to capture warm‑start, cold‑start, and steady‑state behavior.
  5. Statistical analysis – Median, 95th‑percentile, and tail‑latency metrics were computed, and ANOVA tests confirmed that the differences between the two generations are statistically significant.
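As a minimal sketch (in Node.js, and not the authors’ actual harness), this is how per‑request latency samples from such a campaign can be reduced to the reported summary statistics:

```javascript
// Reduce an array of per-request latencies (ms) to the summary
// statistics reported in the paper: median, 95th and 99th percentile.
// Illustrative sketch, not the authors' benchmark tooling.
function percentile(sortedSamples, p) {
  // Nearest-rank percentile on an ascending-sorted array.
  const rank = Math.ceil((p / 100) * sortedSamples.length);
  return sortedSamples[Math.max(0, rank - 1)];
}

function summarize(latenciesMs) {
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  return {
    median: percentile(sorted, 50),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
  };
}

// Example: 100 synthetic warm-request samples in the 8–17 ms range.
const samples = Array.from({ length: 100 }, (_, i) => 8 + (i % 10));
console.log(summarize(samples));
```

In the real campaign these samples would come from the high‑resolution timestamps described in step 3, collected separately at the client and the platform edge so that network latency can be subtracted from platform overhead.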

Results & Findings

| Metric | First‑Gen (containers) | Second‑Gen (isolates) |
| --- | --- | --- |
| Warm‑request latency (median) | ~40 ms | ~10 ms |
| Cold‑start latency | 200 ms–1 s (depends on image size) | < 15 ms (often indistinguishable) |
| Throughput (req/s per instance) | 150–300 | 400–700 |
| Memory overhead per function | 128 MiB (minimum) | 16–32 MiB (isolate) |
| Tail latency (99th percentile) | 120 ms | 35 ms |

What it means: By swapping containers for isolates (e.g., V8 isolates, WebAssembly sandboxes) and moving execution to edge nodes, platforms shave off ~30 ms of latency per request and make cold starts negligible. The trade‑off is a tighter sandbox (no arbitrary OS syscalls, limited filesystem access) and smaller per‑function memory caps.

Practical Implications

  • Latency‑critical APIs – Services like ad‑tech bidding, real‑time personalization, or IoT gateways can now meet sub‑20 ms SLA requirements without complex edge‑caching tricks.
  • Cost optimization – Faster warm execution means fewer provisioned concurrency units or reserved instances, translating into lower bills for high‑QPS workloads.
  • Language choice – Since many second‑gen platforms run JavaScript/TypeScript or WebAssembly, teams may favor these runtimes to exploit the low‑overhead isolates.
  • Edge‑first design patterns – Developers can architect “edge‑only” micro‑services (e.g., request validation, auth token checks) that run directly on the platform’s CDN nodes, reducing round‑trip time to origin servers.
  • Observability adjustments – Traditional container‑level metrics (CPU throttling, cgroup stats) are no longer available; developers need to instrument at the function level or rely on platform‑provided telemetry.
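As a sketch of the “edge‑only” pattern above, here is a minimal isolate‑style handler that rejects requests lacking a bearer token before they ever reach the origin. The handler shape follows the Workers‑style fetch API; the function name and origin logic are illustrative assumptions, not code from any specific platform:

```javascript
// Minimal Workers-style edge handler: validate the request at the CDN
// node and only forward well-formed, authenticated traffic to origin.
// Illustrative sketch; each platform wires handlers up via its own API.
async function handleRequest(request) {
  const auth = request.headers.get("Authorization") || "";
  if (!auth.startsWith("Bearer ")) {
    // Rejected at the edge: the origin never sees this request.
    return new Response("Unauthorized", { status: 401 });
  }
  // A real worker would proxy onward, e.g. `fetch(originUrl, request)`;
  // here we simply acknowledge the validated request.
  return new Response("OK", { status: 200 });
}
```

On Cloudflare Workers a function like this would be registered as the module’s `fetch` handler; other second‑generation platforms expose equivalent event‑based entry points.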

Limitations & Future Work

  • Black‑box reconstruction – Without source access, some architectural details (e.g., exact isolation mechanisms) are inferred and may change in future platform releases.
  • Workload diversity – Benchmarks focus on short‑lived, CPU‑light functions; I/O‑heavy or stateful workloads (e.g., streaming, large file processing) were not evaluated.
  • Security & compliance – The paper notes that tighter isolates limit system‑call exposure, but does not assess the impact on security certifications or sandbox escape risks.
  • Future directions – The authors suggest extending the study to include emerging “function‑as‑a‑service” runtimes that run on unikernels, exploring hybrid models (container + isolate), and measuring energy efficiency at the edge.

Authors

  • Trever Schirmer
  • Aris Wiegand
  • Lucca di Benedetto
  • Linus Gustafsson
  • Natalie Carl
  • Tobias Pfandzelter
  • David Bermbach

Paper Information

  • arXiv ID: 2604.15916v1
  • Categories: cs.DC
  • Published: April 17, 2026