[Paper] Optimal Configuration of API Resources in Cloud Native Computing

Published: December 29, 2025 at 09:34 AM EST
3 min read

Source: arXiv - 2512.23494v1

Overview

The paper explores how to fine‑tune CPU and memory allocations for microservice‑based cloud applications before they go into production. While most research concentrates on autoscaling during operation, the authors show that a careful “offline” optimization step in the Release phase can prevent costly mis‑configurations that autoscaling alone can’t fix.

Key Contributions

  • Adaptation of an offline performance‑optimization framework to the Release stage of DevOps pipelines for microservices.
  • Empirical evaluation on the open‑source TeaStore microservice benchmark, comparing several optimization algorithms (grid search, random search, Bayesian optimization, etc.).
  • Demonstration that factor screening (pre‑selecting promising CPU/memory ranges) dramatically reduces the number of required experiments while still reaching near‑optimal configurations.
  • Guidelines on when to screen vs. when to run pure Bayesian optimization, based on the trade‑off between sampling budget and the need for statistical comparison of algorithms.
  • Statistical analysis of the cost‑benefit of each algorithm, providing a decision matrix for practitioners.

Methodology

  1. Problem framing – The authors treat CPU cores and memory limits for each container as configurable factors. The goal is to minimize a composite performance metric (e.g., response time under load) while respecting a fixed budget of test runs.
  2. Factor screening – A lightweight fractional factorial design quickly discards extreme values that clearly under‑ or over‑provision resources. This shrinks the search space for the subsequent optimization step.
  3. Optimization algorithms – Four strategies are compared:
    • Exhaustive grid search (baseline, high cost)
    • Random sampling (low cost, no model)
    • Bayesian optimization (model‑based, balances exploration/exploitation)
    • A hybrid that combines screening with Bayesian optimization (a code sketch of this combination follows the list).
  4. Experimental setup – The TeaStore application (a multi‑service e‑commerce demo) is deployed on Kubernetes. Each algorithm runs under identical load patterns, and performance is measured via latency, throughput, and container‑level resource usage.
  5. Statistical evaluation – Results are analyzed with ANOVA and confidence‑interval calculations to quantify how close each method gets to the true optimum and how many experiments it consumes.
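
To make the workflow concrete, here is a minimal sketch in Python, assuming scikit‑optimize as the Bayesian optimizer (the library named later under Practical Implications). The coarse screening grid, the synthetic load‑test function, and the 15‑call budget are illustrative stand‑ins, not the authors' framework.

```python
# Minimal sketch: coarse factor screening followed by Bayesian
# optimization over the narrowed CPU/memory ranges (scikit-optimize).
import itertools

from skopt import gp_minimize
from skopt.space import Integer


def run_load_test(cpu_millicores: int, memory_mib: int) -> float:
    """Stand-in for a real benchmark run: deploy the service with the
    given limits, replay the load profile, and return mean response
    time in ms. A synthetic surface is used here so the sketch runs."""
    return 2e5 / cpu_millicores + 5e4 / memory_mib + 0.01 * memory_mib


# 1) Factor screening: a coarse 3x3 grid discards clearly bad levels.
cpu_levels = [250, 1000, 4000]   # millicores
mem_levels = [256, 1024, 4096]   # MiB
screened = {(c, m): run_load_test(c, m)
            for c, m in itertools.product(cpu_levels, mem_levels)}
best_cpu, best_mem = min(screened, key=screened.get)

# 2) Bayesian optimization inside the screened neighbourhood.
space = [
    Integer(max(best_cpu // 2, 100), best_cpu * 2, name="cpu_millicores"),
    Integer(max(best_mem // 2, 128), best_mem * 2, name="memory_mib"),
]
result = gp_minimize(lambda x: run_load_test(x[0], x[1]),
                     space, n_calls=15, random_state=0)
print("tuned limits:", result.x, "estimated latency (ms):", round(result.fun, 1))
```

In the paper's setup the objective would be the composite metric measured from the TeaStore load tests rather than a synthetic surface.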

Results & Findings

| Algorithm | Avg. # of Experiments | Distance to Optimum | Sampling Cost (CPU‑hrs) |
|---|---|---|---|
| Full Grid Search | 360 | 0 (by definition) | 48 |
| Random Search | 30 | +12% | 4 |
| Bayesian Optim. (no screening) | 25 | +4% | 3.5 |
| Screening + Bayesian | 15 | +5% | 2.2 |

  • Screening cuts the search space by ~60 %, making the hybrid approach the cheapest way to statistically compare algorithms.
  • Pure Bayesian optimization without screening comes closest to the optimum when performance is the sole objective, while still needing far fewer runs than grid or random search.
  • Random search is cheap but often lands far from the optimum, especially for memory‑heavy services where the performance surface is non‑linear.
  • Mis‑configured memory (even with perfect CPU autoscaling) can cause out‑of‑memory crashes, underscoring the need for upfront tuning.
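
The last point is also where the Release‑phase result gets locked in: once the tuner has settled on limits, they can be pinned in the container spec before deployment. Below is a minimal sketch using the Kubernetes Python client; the deployment name, namespace, container name, and the concrete values are hypothetical, not taken from the paper.

```python
# Sketch: pin the tuned CPU/memory limits in the Deployment spec before
# release. Deployment name, namespace, container name, and the concrete
# values are hypothetical.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

patch = {"spec": {"template": {"spec": {"containers": [{
    "name": "webui",
    "resources": {
        "requests": {"cpu": "500m", "memory": "512Mi"},
        "limits":   {"cpu": "1",    "memory": "768Mi"},
    },
}]}}}}

apps.patch_namespaced_deployment(
    name="teastore-webui", namespace="teastore", body=patch)
```

Because the memory limit is fixed here rather than left to an autoscaler, an under‑sized value surfaces during the Release‑phase load tests instead of as OOM kills in production.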

Practical Implications

  • DevOps pipelines can embed a short “resource‑tuning” stage after integration testing, using the screening‑plus‑Bayesian recipe to lock down CPU/memory limits before the Deployment step.
  • Cost savings – By avoiding over‑provisioned memory, cloud bills can drop 10‑20 % for typical microservice workloads.
  • Reliability boost – Pre‑tuned memory limits reduce the risk of OOM kills that autoscalers can’t recover from, leading to higher SLA compliance.
  • Tooling integration – The methodology can be wrapped into existing CI/CD tools (e.g., Jenkins, GitHub Actions) via scripts that invoke a lightweight factorial design followed by a Bayesian optimizer library (e.g., scikit‑optimize); a minimal entry point for such a step is sketched after this list.
  • Team empowerment – Developers gain a data‑driven way to justify resource requests, moving discussions from “guess‑work” to measurable performance evidence.
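
As a rough illustration of the tooling‑integration point above, the sketch below shows what a single pipeline step could look like; tune_resources() is a hypothetical wrapper around the screening‑plus‑Bayesian routine, and the SLO threshold and artifact name are invented for the example.

```python
#!/usr/bin/env python3
# Sketch of a CI entry point (e.g. a Jenkins or GitHub Actions step) that
# runs the Release-phase tuner and hands the result to the deploy job.
# tune_resources() is a hypothetical wrapper around the screening +
# Bayesian routine; the SLO threshold and file name are illustrative.
import json
import sys

from tuner import tune_resources  # hypothetical module

MAX_LATENCY_MS = 250  # gate: fail the pipeline if tuning cannot meet this


def main() -> int:
    cpu_millicores, memory_mib, latency_ms = tune_resources()
    if latency_ms > MAX_LATENCY_MS:
        print(f"tuned config misses the SLO: {latency_ms:.0f} ms",
              file=sys.stderr)
        return 1
    # Publish the chosen limits as an artifact for the deploy stage.
    with open("resource-limits.json", "w") as f:
        json.dump({"cpu": f"{int(cpu_millicores)}m",
                   "memory": f"{int(memory_mib)}Mi"}, f)
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

A Jenkins stage or GitHub Actions job would run this script after integration tests and feed resource-limits.json to the deployment step.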

Limitations & Future Work

  • The study is limited to a single benchmark (TeaStore) and a relatively small set of microservices; results may differ for highly stateful or GPU‑accelerated services.
  • Only CPU cores and memory limits are considered; other knobs (e.g., network bandwidth, storage I/O, JVM flags) remain unexplored.
  • The offline optimization assumes a static workload pattern; extending the approach to handle workload variability (e.g., time‑of‑day spikes) is an open challenge.
  • Future work could integrate online feedback loops that refine the offline configuration as real traffic data arrives, blending Release‑phase tuning with Ops‑stage autoscaling.

Authors

  • Eddy Truyen
  • Wouter Joosen

Paper Information

  • arXiv ID: 2512.23494v1
  • Categories: cs.DC, cs.PF
  • Published: December 29, 2025