[Paper] Workload Buoyancy: Keeping Apps Afloat by Identifying Shared Resource Bottlenecks

Published: February 26, 2026 at 05:39 AM EST
4 min read
Source: arXiv - 2602.22852v1

Overview

The paper “Workload Buoyancy: Keeping Apps Afloat by Identifying Shared Resource Bottlenecks” tackles a pain point that many cloud‑native developers face every day: figuring out why a seemingly healthy service suddenly slows down in a multi‑tenant, heterogeneous environment. The authors propose Buoyancy, a new way to look at performance that fuses application‑level signals (e.g., request latency) with low‑level system metrics (CPU, memory, cache, I/O, network) to pinpoint the exact shared resource that is throttling a workload.

Key Contributions

  • Buoyancy abstraction: A unified metric that captures both bottleneck severity and available headroom across multiple shared resources.
  • Resource‑aware orchestration primitive: Shows how Buoyancy can replace traditional heuristics (CPU % or simple latency thresholds) in schedulers and autoscalers.
  • Extensible design: Works on heterogeneous hardware (CPU, GPU, FPGA) and can be enriched with new resource counters without redesign.
  • Empirical validation: Demonstrates a 19.3 % average improvement in correctly identifying bottlenecks compared to classic heuristics across several multi‑tenant benchmark suites.
  • Drop‑in compatibility: Provides a lightweight API that existing monitoring stacks (Prometheus, OpenTelemetry) can consume with minimal changes.

Methodology

  1. Metric Collection: The authors instrument a set of representative workloads (web services, batch jobs, ML inference) running on a shared cluster. They gather:

    • Application‑level KPIs (latency, error rate, throughput).
    • System‑level counters for each shared resource (CPU cycles, LLC miss rate, memory bandwidth, disk I/O, network packets).
  2. Normalization & Weighting: Each resource’s utilization is normalized to a 0‑1 scale. A resource impact factor is derived by correlating the resource’s usage pattern with the observed degradation in the application KPI.

  3. Buoyancy Computation:
    \[ \text{Buoyancy}_i = \sum_{r \in \text{Resources}} w_{r,i} \times u_{r,i} \]
    where \(w_{r,i}\) is the impact factor for resource \(r\) on workload \(i\), and \(u_{r,i}\) is the normalized utilization. The result is a vector that simultaneously indicates where the bottleneck lies (high weight) and how much headroom remains (low utilization); a minimal code sketch after this list illustrates the computation.

  4. Evaluation Framework: The authors compare Buoyancy against three baseline heuristics (CPU %, average latency, and a naïve “top‑resource” selector) across 12 workload mixes, measuring:

    • Accuracy of bottleneck identification (precision/recall).
    • Decision quality when feeding the metric into a simulated scheduler that decides whether to scale out, migrate, or throttle a tenant.
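
To make steps 2 and 3 concrete, the following minimal Python sketch shows one plausible way to derive the impact factors and the Buoyancy terms from collected traces. The correlation-based weighting, the capacities, and all function and variable names are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def normalize(series, capacity):
    """Step 2: scale raw resource usage to a 0-1 utilization."""
    return np.clip(np.asarray(series, dtype=float) / capacity, 0.0, 1.0)

def impact_factors(resource_traces, kpi_degradation):
    """Derive per-resource impact factors w_{r,i} by correlating each
    resource's utilization trace with the observed KPI degradation
    (e.g., relative latency increase). Negative or undefined correlations
    are floored at zero, then the weights are normalized to sum to 1."""
    weights = {}
    for resource, trace in resource_traces.items():
        corr = np.corrcoef(trace, kpi_degradation)[0, 1]
        weights[resource] = max(corr, 0.0) if np.isfinite(corr) else 0.0
    total = sum(weights.values()) or 1.0
    return {r: w / total for r, w in weights.items()}

def buoyancy(weights, current_utilization):
    """Step 3: per-resource terms w_{r,i} * u_{r,i} and their sum.
    The largest term points at the likely bottleneck; low utilization
    on a highly weighted resource signals remaining headroom."""
    contributions = {r: weights[r] * current_utilization[r] for r in weights}
    return contributions, sum(contributions.values())

# Toy traces for one workload (four sampling intervals).
traces = {
    "cpu":     normalize([2.1e9, 2.4e9, 2.3e9, 2.5e9], capacity=3.0e9),  # cycles/s
    "mem_bw":  normalize([18.0, 24.0, 29.0, 31.0],     capacity=32.0),   # GB/s
    "disk_io": normalize([40.0, 42.0, 41.0, 43.0],     capacity=500.0),  # MB/s
}
latency_degradation = np.array([0.05, 0.20, 0.35, 0.40])  # relative SLO violation

w = impact_factors(traces, latency_degradation)
per_resource, score = buoyancy(w, {r: t[-1] for r, t in traces.items()})
print(per_resource, score)  # memory bandwidth dominates in this toy example
```

One natural reading of the output: the largest per-resource term names the bottleneck, and the unused fraction of that resource (1 − u) is the headroom left before the workload is fully throttled.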

Results & Findings

| Metric | Baseline (CPU %) | Baseline (Latency) | Buoyancy |
| --- | --- | --- | --- |
| Bottleneck detection accuracy | 71 % | 78 % | 90 % |
| Scheduler‑induced latency reduction | 5 % | 8 % | 13 % |
| False‑positive scaling events | 12 % | 9 % | 4 % |

  • Better pinpointing: Buoyancy correctly identified the true limiting resource in 90 % of cases, cutting mis‑diagnoses by roughly half compared to CPU‑only heuristics.
  • More efficient scaling: When used as the trigger for autoscaling, Buoyancy reduced unnecessary scale‑out events, saving compute cycles and cost.
  • Robustness to heterogeneity: The abstraction held up across CPUs, GPUs, and even mixed‑precision accelerators, showing its platform‑agnostic nature.

Practical Implications

  • Smarter Autoscalers: Cloud platforms can replace “scale‑out when CPU > 80 %” with “scale‑out when Buoyancy indicates a memory‑bandwidth bottleneck”, leading to more targeted resource provisioning.
  • Improved SLO compliance: By surfacing the exact resource causing latency spikes, developers can tune code (e.g., cache‑friendly data structures) or request specific hardware (high‑BW memory) to meet SLAs.
  • Reduced noisy‑neighbor impact: Operators can detect when a tenant is throttling a shared cache or network and proactively migrate or isolate workloads, improving overall tenant fairness.
  • Integration with existing observability stacks: The Buoyancy API can be exported as a custom Prometheus metric or OpenTelemetry attribute, allowing teams to add it to dashboards with a single line of config (a minimal export sketch follows this list).
  • Cost optimization: Fewer unnecessary VM or container replicas mean lower cloud spend, especially in bursty, multi‑tenant environments typical of SaaS platforms.
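
To illustrate the observability integration described above, the snippet below exports per-workload, per-resource buoyancy contributions as a custom Prometheus gauge via the standard prometheus_client library and applies a simple threshold check of the kind an autoscaler could use. The metric name, labels, port, and threshold are hypothetical; the paper's actual API surface may differ.

```python
import time
from prometheus_client import Gauge, start_http_server

# Hypothetical metric: one gauge, labelled by workload and shared resource.
BUOYANCY = Gauge(
    "workload_buoyancy_contribution",
    "Per-resource buoyancy contribution (impact weight x normalized utilization)",
    ["workload", "resource"],
)

SCALE_OUT_THRESHOLD = 0.6  # illustrative trigger level for the dominant resource

def publish_and_decide(workload, contributions):
    """Export the buoyancy terms and return a naive scaling decision."""
    for resource, value in contributions.items():
        BUOYANCY.labels(workload=workload, resource=resource).set(value)
    bottleneck, severity = max(contributions.items(), key=lambda kv: kv[1])
    if severity > SCALE_OUT_THRESHOLD:
        return f"scale out {workload}: bottleneck on {bottleneck} ({severity:.2f})"
    return f"{workload} has headroom; dominant resource is {bottleneck} ({severity:.2f})"

if __name__ == "__main__":
    start_http_server(9105)  # scrape endpoint, e.g. http://localhost:9105/metrics
    while True:
        # In practice these values would come from the buoyancy computation
        # sketched in the Methodology section, refreshed every scrape interval.
        decision = publish_and_decide(
            "web-frontend", {"cpu": 0.31, "mem_bw": 0.72, "disk_io": 0.04}
        )
        print(decision)
        time.sleep(15)
```

Because the gauge carries a resource label, a dashboard or alerting rule can distinguish "scale out on memory bandwidth" from "scale out on CPU", which is exactly the extra signal a plain CPU‑percentage heuristic lacks.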

Limitations & Future Work

  • Metric overhead: Collecting fine‑grained hardware counters (e.g., per‑core LLC miss rates) can add a modest CPU overhead; the authors suggest adaptive sampling as a mitigation.
  • Static weighting: The current impact‑factor model is derived offline from workload traces; dynamic, learning‑based weighting could adapt to evolving application behavior.
  • Scope of resources: The study focused on CPU, memory, storage, and network; extending Buoyancy to emerging resources such as TPUs, NVMe‑over‑Fabric, or serverless function quotas remains an open challenge.
  • Real‑world deployment: The evaluation used controlled benchmarks; a production‑scale rollout (e.g., in a public cloud) would be needed to validate scalability and integration overhead.

Bottom line: Buoyancy offers a practical, data‑driven lens for developers and operators to see beyond surface‑level metrics, enabling more precise performance tuning and cost‑effective orchestration in today’s complex, multi‑tenant cloud ecosystems.

Authors

  • Oliver Larsson
  • Thijs Metsch
  • Cristian Klein
  • Erik Elmroth

Paper Information

  • arXiv ID: 2602.22852v1
  • Categories: cs.DC
  • Published: February 26, 2026
  • PDF: Download PDF