[Paper] Workload Buoyancy: Keeping Apps Afloat by Identifying Shared Resource Bottlenecks
Source: arXiv - 2602.22852v1
Overview
The paper “Workload Buoyancy: Keeping Apps Afloat by Identifying Shared Resource Bottlenecks” tackles a pain point that many cloud‑native developers face every day: figuring out why a seemingly healthy service suddenly slows down in a multi‑tenant, heterogeneous environment. The authors propose Buoyancy, a new way to look at performance that fuses application‑level signals (e.g., request latency) with low‑level system metrics (CPU, memory, cache, I/O, network) to pinpoint the exact shared resource that is throttling a workload.
Key Contributions
- Buoyancy abstraction: A unified metric that captures both bottleneck severity and available headroom across multiple shared resources.
- Resource‑aware orchestration primitive: Shows how Buoyancy can replace traditional heuristics (CPU % or simple latency thresholds) in schedulers and autoscalers.
- Extensible design: Works on heterogeneous hardware (CPU, GPU, FPGA) and can be enriched with new resource counters without redesign.
- Empirical validation: Demonstrates a 19.3 % average improvement in correctly identifying bottlenecks compared to classic heuristics across several multi‑tenant benchmark suites.
- Drop‑in compatibility: Provides a lightweight API that existing monitoring stacks (Prometheus, OpenTelemetry) can consume with minimal changes.
Methodology
- Metric Collection: The authors instrument a set of representative workloads (web services, batch jobs, ML inference) running on a shared cluster. They gather:
  - Application‑level KPIs (latency, error rate, throughput).
  - System‑level counters for each shared resource (CPU cycles, LLC miss rate, memory bandwidth, disk I/O, network packets).
- Normalization & Weighting: Each resource’s utilization is normalized to a 0‑1 scale. A resource impact factor is derived by correlating the resource’s usage pattern with the observed degradation in the application KPI.
- Buoyancy Computation:

  \[ \text{Buoyancy}_i = \sum_{r \in \text{Resources}} w_{r,i} \times u_{r,i} \]

  where \(w_{r,i}\) is the impact factor for resource \(r\) on workload \(i\), and \(u_{r,i}\) is the normalized utilization. The result is a vector that simultaneously indicates where the bottleneck lies (high weight) and how much headroom remains (low utilization).
- Evaluation Framework: The authors compare Buoyancy against three baseline heuristics (CPU %, average latency, and a naïve “top‑resource” selector) across 12 workload mixes, measuring:
  - Accuracy of bottleneck identification (precision/recall).
  - Decision quality when feeding the metric into a simulated scheduler that decides whether to scale out, migrate, or throttle a tenant.
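The paper does not ship reference code, but the three steps above (normalize utilization, correlate it with KPI degradation to get impact factors, then weight) can be sketched in a few lines of Python. Function and variable names here are illustrative, not from the paper:

```python
import numpy as np

def buoyancy(util_samples, kpi_degradation):
    """Compute a Buoyancy vector for one workload.

    util_samples:     (T, R) array of per-resource utilization over time,
                      already normalized to the 0-1 range.
    kpi_degradation:  (T,) array of observed KPI degradation (e.g. extra
                      request latency) aligned with the utilization samples.
    Returns (weights, scores): impact factor and weighted score per resource.
    """
    T, R = util_samples.shape
    weights = np.zeros(R)
    for r in range(R):
        # Impact factor w_{r,i}: correlation of this resource's usage
        # pattern with the observed KPI degradation, clipped at zero so
        # resources uncorrelated with the slowdown carry no weight.
        c = np.corrcoef(util_samples[:, r], kpi_degradation)[0, 1]
        weights[r] = max(c, 0.0)
    latest_util = util_samples[-1]          # u_{r,i}: most recent utilization
    return weights, weights * latest_util   # per-resource Buoyancy components

# Toy example: resource 0 tracks the degradation, resource 1 is noise.
rng = np.random.default_rng(0)
deg = rng.random(100)
util = np.column_stack([deg * 0.9 + 0.05, rng.random(100)])
w, b = buoyancy(util, deg)   # w[0] dominates: resource 0 is the bottleneck
```

A high weight with high utilization flags the bottleneck; a high weight with low utilization signals remaining headroom, which is exactly the dual reading the paper exploits.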
Results & Findings
| Metric | Baseline (CPU %) | Baseline (Latency) | Buoyancy |
|---|---|---|---|
| Bottleneck detection accuracy | 71 % | 78 % | 90 % |
| Scheduler‑induced latency reduction | 5 % | 8 % | 13 % |
| False‑positive scaling events | 12 % | 9 % | 4 % |
- Better pinpointing: Buoyancy correctly identified the true limiting resource in 90 % of cases, cutting mis‑diagnoses by roughly half compared to CPU‑only heuristics.
- More efficient scaling: When used as the trigger for autoscaling, Buoyancy reduced unnecessary scale‑out events, saving compute cycles and cost.
- Robustness to heterogeneity: The abstraction held up across CPUs, GPUs, and even mixed‑precision accelerators, showing its platform‑agnostic nature.
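The scaling result comes from feeding the Buoyancy vector into the simulated scheduler's scale‑out/migrate/throttle choice. A hypothetical decision rule (thresholds and action names below are illustrative assumptions, not taken from the paper) could look like:

```python
def decide(buoyancy, headroom, hot=0.7):
    """Pick a scheduler action from a per-resource Buoyancy vector.

    buoyancy: dict mapping resource name -> weighted score (w * u).
    headroom: dict mapping resource name -> remaining capacity (1 - u).
    The 0.7 / 0.3 thresholds are illustrative; the paper's simulated
    scheduler chooses between scaling out, migrating, and throttling.
    """
    resource, score = max(buoyancy.items(), key=lambda kv: kv[1])
    if score < hot:
        return "no-op"                  # no resource is a clear bottleneck
    if headroom[resource] > 0.3:
        return f"scale-out:{resource}"  # capacity remains for this resource
    return f"migrate:{resource}"        # node-local resource is exhausted

action = decide(
    buoyancy={"cpu": 0.2, "mem_bw": 0.85, "net": 0.1},
    headroom={"cpu": 0.8, "mem_bw": 0.1, "net": 0.9},
)
# memory bandwidth is the bottleneck and its headroom is exhausted,
# so the rule migrates instead of scaling out
```

Because the vector names the limiting resource, the rule avoids the blanket scale‑outs that a CPU‑only trigger would fire, which is where the reduction in false‑positive scaling events comes from.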
Practical Implications
- Smarter Autoscalers: Cloud platforms can replace “scale‑out when CPU > 80 %” with “scale‑out when Buoyancy indicates a memory‑bandwidth bottleneck”, leading to more targeted resource provisioning.
- Improved SLO compliance: By surfacing the exact resource causing latency spikes, developers can tune code (e.g., cache‑friendly data structures) or request specific hardware (high‑BW memory) to meet SLAs.
- Reduced noisy‑neighbor impact: Operators can detect when a tenant is throttling a shared cache or network and proactively migrate or isolate workloads, improving overall tenant fairness.
- Integration with existing observability stacks: The Buoyancy API can be exported as a custom Prometheus metric or OpenTelemetry attribute, allowing teams to add it to dashboards with a single line of config.
- Cost optimization: Fewer unnecessary VM or container replicas mean lower cloud spend, especially in bursty, multi‑tenant environments typical of SaaS platforms.
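As a concrete illustration of the observability integration, Buoyancy scores could be rendered in the Prometheus text exposition format and served from any scrape endpoint. The metric and label names below are assumptions for illustration; the paper only states that Buoyancy can be exported as a custom Prometheus metric or OpenTelemetry attribute:

```python
def to_prometheus(workload, scores):
    """Render a per-resource Buoyancy vector in Prometheus text
    exposition format (one gauge sample per shared resource)."""
    lines = [
        "# HELP workload_buoyancy Per-resource Buoyancy score "
        "(impact weight x normalized utilization)",
        "# TYPE workload_buoyancy gauge",
    ]
    for resource, score in sorted(scores.items()):
        # Hypothetical metric/label names, chosen for this sketch.
        lines.append(
            f'workload_buoyancy{{workload="{workload}",'
            f'resource="{resource}"}} {score}'
        )
    return "\n".join(lines)

print(to_prometheus("checkout-api", {"cpu": 0.2, "mem_bw": 0.85, "net": 0.1}))
```

Exposing one labeled gauge per resource lets existing dashboards and alert rules select the bottleneck dimension directly, e.g. alerting only when the `mem_bw` series crosses a threshold.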
Limitations & Future Work
- Metric overhead: Collecting fine‑grained hardware counters (e.g., per‑core LLC miss rates) can add a modest CPU overhead; the authors suggest adaptive sampling as a mitigation.
- Static weighting: The current impact‑factor model is derived offline from workload traces; dynamic, learning‑based weighting could adapt to evolving application behavior.
- Scope of resources: The study focused on CPU, memory, storage, and network; extending Buoyancy to emerging resources such as TPUs, NVMe‑over‑Fabric, or serverless function quotas remains an open challenge.
- Real‑world deployment: The evaluation used controlled benchmarks; a production‑scale rollout (e.g., in a public cloud) would be needed to validate scalability and integration overhead.
Bottom line: Buoyancy offers a practical, data‑driven lens for developers and operators to see beyond surface‑level metrics, enabling more precise performance tuning and cost‑effective orchestration in today’s complex, multi‑tenant cloud ecosystems.
Authors
- Oliver Larsson
- Thijs Metsch
- Cristian Klein
- Erik Elmroth
Paper Information
- arXiv ID: 2602.22852v1
- Categories: cs.DC
- Published: February 26, 2026