[Paper] Workload Buoyancy: Keeping Apps Afloat by Identifying Shared Resource Bottlenecks
Source: arXiv - 2602.22852v1
Overview
The paper “Workload Buoyancy: Keeping Apps Afloat by Identifying Shared Resource Bottlenecks” tackles a pain point that many cloud‑native developers face every day: figuring out why a seemingly healthy service suddenly slows down in a multi‑tenant, heterogeneous environment. The authors propose Buoyancy, a new way to look at performance that fuses application‑level signals (e.g., request latency) with low‑level system metrics (CPU, memory, cache, I/O, network) to pinpoint the exact shared resource that is throttling a workload.
Key Contributions
- Buoyancy abstraction: A unified metric that captures both bottleneck severity and available headroom across multiple shared resources.
- Resource‑aware orchestration primitive: Shows how Buoyancy can replace traditional heuristics (CPU % or simple latency thresholds) in schedulers and autoscalers.
- Extensible design: Works on heterogeneous hardware (CPU, GPU, FPGA) and can be enriched with new resource counters without redesign.
- Empirical validation: Demonstrates a 19.3 % average improvement in correctly identifying bottlenecks compared to classic heuristics across several multi‑tenant benchmark suites.
- Drop‑in compatibility: Provides a lightweight API that existing monitoring stacks (Prometheus, OpenTelemetry) can consume with minimal changes.
Methodology
- Metric Collection: The authors instrument a set of representative workloads (web services, batch jobs, ML inference) running on a shared cluster. They gather:
  - Application‑level KPIs (latency, error rate, throughput).
  - System‑level counters for each shared resource (CPU cycles, LLC miss rate, memory bandwidth, disk I/O, network packets).
- Normalization & Weighting: Each resource’s utilization is normalized to a 0‑1 scale. A resource impact factor is derived by correlating the resource’s usage pattern with the observed degradation in the application KPI.
- Buoyancy Computation:

  \[ \text{Buoyancy}_i = \sum_{r \in \text{Resources}} w_{r,i} \times u_{r,i} \]

  where \(w_{r,i}\) is the impact factor for resource \(r\) on workload \(i\), and \(u_{r,i}\) is the normalized utilization. The result is a vector that simultaneously indicates where the bottleneck lies (high weight) and how much headroom remains (low utilization).
- Evaluation Framework: The authors compare Buoyancy against three baseline heuristics (CPU %, average latency, and a naïve “top‑resource” selector) across 12 workload mixes, measuring:
  - Accuracy of bottleneck identification (precision/recall).
  - Decision quality when feeding the metric into a simulated scheduler that decides whether to scale out, migrate, or throttle a tenant.
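The paper does not ship reference code, but the three steps above (normalize utilization, correlate it with KPI degradation to get impact factors, then weight) can be sketched in a few lines of Python. Function and variable names here are illustrative, not from the paper:

```python
import numpy as np

def buoyancy(util_samples, kpi_degradation):
    """Compute a Buoyancy vector for one workload.

    util_samples:     (T, R) array of per-resource utilization over time,
                      already normalized to the 0-1 range.
    kpi_degradation:  (T,) array of observed KPI degradation (e.g. extra
                      request latency) aligned with the utilization samples.
    Returns (weights, scores): impact factor and weighted score per resource.
    """
    T, R = util_samples.shape
    weights = np.zeros(R)
    for r in range(R):
        # Impact factor w_{r,i}: correlation of this resource's usage
        # pattern with the observed KPI degradation, clipped at zero so
        # resources uncorrelated with the slowdown carry no weight.
        c = np.corrcoef(util_samples[:, r], kpi_degradation)[0, 1]
        weights[r] = max(c, 0.0)
    latest_util = util_samples[-1]          # u_{r,i}: most recent utilization
    return weights, weights * latest_util   # per-resource Buoyancy components

# Toy example: resource 0 tracks the degradation, resource 1 is noise.
rng = np.random.default_rng(0)
deg = rng.random(100)
util = np.column_stack([deg * 0.9 + 0.05, rng.random(100)])
w, b = buoyancy(util, deg)   # w[0] dominates: resource 0 is the bottleneck
```

A high weight with high utilization flags the bottleneck; a high weight with low utilization signals remaining headroom, which is exactly the dual reading the paper exploits.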
Results & Findings
| Metric | Baseline (CPU %) | Baseline (Latency) | Buoyancy |
|---|---|---|---|
| Bottleneck detection accuracy | 71 % | 78 % | 90 % |
| Scheduler‑induced latency reduction | 5 % | 8 % | 13 % |
| False‑positive scaling events | 12 % | 9 % | 4 % |
- Better pinpointing: Buoyancy correctly identified the true limiting resource in 90 % of cases, cutting mis‑diagnoses by roughly half compared to CPU‑only heuristics.
- More efficient scaling: When used as the trigger for autoscaling, Buoyancy reduced unnecessary scale‑out events, saving compute cycles and cost.
- Robustness to heterogeneity: The abstraction held up across CPUs, GPUs, and even mixed‑precision accelerators, showing its platform‑agnostic nature.
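The scaling result comes from feeding the Buoyancy vector into the simulated scheduler's scale‑out/migrate/throttle choice. A hypothetical decision rule (thresholds and action names below are illustrative assumptions, not taken from the paper) could look like:

```python
def decide(buoyancy, headroom, hot=0.7):
    """Pick a scheduler action from a per-resource Buoyancy vector.

    buoyancy: dict mapping resource name -> weighted score (w * u).
    headroom: dict mapping resource name -> remaining capacity (1 - u).
    The 0.7 / 0.3 thresholds are illustrative; the paper's simulated
    scheduler chooses between scaling out, migrating, and throttling.
    """
    resource, score = max(buoyancy.items(), key=lambda kv: kv[1])
    if score < hot:
        return "no-op"                  # no resource is a clear bottleneck
    if headroom[resource] > 0.3:
        return f"scale-out:{resource}"  # capacity remains for this resource
    return f"migrate:{resource}"        # node-local resource is exhausted

action = decide(
    buoyancy={"cpu": 0.2, "mem_bw": 0.85, "net": 0.1},
    headroom={"cpu": 0.8, "mem_bw": 0.1, "net": 0.9},
)
# memory bandwidth is the bottleneck and its headroom is exhausted,
# so the rule migrates instead of scaling out
```

Because the vector names the limiting resource, the rule avoids the blanket scale‑outs that a CPU‑only trigger would fire, which is where the reduction in false‑positive scaling events comes from.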
Practical Implications
- Smarter Autoscalers: Cloud platforms can replace “scale‑out when CPU > 80 %” with “scale‑out when Buoyancy indicates a memory‑bandwidth bottleneck”, leading to more targeted resource provisioning.
- Improved SLO compliance: By surfacing the exact resource causing latency spikes, developers can tune code (e.g., cache‑friendly data structures) or request specific hardware (high‑BW memory) to meet SLAs.
- Reduced noisy‑neighbor impact: Operators can detect when a tenant is throttling a shared cache or network and proactively migrate or isolate workloads, improving overall tenant fairness.
- Integration with existing observability stacks: The Buoyancy API can be exported as a custom Prometheus metric or OpenTelemetry attribute, allowing teams to add it to dashboards with a single line of config.
- Cost optimization: Fewer unnecessary VM or container replicas mean lower cloud spend, especially in bursty, multi‑tenant environments typical of SaaS platforms.
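As a concrete illustration of the observability integration, Buoyancy scores could be rendered in the Prometheus text exposition format and served from any scrape endpoint. The metric and label names below are assumptions for illustration; the paper only states that Buoyancy can be exported as a custom Prometheus metric or OpenTelemetry attribute:

```python
def to_prometheus(workload, scores):
    """Render a per-resource Buoyancy vector in Prometheus text
    exposition format (one gauge sample per shared resource)."""
    lines = [
        "# HELP workload_buoyancy Per-resource Buoyancy score "
        "(impact weight x normalized utilization)",
        "# TYPE workload_buoyancy gauge",
    ]
    for resource, score in sorted(scores.items()):
        # Hypothetical metric/label names, chosen for this sketch.
        lines.append(
            f'workload_buoyancy{{workload="{workload}",'
            f'resource="{resource}"}} {score}'
        )
    return "\n".join(lines)

print(to_prometheus("checkout-api", {"cpu": 0.2, "mem_bw": 0.85, "net": 0.1}))
```

Exposing one labeled gauge per resource lets existing dashboards and alert rules select the bottleneck dimension directly, e.g. alerting only when the `mem_bw` series crosses a threshold.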
Limitations & Future Work
- Metric overhead: Collecting fine‑grained hardware counters (e.g., per‑core LLC miss rates) can add a modest CPU overhead; the authors suggest adaptive sampling as a mitigation.
- Static weighting: The current impact‑factor model is derived offline from workload traces; dynamic, learning‑based weighting could adapt to evolving application behavior.
- Scope of resources: The study focused on CPU, memory, storage, and network; extending Buoyancy to emerging resources such as TPUs, NVMe‑over‑Fabric, or serverless function quotas remains an open challenge.
- Real‑world deployment: The evaluation used controlled benchmarks; a production‑scale rollout (e.g., in a public cloud) would be needed to validate scalability and integration overhead.
Bottom line: Buoyancy offers a practical, data‑driven lens for developers and operators to see beyond surface‑level metrics, enabling more precise performance tuning and cost‑effective orchestration in today’s complex, multi‑tenant cloud ecosystems.
Authors
- Oliver Larsson
- Thijs Metsch
- Cristian Klein
- Erik Elmroth
Paper Information
- arXiv ID: 2602.22852v1
- Categories: cs.DC
- Published: February 26, 2026