[Paper] Understanding and Detecting Scalability Faults in Large-Scale Distributed Systems

Published: June 10, 2026 at 04:49 AM EDT

Source: arXiv - 2606.11815v1

Overview

Scalable distributed systems form the backbone of modern computing infrastructure. However, as scale grows, system complexity may lead to scalability faults. Scalability faults are challenging to uncover and diagnose, as they are often latent and only manifest at large-scale deployment. In this paper, we present the first comprehensive study on scalability faults and propose an approach for their detection. First, we systematically investigate 444 scalability issue reports from 10 large-scale distributed systems to understand the common anti-patterns and root causes of scalability faults. We found that the majority of these faults are caused by the synergy between dimensional code fragments and anti-patterns associated with them. Second, based on our findings, we design and implement ScaleLens, a novel approach to detect scalability faults. ScaleLens combines dynamic and static analyses to pinpoint dimensional code fragments and match them with anti-patterns. Our evaluation shows that ScaleLens detects 4.2x more dimensional code fragments associated with known scalability faults compared to the baseline. On the latest stable versions of Cassandra, HDFS, and Ignite, ScaleLens detects 334 dimensional code fragments with confirmed problematic behavior.
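As a rough illustration of the kind of code the abstract describes, the sketch below assumes that a "dimensional code fragment" is code whose cost depends on a scale dimension such as the number of nodes, and that an associated anti-pattern could be a nested rescan inside it. Both readings are assumptions, and the function name is hypothetical. None of this is taken from the paper or from ScaleLens.

```python
# Hypothetical illustration only: the name and logic are not from the paper.

def pairwise_checks(num_nodes: int) -> int:
    """Count the peer checks done when every node rescans the full node list."""
    nodes = range(num_nodes)
    checks = 0
    for node in nodes:        # outer loop bound grows with cluster size
        for peer in nodes:    # nested rescan makes total work quadratic
            if peer != node:
                checks += 1
    return checks


if __name__ == "__main__":
    # The same code is cheap on a small test cluster and costly at scale.
    for size in (4, 64, 1024):
        print(size, pairwise_checks(size))
```

With 4 nodes the loop performs 12 checks, while with 1024 nodes it performs over a million, which is one way a fault can stay latent until large-scale deployment.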

Key Contributions

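Based on the abstract, this paper (arXiv category cs.SE, Software Engineering) contributes the following:

An empirical study of 444 scalability issue reports from 10 large-scale distributed systems, covering common anti-patterns and root causes.
The finding that the majority of these faults arise from dimensional code fragments combined with their associated anti-patterns.
ScaleLens, a detection approach that combines dynamic and static analyses.
An evaluation in which ScaleLens detects 4.2x more dimensional code fragments associated with known scalability faults than the baseline, plus 334 fragments with confirmed problematic behavior on the latest stable versions of Cassandra, HDFS, and Ignite.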

Methodology

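The study first examines scalability issue reports to identify anti-patterns and root causes. ScaleLens then uses dynamic and static analyses to pinpoint dimensional code fragments and match them with those anti-patterns. Please refer to the full paper for the detailed methodology.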

Practical Implications

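Scalability faults are often latent and only manifest at large-scale deployment, which makes them hard to uncover and diagnose. The reported detections on Cassandra, HDFS, and Ignite suggest that the approach can be applied to current stable releases of real systems.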

Authors

Hao-Nan Zhu
Goodness Ayinmode
Cesar A. Stuardo
Haryadi S. Gunawi
Cindy Rubio-González

Paper Information

arXiv ID: 2606.11815v1
Categories: cs.SE
Published: June 10, 2026
