[Paper] Configuration Defects in Kubernetes
Source: arXiv - 2512.05062v1
Overview
Kubernetes has become the de‑facto platform for deploying containerised applications, but its power comes with a steep learning curve—especially when it comes to writing correct configuration files. This paper presents the first large‑scale empirical study of real‑world Kubernetes configuration defects, analyzing 719 bugs from over 2,200 open‑source scripts. The authors not only categorize the defects but also evaluate existing static analysis tools and release a new linter that uncovers previously unknown, high‑impact bugs.
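To make the setting concrete, here is a minimal manifest exhibiting one of the most common defect classes the paper studies: a container spec that omits resource limits. This is an illustrative sketch, not an example drawn from the paper's dataset; all names and the image tag are placeholders.

```yaml
# Illustrative Deployment fragment: the container declares requests but
# no resources.limits, so it can consume unbounded CPU/memory on the node.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web            # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.27   # example image
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            # defect: no 'limits' block here
```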
Key Contributions
- Empirical defect dataset: 719 real configuration defects extracted from 2,260 Kubernetes manifests across popular open‑source projects.
- Defect taxonomy: Identification of 15 distinct defect categories (e.g., missing fields, invalid values, insecure defaults).
- Tool coverage analysis: Assessment of eight publicly available static analysis tools, showing they collectively detect only 8 of the 15 categories.
- New linter implementation: A lightweight, open‑source linter that targets two high‑severity defect categories missed by existing tools.
- Impact validation: The linter discovered 26 novel defects; 19 have already been fixed after being reported to maintainers.

- Actionable recommendations: Guidelines for integrating defect detection and automated repair into Kubernetes CI/CD pipelines.
Methodology
- Data collection – The authors mined GitHub repositories that use Kubernetes manifests, extracting 2,260 configuration scripts and their associated issue reports.
- Defect extraction – Through manual triage and issue linking, they isolated 719 unique configuration defects.
- Qualitative coding – Using open‑coding techniques, each defect was classified into one of 15 categories based on root cause and symptom.
- Tool evaluation – Eight static analysis tools (e.g., kubeval, kube‑score, Polaris) were run against the defect corpus; precision and recall were measured per category.
- Linter development – The authors built a custom linter focused on two high‑impact categories (resource‑quota mis‑configurations and insecure network policies) that existing tools missed.
- Validation – Detected defects were reported to upstream maintainers; responses and subsequent fixes were tracked.
Results & Findings
- Coverage gap: Existing tools collectively detect only 8 of the 15 defect categories (≈ 53 %).
- Best‑performing tools: For data‑field‑related defects (e.g., missing required keys), tools achieve the highest precision (≈ 92 %) and recall (≈ 78 %).
- High‑severity blind spots: None of the surveyed tools catch the two most dangerous categories—resource‑quota mis‑configurations and overly permissive network policies.
- Linter effectiveness: The new linter flagged 26 previously unknown defects; 19 were confirmed and fixed by project maintainers within weeks.
- Defect distribution: The most common categories involve missing/incorrect fields, while the most costly involve security‑related misconfigurations.
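The two blind-spot categories can be illustrated with short manifests. These are hypothetical sketches of the defect patterns, not examples from the paper's corpus; the names and namespace are placeholders.

```yaml
# Overly permissive network policy (illustrative): an empty podSelector
# plus an empty ingress rule admits traffic from any source to every pod
# in the namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all        # hypothetical name
  namespace: default
spec:
  podSelector: {}        # matches every pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - {}                 # empty rule: allows ingress from anywhere
---
# Resource-quota mis-configuration (illustrative): the quota caps object
# counts but sets no cpu/memory limits, so workloads remain unbounded.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ns-quota         # hypothetical name
  namespace: default
spec:
  hard:
    pods: "20"
    # defect: no limits.cpu / limits.memory entries
```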
Practical Implications
- CI/CD integration: Teams can plug the open‑source linter into their pipelines to automatically catch the two high‑impact defect types that standard tools miss.
- Tool selection: The study provides a clear map of which static analysis tools excel at which defect categories, helping engineers build a complementary toolchain rather than relying on a single solution.
- Developer education: The taxonomy serves as a checklist for developers writing manifests, highlighting common pitfalls (e.g., forgetting resources.limits, mis‑specifying serviceAccountName).
- Policy enforcement: Organizations can codify the recommended detection/repair patterns into GitHub Actions or Argo CD hooks to enforce best‑practice configurations across clusters.
- Open‑source contribution: Since the datasets and linter are publicly available, developers can extend the linter to cover additional categories or contribute improvements back to the community.
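A pipeline integration of the kind recommended above might look like the following GitHub Actions workflow. This is a hypothetical sketch: the workflow name, paths, and trigger are placeholders, and since the paper's linter CLI is not named here, the lint step uses kube-score, one of the surveyed tools, as a stand-in.

```yaml
# Hypothetical CI workflow: lint Kubernetes manifests on every pull
# request that touches files under k8s/.
name: manifest-lint
on:
  pull_request:
    paths:
      - "k8s/**"
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Install step omitted: how you obtain the linter binary
      # (kube-score, the paper's linter, etc.) is environment-specific.
      - name: Lint manifests
        run: kube-score score k8s/*.yaml  # assumes the binary is on PATH
```

Failing the job on linter findings blocks the merge, which is how the detection patterns become enforced policy rather than advisory output.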
Limitations & Future Work
- Scope of repositories: The study focuses on publicly available GitHub projects; private or enterprise clusters may exhibit different defect patterns.
- Tool set: Only eight static analysis tools were evaluated; newer or proprietary tools could have different coverage.
- Linter focus: The custom linter targets just two defect categories; expanding it to cover the remaining uncovered categories is an obvious next step.
- Automated repair: While detection is addressed, the paper only sketches repair strategies; future work could explore safe, automated remediation (e.g., PR generation).
- Long‑term impact: The authors plan longitudinal studies to measure whether integrating these tools reduces defect recurrence in active projects.
Authors
- Yue Zhang
- Uchswas Paul
- Marcelo d’Amorim
- Akond Rahman
Paper Information
- arXiv ID: 2512.05062v1
- Categories: cs.SE
- Published: December 4, 2025