[Paper] Configuration Defects in Kubernetes
Source: arXiv - 2512.05062v1
Overview
Kubernetes has become the de‑facto platform for deploying containerised applications, but its power comes with a steep learning curve—especially when it comes to writing correct configuration files. This paper presents the first large‑scale empirical study of real‑world Kubernetes configuration defects, analyzing 719 bugs from over 2,200 open‑source scripts. The authors not only categorize the defects but also evaluate existing static analysis tools and release a new linter that uncovers previously unknown, high‑impact bugs.
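To make the setting concrete, here is a minimal manifest exhibiting one of the most common defect classes the paper studies: a container spec that omits resource limits. This is an illustrative sketch, not an example drawn from the paper's dataset; all names and the image tag are placeholders.

```yaml
# Illustrative Deployment fragment: the container declares requests but
# no resources.limits, so it can consume unbounded CPU/memory on the node.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web            # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.27   # example image
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            # defect: no 'limits' block here
```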
Key Contributions
- Empirical defect dataset: 719 real configuration defects extracted from 2,260 Kubernetes manifests across popular open‑source projects.
- Defect taxonomy: Identification of 15 distinct defect categories (e.g., missing fields, invalid values, insecure defaults).
- Tool coverage analysis: Assessment of eight publicly available static analysis tools, showing they collectively detect only 8 of the 15 categories.
- New linter implementation: A lightweight, open‑source linter that targets two high‑severity defect categories missed by existing tools.
- Impact validation: The linter discovered 26 novel defects; 19 have already been fixed after being reported to maintainers.

- Actionable recommendations: Guidelines for integrating defect detection and automated repair into Kubernetes CI/CD pipelines.
Methodology
- Data collection – The authors mined GitHub repositories that use Kubernetes manifests, extracting 2,260 configuration scripts and their associated issue reports.
- Defect extraction – Through manual triage and issue linking, they isolated 719 unique configuration defects.
- Qualitative coding – Using open‑coding techniques, each defect was classified into one of 15 categories based on root cause and symptom.
- Tool evaluation – Eight static analysis tools (e.g., kubeval, kube‑score, Polaris) were run against the defect corpus; precision and recall were measured per category.
- Linter development – The authors built a custom linter focused on two high‑impact categories (resource‑quota mis‑configurations and insecure network policies) that existing tools missed.
- Validation – Detected defects were reported to upstream maintainers; responses and subsequent fixes were tracked.
Results & Findings
- Coverage gap: Existing tools collectively detect only 8 of the 15 defect categories (≈ 53 %).
- Best‑performing tools: For data‑field‑related defects (e.g., missing required keys), tools achieve the highest precision (≈ 92 %) and recall (≈ 78 %).
- High‑severity blind spots: None of the surveyed tools catch the two most dangerous categories—resource‑quota mis‑configurations and overly permissive network policies.
- Linter effectiveness: The new linter flagged 26 previously unknown defects; 19 were confirmed and fixed by project maintainers within weeks.
- Defect distribution: The most common categories involve missing/incorrect fields, while the most costly involve security‑related misconfigurations.
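The two blind-spot categories can be illustrated with short manifests. These are hypothetical sketches of the defect patterns, not examples from the paper's corpus; the names and namespace are placeholders.

```yaml
# Overly permissive network policy (illustrative): an empty podSelector
# plus an empty ingress rule admits traffic from any source to every pod
# in the namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all        # hypothetical name
  namespace: default
spec:
  podSelector: {}        # matches every pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - {}                 # empty rule: allows ingress from anywhere
---
# Resource-quota mis-configuration (illustrative): the quota caps object
# counts but sets no cpu/memory limits, so workloads remain unbounded.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ns-quota         # hypothetical name
  namespace: default
spec:
  hard:
    pods: "20"
    # defect: no limits.cpu / limits.memory entries
```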
Practical Implications
- CI/CD integration: Teams can plug the open‑source linter into their pipelines to automatically catch the two high‑impact defect types that standard tools miss.
- Tool selection: The study provides a clear map of which static analysis tools excel at which defect categories, helping engineers build a complementary toolchain rather than relying on a single solution.
- Developer education: The taxonomy serves as a checklist for developers writing manifests, highlighting common pitfalls (e.g., forgetting resources.limits, mis‑specifying serviceAccountName).
- Policy enforcement: Organizations can codify the recommended detection/repair patterns into GitHub Actions or Argo CD hooks to enforce best‑practice configurations across clusters.
- Open‑source contribution: Since the datasets and linter are publicly available, developers can extend the linter to cover additional categories or contribute improvements back to the community.
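A pipeline integration of the kind recommended above might look like the following GitHub Actions workflow. This is a hypothetical sketch: the workflow name, paths, and trigger are placeholders, and since the paper's linter CLI is not named here, the lint step uses kube-score, one of the surveyed tools, as a stand-in.

```yaml
# Hypothetical CI workflow: lint Kubernetes manifests on every pull
# request that touches files under k8s/.
name: manifest-lint
on:
  pull_request:
    paths:
      - "k8s/**"
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Install step omitted: how you obtain the linter binary
      # (kube-score, the paper's linter, etc.) is environment-specific.
      - name: Lint manifests
        run: kube-score score k8s/*.yaml  # assumes the binary is on PATH
```

Failing the job on linter findings blocks the merge, which is how the detection patterns become enforced policy rather than advisory output.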
Limitations & Future Work
- Scope of repositories: The study focuses on publicly available GitHub projects; private or enterprise clusters may exhibit different defect patterns.
- Tool set: Only eight static analysis tools were evaluated; newer or proprietary tools could have different coverage.
- Linter focus: The custom linter targets just two defect categories; expanding it to cover the remaining uncovered categories is an obvious next step.
- Automated repair: While detection is addressed, the paper only sketches repair strategies; future work could explore safe, automated remediation (e.g., PR generation).
- Long‑term impact: The authors plan longitudinal studies to measure whether integrating these tools reduces defect recurrence in active projects.
Authors
- Yue Zhang
- Uchswas Paul
- Marcelo d’Amorim
- Akond Rahman
Paper Information
- arXiv ID: 2512.05062v1
- Categories: cs.SE
- Published: December 4, 2025