[Paper] An Empirical Study of the Evolution of GitHub Actions Workflows
Source: arXiv - 2602.14572v1
Overview
The paper presents the first large‑scale, data‑driven look at how GitHub Actions workflows evolve over time. By mining millions of workflow file versions from thousands of open‑source projects, the authors uncover patterns in how developers create, modify, and maintain CI/CD pipelines directly inside GitHub.
Key Contributions
- Empirical dataset: Collected and analyzed 3.4 M+ versions of GitHub Actions workflow files from 49 K+ repositories (Nov 2019 – Aug 2025).
- Conceptual change taxonomy: Identified seven distinct types of workflow modifications (e.g., task configuration, job specification, dependency updates).
- Quantitative insights: Measured frequency, size, and timing of workflow changes, showing that 7.3 % of workflow files are touched weekly and most edits are tiny (≈1 change per commit).
- Tooling gap analysis: Highlighted a lack of automated support for common maintenance tasks such as dependency management and security hardening.
- LLM impact assessment: Investigated whether large‑language‑model coding assistants affect workflow churn and found no clear evidence of an effect.
Methodology
- Data collection – The researchers used the GitHub REST API and the GHTorrent dataset to retrieve every `.github/workflows/*.yml` file across 49 K+ public repositories, spanning more than five years.
- Qualitative grounding – A manual inspection of 439 changed workflow files produced a taxonomy of conceptual changes (e.g., “add a new job”, “modify a step’s environment variable”).
- Quantitative analysis – For each repository they built a change history: every commit that added, deleted, or edited a workflow file. Metrics such as “changes per week”, “lines added/removed”, and “type of change” were computed.
- Statistical testing – The authors applied non‑parametric tests (Mann‑Whitney U, Kruskal‑Wallis) to compare groups (e.g., repositories that use LLM‑generated code vs. those that don’t).
- Validity checks – Threats to internal and external validity were discussed, including sampling bias toward popular repositories and the limitations of static file‑diff analysis.
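The quantitative steps above can be sketched in a few lines. This is a minimal, self-contained illustration, not the authors' pipeline: the commit dates are synthetic, the weekly‑change metric is one plausible reading of “changes per week”, and the Mann‑Whitney U statistic is implemented directly (rank‑sum form with tie-averaged ranks) rather than via a statistics library.

```python
from datetime import date

def weekly_change_rate(commit_dates, start, end):
    """Fraction of weeks in [start, end] during which the file was touched.

    A week is identified by its ISO (year, week) pair, so two commits in
    the same calendar week count once.
    """
    touched_weeks = {d.isocalendar()[:2] for d in commit_dates}
    total_weeks = ((end - start).days // 7) + 1
    return len(touched_weeks) / total_weeks

def mann_whitney_u(x, y):
    """U statistic of sample x (rank-sum formulation, ties get average ranks)."""
    combined = sorted(list(x) + list(y))
    ranks = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        # ranks are 1-based; tied values share the average of their ranks
        ranks[combined[i]] = (i + 1 + j) / 2
        i = j
    r1 = sum(ranks[v] for v in x)
    return r1 - len(x) * (len(x) + 1) / 2

# Hypothetical change history of one workflow file
repo_a = [date(2024, 1, 1), date(2024, 1, 3), date(2024, 2, 5)]
rate = weekly_change_rate(repo_a, date(2024, 1, 1), date(2024, 3, 31))

# Hypothetical per-repo churn rates for two groups (e.g., LLM vs. non-LLM)
u = mann_whitney_u([0.05, 0.08, 0.12], [0.06, 0.09, 0.11])
```

In practice the resulting U statistic would be converted to a p-value (e.g., via a normal approximation for large samples) before drawing conclusions, as the paper does for its group comparisons.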
Results & Findings
| Observation | What it means |
|---|---|
| Median of 3 workflow files per repo | Most projects keep their CI/CD surface relatively small and focused. |
| 7.3 % of workflow files change weekly | CI pipelines are actively maintained; they are not “set‑and‑forget”. |
| ~75 % of commits contain a single change | Developers tend to make incremental, fine‑grained updates rather than large refactors. |
| Dominant change types: task configuration & job specification | The bulk of effort goes into tweaking existing steps (e.g., changing a Docker image tag, adjusting a script command). |
| Low adoption of dependency‑management actions | Few projects use specialized actions for updating libraries or security patches. |
| No clear LLM effect | The rise of AI‑assisted coding tools has not yet translated into observable shifts in workflow churn. |
| Security‑related changes are rare | Few commits address secrets handling, permission scopes, or vulnerability scanning. |
Practical Implications
- Tool developers can build “workflow diff assistants” that surface the exact semantic change (e.g., “updated Node version from 16 to 18”) rather than raw line diffs, reducing cognitive load for reviewers.
- CI/CD platform teams should consider first‑class support for dependency‑update actions (e.g., automated Renovate‑style PRs) because developers are already making frequent, small edits to task configurations.
- Security teams can integrate automated checks that flag missing or outdated security actions, encouraging a shift‑left approach to pipeline hardening.
- Project maintainers can adopt a “single‑change‑per‑PR” workflow for Actions files, aligning with the observed developer habit and making code review easier.
- AI‑tool vendors have a clear opportunity: embed domain‑specific knowledge about GitHub Actions (e.g., recommended syntax, best‑practice defaults) to make generated workflow snippets more maintainable.
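The “workflow diff assistant” idea from the first bullet can be sketched as a semantic diff over parsed workflow structures. This is a hypothetical illustration, not an existing tool: the workflow versions are hand-written nested dicts standing in for parsed YAML, and the output format is an arbitrary choice.

```python
def semantic_diff(old, new, path=""):
    """Return human-readable descriptions of changes between two nested dicts."""
    changes = []
    for key in sorted(set(old) | set(new)):
        p = f"{path}.{key}" if path else str(key)
        if key not in old:
            changes.append(f"added {p} = {new[key]!r}")
        elif key not in new:
            changes.append(f"removed {p}")
        elif isinstance(old[key], dict) and isinstance(new[key], dict):
            changes.extend(semantic_diff(old[key], new[key], p))
        elif old[key] != new[key]:
            changes.append(f"changed {p}: {old[key]!r} -> {new[key]!r}")
    return changes

# Hypothetical before/after versions of a workflow's build job
before = {"jobs": {"build": {"steps": {"setup-node": {"node-version": "16"}}}}}
after = {"jobs": {"build": {"steps": {"setup-node": {"node-version": "18"}}}}}

# Surfaces the node-version bump instead of a raw line diff
for change in semantic_diff(before, after):
    print(change)
```

A real assistant would parse the YAML (e.g., with a YAML library), understand list-valued fields like `steps`, and map paths back to the taxonomy's change types; this sketch only shows the key-level comparison at its core.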
Limitations & Future Work
- Sampling bias: The study focuses on public repositories with at least one workflow file, potentially under‑representing private or enterprise projects where CI/CD practices may differ.
- Static analysis only: The authors examined file diffs but did not execute the workflows, so the impact of changes on build success or performance remains unknown.
- Temporal horizon: Data stops in August 2025; rapid adoption of new Actions features or AI‑driven tooling after that point could shift patterns.
- Future directions: Extending the analysis to include execution logs, exploring causal links between security incidents and workflow changes, and prototyping the suggested tooling (e.g., AI‑augmented reviewers) to evaluate real‑world effectiveness.
Authors
- Pooya Rostami Mazrae
- Alexandre Decan
- Tom Mens
- Mairieli Wessel
Paper Information
- arXiv ID: 2602.14572v1
- Categories: cs.SE
- Published: February 16, 2026