[Paper] An Empirical Study of the Evolution of GitHub Actions Workflows
Source: arXiv - 2602.14572v1
Overview
The paper presents the first large‑scale, data‑driven look at how GitHub Actions workflows evolve over time. By mining millions of workflow file versions from thousands of open‑source projects, the authors uncover patterns in how developers create, modify, and maintain CI/CD pipelines directly inside GitHub.
Key Contributions
- Empirical dataset: Collected and analyzed 3.4 M+ versions of GitHub Actions workflow files from 49 K+ repositories (Nov 2019 – Aug 2025).
- Conceptual change taxonomy: Identified seven distinct types of workflow modifications (e.g., task configuration, job specification, dependency updates).
- Quantitative insights: Measured frequency, size, and timing of workflow changes, showing that 7.3 % of workflow files are touched weekly and most edits are tiny (≈1 change per commit).
- Tooling gap analysis: Highlighted a lack of automated support for common maintenance tasks such as dependency management and security hardening.
- LLM impact assessment: Investigated whether large‑language‑model coding assistants affect workflow churn and found no clear evidence of an effect.
Methodology
- Data collection – The researchers used the GitHub REST API and the GHTorrent dataset to retrieve every `.github/workflows/*.yml` file across 49 K+ public repositories, spanning more than five years.
- Qualitative grounding – A manual inspection of 439 changed workflow files produced a taxonomy of conceptual changes (e.g., “add a new job”, “modify a step’s environment variable”).
- Quantitative analysis – For each repository they built a change history: every commit that added, deleted, or edited a workflow file. Metrics such as “changes per week”, “lines added/removed”, and “type of change” were computed.
- Statistical testing – The authors applied non‑parametric tests (Mann‑Whitney U, Kruskal‑Wallis) to compare groups (e.g., repositories that use LLM‑generated code vs. those that don’t).
- Validity checks – Threats to internal and external validity were discussed, including sampling bias toward popular repositories and the limitations of static file‑diff analysis.
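The quantitative steps above can be sketched in a few lines. This is a minimal, self-contained illustration, not the authors' pipeline: the commit dates are synthetic, the weekly‑change metric is one plausible reading of “changes per week”, and the Mann‑Whitney U statistic is implemented directly (rank‑sum form with tie-averaged ranks) rather than via a statistics library.

```python
from datetime import date

def weekly_change_rate(commit_dates, start, end):
    """Fraction of weeks in [start, end] during which the file was touched.

    A week is identified by its ISO (year, week) pair, so two commits in
    the same calendar week count once.
    """
    touched_weeks = {d.isocalendar()[:2] for d in commit_dates}
    total_weeks = ((end - start).days // 7) + 1
    return len(touched_weeks) / total_weeks

def mann_whitney_u(x, y):
    """U statistic of sample x (rank-sum formulation, ties get average ranks)."""
    combined = sorted(list(x) + list(y))
    ranks = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        # ranks are 1-based; tied values share the average of their ranks
        ranks[combined[i]] = (i + 1 + j) / 2
        i = j
    r1 = sum(ranks[v] for v in x)
    return r1 - len(x) * (len(x) + 1) / 2

# Hypothetical change history of one workflow file
repo_a = [date(2024, 1, 1), date(2024, 1, 3), date(2024, 2, 5)]
rate = weekly_change_rate(repo_a, date(2024, 1, 1), date(2024, 3, 31))

# Hypothetical per-repo churn rates for two groups (e.g., LLM vs. non-LLM)
u = mann_whitney_u([0.05, 0.08, 0.12], [0.06, 0.09, 0.11])
```

In practice the resulting U statistic would be converted to a p-value (e.g., via a normal approximation for large samples) before drawing conclusions, as the paper does for its group comparisons.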
Results & Findings
| Observation | What it means |
|---|---|
| Median of 3 workflow files per repo | Most projects keep their CI/CD surface relatively small and focused. |
| 7.3 % of workflow files change weekly | CI pipelines are actively maintained; they are not “set‑and‑forget”. |
| ~75 % of commits contain a single change | Developers tend to make incremental, fine‑grained updates rather than large refactors. |
| Dominant change types: task configuration & job specification | The bulk of effort goes into tweaking existing steps (e.g., changing a Docker image tag, adjusting a script command). |
| Low adoption of dependency‑management actions | Few projects use specialized actions for updating libraries or security patches. |
| No clear LLM effect | The rise of AI‑assisted coding tools has not yet translated into observable shifts in workflow churn. |
| Security‑related changes are rare | Few commits address secrets handling, permission scopes, or vulnerability scanning. |
Practical Implications
- Tool developers can build “workflow diff assistants” that surface the exact semantic change (e.g., “updated Node version from 16 to 18”) rather than raw line diffs, reducing cognitive load for reviewers.
- CI/CD platform teams should consider first‑class support for dependency‑update actions (e.g., automated Renovate‑style PRs) because developers are already making frequent, small edits to task configurations.
- Security teams can integrate automated checks that flag missing or outdated security actions, encouraging a shift‑left approach to pipeline hardening.
- Project maintainers can adopt a “single‑change‑per‑PR” workflow for Actions files, aligning with the observed developer habit and making code review easier.
- AI‑tool vendors have a clear opportunity: embed domain‑specific knowledge about GitHub Actions (e.g., recommended syntax, best‑practice defaults) to make generated workflow snippets more maintainable.
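The “workflow diff assistant” idea from the first bullet can be sketched as a semantic diff over parsed workflow structures. This is a hypothetical illustration, not an existing tool: the workflow versions are hand-written nested dicts standing in for parsed YAML, and the output format is an arbitrary choice.

```python
def semantic_diff(old, new, path=""):
    """Return human-readable descriptions of changes between two nested dicts."""
    changes = []
    for key in sorted(set(old) | set(new)):
        p = f"{path}.{key}" if path else str(key)
        if key not in old:
            changes.append(f"added {p} = {new[key]!r}")
        elif key not in new:
            changes.append(f"removed {p}")
        elif isinstance(old[key], dict) and isinstance(new[key], dict):
            changes.extend(semantic_diff(old[key], new[key], p))
        elif old[key] != new[key]:
            changes.append(f"changed {p}: {old[key]!r} -> {new[key]!r}")
    return changes

# Hypothetical before/after versions of a workflow's build job
before = {"jobs": {"build": {"steps": {"setup-node": {"node-version": "16"}}}}}
after = {"jobs": {"build": {"steps": {"setup-node": {"node-version": "18"}}}}}

# Surfaces the node-version bump instead of a raw line diff
for change in semantic_diff(before, after):
    print(change)
```

A real assistant would parse the YAML (e.g., with a YAML library), understand list-valued fields like `steps`, and map paths back to the taxonomy's change types; this sketch only shows the key-level comparison at its core.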
Limitations & Future Work
- Sampling bias: The study focuses on public repositories with at least one workflow file, potentially under‑representing private or enterprise projects where CI/CD practices may differ.
- Static analysis only: The authors examined file diffs but did not execute the workflows, so the impact of changes on build success or performance remains unknown.
- Temporal horizon: Data stops in August 2025; rapid adoption of new Actions features or AI‑driven tooling after that point could shift patterns.
- Future directions: Extending the analysis to include execution logs, exploring causal links between security incidents and workflow changes, and prototyping the suggested tooling (e.g., AI‑augmented reviewers) to evaluate real‑world effectiveness.
Authors
- Pooya Rostami Mazrae
- Alexandre Decan
- Tom Mens
- Mairieli Wessel
Paper Information
- arXiv ID: 2602.14572v1
- Categories: cs.SE
- Published: February 16, 2026