[Paper] Scrum Sprint Planning: LLM-based and algorithmic solutions
Source: arXiv - 2512.18966v1
Overview
The authors investigate whether Large Language Models (LLMs) such as OpenAI’s GPT‑3.5 Turbo, GPT‑4 Turbo, and the newer “Val” can automate, or at least assist, Scrum sprint planning, a core activity for agile teams. By feeding manually crafted sprint data into these models, they evaluate the quality of the generated sprint backlogs and task allocations to see whether LLMs could become a practical aid for product owners and Scrum masters.
Key Contributions
- Empirical case study of three state‑of‑the‑art OpenAI models applied to sprint‑planning scenarios.
- Dataset construction: a set of manually curated user stories, acceptance criteria, and capacity constraints used as test inputs.
- Evaluation framework: qualitative criteria (clarity, completeness, adherence to Scrum rules) and quantitative metrics (story‑point distribution, dependency handling); a sketch of the quantitative checks follows this list.
- Finding that current LLM outputs fall short of the quality required for direct adoption in real Scrum projects.
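To make the quantitative metrics concrete, the Python sketch below shows how a capacity‑fit check and a dependency‑ordering check could be implemented. This is an illustrative reconstruction of the criteria named above, not the authors’ evaluation code; the data layout and all function names are assumptions.

```python
# Hypothetical sketch of the quantitative checks described above:
# does a generated sprint backlog fit the team's capacity, and do
# prerequisite stories appear before the stories that depend on them?
# Data layout and function names are assumptions, not the paper's code.

from dataclasses import dataclass, field


@dataclass
class Story:
    id: str
    points: int                                   # story point estimate
    depends_on: list[str] = field(default_factory=list)


def fits_capacity(sprint: list[Story], velocity: int) -> bool:
    """Correctness check: total story points must not exceed team velocity."""
    return sum(s.points for s in sprint) <= velocity


def dependencies_respected(sprint: list[Story]) -> bool:
    """Dependency check: every prerequisite must appear earlier in the sprint."""
    seen: set[str] = set()
    for story in sprint:
        if any(dep not in seen for dep in story.depends_on):
            return False
        seen.add(story.id)
    return True


if __name__ == "__main__":
    draft = [
        Story("US-1", 3),
        Story("US-2", 5, depends_on=["US-1"]),
        Story("US-3", 8),
    ]
    print(fits_capacity(draft, velocity=20))   # True: 16 <= 20
    print(dependencies_respected(draft))       # True: US-1 precedes US-2
```

Checks of this kind could complement the expert scoring described in the Methodology section, since they can be run automatically on every generated backlog.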
Methodology
- Data Preparation – The team created several realistic sprint scenarios, each containing a product backlog, team velocity, and resource constraints.
- Prompt Engineering – For each model they designed prompts that asked the LLM to:
  - Prioritize backlog items,
  - Estimate story points, and
  - Produce a sprint backlog that respects the given capacity.
- Model Execution – The three OpenAI models were queried via their API, using identical prompts and temperature settings to keep the comparison fair (a minimal sketch of this setup follows this list).
- Assessment – Outputs were reviewed by Scrum practitioners who scored them on:
  - Correctness (do the selected items fit the capacity?),
  - Completeness (are acceptance criteria preserved?), and
  - Scrum compliance (e.g., no “half‑done” stories, proper definition of done).
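The model‑execution step can be illustrated with a minimal Python sketch using the OpenAI client. The prompt template, the temperature value, and the model identifiers are assumptions for illustration; the paper’s exact prompts and settings are not reproduced here, and the third model is omitted because its API identifier is not given in this summary.

```python
# Minimal sketch of querying several OpenAI chat models with an identical
# prompt and temperature, as the methodology describes. The model names,
# the prompt template, and temperature=0.2 are assumptions for illustration.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MODELS = ["gpt-3.5-turbo", "gpt-4-turbo"]  # plus the third model under study

PROMPT_TEMPLATE = """You are assisting with Scrum sprint planning.
Product backlog (user stories with acceptance criteria):
{backlog}

Team velocity: {velocity} story points.

Tasks:
1. Prioritize the backlog items.
2. Estimate story points for each item.
3. Propose a sprint backlog that does not exceed the team velocity.
"""


def plan_sprint(model: str, backlog: str, velocity: int) -> str:
    """Send the same sprint-planning prompt to a given model."""
    response = client.chat.completions.create(
        model=model,
        temperature=0.2,  # kept identical across models for a fair comparison
        messages=[
            {"role": "user",
             "content": PROMPT_TEMPLATE.format(backlog=backlog, velocity=velocity)},
        ],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    backlog = "- US-1: As a user, I can reset my password ...\n- US-2: ..."
    for model in MODELS:
        print(f"=== {model} ===")
        print(plan_sprint(model, backlog, velocity=20))
```

Holding the prompt and temperature fixed isolates model choice as the only variable, which matches the fairness goal stated in the methodology.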
Results & Findings
- GPT‑4 Turbo produced the most coherent lists but still missed several capacity constraints and occasionally generated duplicate or contradictory stories.
- GPT‑3.5 Turbo showed higher variance; some runs were usable after manual tweaking, while others were nonsensical.
- Val (the newest model) performed similarly to GPT‑4 Turbo on surface fluency but struggled with the logical consistency required for sprint planning.
- Across all models, story‑point estimation was inconsistent, and dependency handling (ensuring prerequisite tasks appear earlier) was unreliable.
- The authors conclude that, in their current form, LLMs cannot replace human‑led sprint planning but may serve as a drafting aid.
Practical Implications
- Assistive Drafting: Teams could use LLMs to generate an initial sprint backlog that a Scrum master then refines, potentially saving time on routine prioritization.
- Training & Onboarding: New team members could query an LLM to see example sprint plans, helping them understand Scrum conventions faster.
- Prompt‑Design Research: The study highlights the need for more sophisticated prompting or fine‑tuning on agile‑specific corpora before LLMs become production‑ready for Scrum tasks.
- Tool Integration: Agile tooling vendors might embed LLM APIs as “suggestion engines” rather than autonomous planners, offering suggestions that are clearly marked as provisional (a pattern sketched below).
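The “suggestion engine” pattern can be sketched as a thin wrapper that keeps LLM output clearly provisional until a human approves it. The types and function names below are hypothetical and not tied to any specific vendor API or to the paper’s tooling.

```python
# Hypothetical sketch of the "suggestion engine" integration pattern:
# LLM output is wrapped as a provisional draft and only enters the sprint
# plan after explicit human approval. All names here are illustrative.

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass
class SprintDraft:
    content: str              # raw LLM-generated sprint backlog
    source_model: str         # which model produced it
    created_at: datetime
    approved: bool = False    # never auto-approved


def suggest_sprint(generate: Callable[[], str], model_name: str) -> SprintDraft:
    """Produce a clearly marked provisional draft from any LLM backend."""
    return SprintDraft(
        content=generate(),
        source_model=model_name,
        created_at=datetime.now(timezone.utc),
    )


def approve(draft: SprintDraft, reviewer: str) -> SprintDraft:
    """A Scrum master signs off before the draft becomes part of the plan."""
    print(f"Draft from {draft.source_model} approved by {reviewer}")
    draft.approved = True
    return draft
```

Keeping approval as a separate, explicit step reflects the study’s finding that LLM output currently needs human refinement before it is usable.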
Limitations & Future Work
- Synthetic Data: The experiments used manually created datasets rather than live project data, which may not capture the full complexity of real‑world backlogs.
- Evaluation Scope: The assessment relied heavily on expert judgment; more objective metrics (e.g., sprint velocity variance) could strengthen conclusions.
- Model Fine‑Tuning: The authors plan to explore domain‑specific fine‑tuning or retrieval‑augmented generation to improve logical consistency.
- Human‑in‑the‑Loop Studies: Future work will involve actual Scrum teams using LLM‑generated drafts in live sprints to measure productivity impact.
Authors
- Yuwon Yoon
- Kevin Iwan
- Madeleine Zwart
- Xiaohan Qin
- Hina Lee
- Maria Spichkova
Paper Information
- arXiv ID: 2512.18966v1
- Categories: cs.SE
- Published: December 22, 2025