[Paper] Scrum Sprint Planning: LLM-based and algorithmic solutions
Source: arXiv - 2512.18966v1
Overview
The authors investigate whether Large Language Models (LLMs) such as OpenAI’s GPT‑3.5 Turbo, GPT‑4 Turbo, and the newer “Val” can automate, or at least assist, Scrum sprint planning, a core activity for agile teams. By feeding manually crafted sprint data into these models, they evaluate the quality of the generated sprint backlogs and task allocations to see whether LLMs could become a practical aid for product owners and Scrum masters.
Key Contributions
- Empirical case study of three state‑of‑the‑art OpenAI models applied to sprint‑planning scenarios.
- Dataset construction: a set of manually curated user stories, acceptance criteria, and capacity constraints used as test inputs.
- Evaluation framework: qualitative criteria (clarity, completeness, adherence to Scrum rules) and quantitative metrics (story‑point distribution, dependency handling); a sketch of the quantitative checks follows this list.
- Finding that current LLM outputs fall short of the quality required for direct adoption in real Scrum projects.
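To make the quantitative metrics concrete, the Python sketch below shows how a capacity‑fit check and a dependency‑ordering check could be implemented. This is an illustrative reconstruction of the criteria named above, not the authors’ evaluation code; the data layout and all function names are assumptions.

```python
# Hypothetical sketch of the quantitative checks described above:
# does a generated sprint backlog fit the team's capacity, and do
# prerequisite stories appear before the stories that depend on them?
# Data layout and function names are assumptions, not the paper's code.

from dataclasses import dataclass, field


@dataclass
class Story:
    id: str
    points: int                                   # story point estimate
    depends_on: list[str] = field(default_factory=list)


def fits_capacity(sprint: list[Story], velocity: int) -> bool:
    """Correctness check: total story points must not exceed team velocity."""
    return sum(s.points for s in sprint) <= velocity


def dependencies_respected(sprint: list[Story]) -> bool:
    """Dependency check: every prerequisite must appear earlier in the sprint."""
    seen: set[str] = set()
    for story in sprint:
        if any(dep not in seen for dep in story.depends_on):
            return False
        seen.add(story.id)
    return True


if __name__ == "__main__":
    draft = [
        Story("US-1", 3),
        Story("US-2", 5, depends_on=["US-1"]),
        Story("US-3", 8),
    ]
    print(fits_capacity(draft, velocity=20))   # True: 16 <= 20
    print(dependencies_respected(draft))       # True: US-1 precedes US-2
```

Checks of this kind could complement the expert scoring described in the Methodology section, since they can be run automatically on every generated backlog.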
Methodology
- Data Preparation – The team created several realistic sprint scenarios, each containing a product backlog, team velocity, and resource constraints.
- Prompt Engineering – For each model they designed prompts that asked the LLM to:
  - Prioritize backlog items,
  - Estimate story points, and
  - Produce a sprint backlog that respects the given capacity.
- Model Execution – The three OpenAI models were queried via their API, using identical prompts and temperature settings to keep the comparison fair (a minimal sketch of this setup follows this list).
- Assessment – Outputs were reviewed by Scrum practitioners who scored them on:
  - Correctness (do the selected items fit the capacity?),
  - Completeness (are acceptance criteria preserved?), and
  - Scrum compliance (e.g., no “half‑done” stories, proper definition of done).
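The model‑execution step can be illustrated with a minimal Python sketch using the OpenAI client. The prompt template, the temperature value, and the model identifiers are assumptions for illustration; the paper’s exact prompts and settings are not reproduced here, and the third model is omitted because its API identifier is not given in this summary.

```python
# Minimal sketch of querying several OpenAI chat models with an identical
# prompt and temperature, as the methodology describes. The model names,
# the prompt template, and temperature=0.2 are assumptions for illustration.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MODELS = ["gpt-3.5-turbo", "gpt-4-turbo"]  # plus the third model under study

PROMPT_TEMPLATE = """You are assisting with Scrum sprint planning.
Product backlog (user stories with acceptance criteria):
{backlog}

Team velocity: {velocity} story points.

Tasks:
1. Prioritize the backlog items.
2. Estimate story points for each item.
3. Propose a sprint backlog that does not exceed the team velocity.
"""


def plan_sprint(model: str, backlog: str, velocity: int) -> str:
    """Send the same sprint-planning prompt to a given model."""
    response = client.chat.completions.create(
        model=model,
        temperature=0.2,  # kept identical across models for a fair comparison
        messages=[
            {"role": "user",
             "content": PROMPT_TEMPLATE.format(backlog=backlog, velocity=velocity)},
        ],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    backlog = "- US-1: As a user, I can reset my password ...\n- US-2: ..."
    for model in MODELS:
        print(f"=== {model} ===")
        print(plan_sprint(model, backlog, velocity=20))
```

Holding the prompt and temperature fixed isolates model choice as the only variable, which matches the fairness goal stated in the methodology.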
Results & Findings
- GPT‑4 Turbo produced the most coherent lists but still missed several capacity constraints and occasionally generated duplicate or contradictory stories.
- GPT‑3.5 Turbo showed higher variance; some runs were usable after manual tweaking, while others were nonsensical.
- Val (the newest model) performed similarly to GPT‑4 Turbo on surface fluency but struggled with the logical consistency required for sprint planning.
- Across all models, story‑point estimation was inconsistent, and dependency handling (ensuring prerequisite tasks appear earlier) was unreliable.
- The authors conclude that, in their current form, LLMs cannot replace human‑led sprint planning but may serve as a drafting aid.
Practical Implications
- Assistive Drafting: Teams could use LLMs to generate an initial sprint backlog that a Scrum master then refines, potentially saving time on routine prioritization.
- Training & Onboarding: New team members could query an LLM to see example sprint plans, helping them understand Scrum conventions faster.
- Prompt‑Design Research: The study highlights the need for more sophisticated prompting or fine‑tuning on agile‑specific corpora before LLMs become production‑ready for Scrum tasks.
- Tool Integration: Agile tooling vendors might embed LLM APIs as “suggestion engines” rather than autonomous planners, offering suggestions that are clearly marked as provisional (a pattern sketched below).
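The “suggestion engine” pattern can be sketched as a thin wrapper that keeps LLM output clearly provisional until a human approves it. The types and function names below are hypothetical and not tied to any specific vendor API or to the paper’s tooling.

```python
# Hypothetical sketch of the "suggestion engine" integration pattern:
# LLM output is wrapped as a provisional draft and only enters the sprint
# plan after explicit human approval. All names here are illustrative.

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass
class SprintDraft:
    content: str              # raw LLM-generated sprint backlog
    source_model: str         # which model produced it
    created_at: datetime
    approved: bool = False    # never auto-approved


def suggest_sprint(generate: Callable[[], str], model_name: str) -> SprintDraft:
    """Produce a clearly marked provisional draft from any LLM backend."""
    return SprintDraft(
        content=generate(),
        source_model=model_name,
        created_at=datetime.now(timezone.utc),
    )


def approve(draft: SprintDraft, reviewer: str) -> SprintDraft:
    """A Scrum master signs off before the draft becomes part of the plan."""
    print(f"Draft from {draft.source_model} approved by {reviewer}")
    draft.approved = True
    return draft
```

Keeping approval as a separate, explicit step reflects the study’s finding that LLM output currently needs human refinement before it is usable.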
Limitations & Future Work
- Synthetic Data: The experiments used manually created datasets rather than live project data, which may not capture the full complexity of real‑world backlogs.
- Evaluation Scope: The assessment relied heavily on expert judgment; more objective metrics (e.g., sprint velocity variance) could strengthen conclusions.
- Model Fine‑Tuning: The authors plan to explore domain‑specific fine‑tuning or retrieval‑augmented generation to improve logical consistency.
- Human‑in‑the‑Loop Studies: Future work will involve actual Scrum teams using LLM‑generated drafts in live sprints to measure productivity impact.
Authors
- Yuwon Yoon
- Kevin Iwan
- Madeleine Zwart
- Xiaohan Qin
- Hina Lee
- Maria Spichkova
Paper Information
- arXiv ID: 2512.18966v1
- Categories: cs.SE
- Published: December 22, 2025