[Paper] Amplifiers or Equalizers? A Longitudinal Study of LLM Evolution in Software Engineering Project-Based Learning

Published: November 28, 2025 at 08:05 AM EST
4 min read
Source: arXiv - 2511.23157v1

Overview

The paper presents a two‑year longitudinal investigation of how large language models (LLMs) influence student outcomes in software‑engineering project‑based learning (PBL). By comparing a 2024 cohort that used early, free‑tier LLMs with a 2025 cohort that had access to the latest, paid‑for LLMs, the authors uncover a paradox: modern LLMs can both level the playing field for weaker programmers and widen the gap between high‑ and low‑performers.

Key Contributions

  • Empirical comparison across two academic years (48 students in 2024 vs. 46 in 2025) that isolates the effect of LLM capability upgrades.
  • Dual‑role framework: introduces the concepts of equalizers (raising baseline performance) and amplifiers (exacerbating performance variance).
  • Rich mixed‑methods data: combines quantitative grades, code‑quality metrics, and qualitative student reflections to triangulate findings.
  • Pedagogical recommendations for SE educators on how to harness LLMs while mitigating equity concerns.
  • Open dataset and analysis scripts released for reproducibility and further research.

Methodology

  1. Course Design – Both years ran the same semester‑long SE PBL course (requirements gathering, design, implementation, testing, and delivery).
  2. LLM Access – 2024 students used free‑tier models (e.g., GPT‑3.5‑turbo with usage caps). 2025 students received institutional licenses for the latest paid models (e.g., GPT‑4‑Turbo, Claude‑3).
  3. Data Collection
    • Performance: final project grades and automated code‑quality scores (cyclomatic complexity, test coverage); a metric‑extraction sketch follows this list.
    • LLM Interaction: logged API calls, prompt types, and token usage.
    • Surveys & Interviews: post‑project questionnaires and semi‑structured interviews probing students’ perceived help, confidence, and learning strategies.
  4. Analysis – Mixed‑effects regression models control for prior GPA and programming experience; thematic coding extracts patterns from the qualitative responses. A minimal modelling sketch also follows this list.
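The paper does not spell out its metric tooling, so the following is only an illustrative sketch, assuming the student repositories are plain Python projects: radon stands in for the cyclomatic‑complexity measurement, and coverage is assumed to come from an ordinary pytest‑cov run. The `student_repo` path and project layout are hypothetical.

```python
# Illustrative sketch only: the paper's exact metric pipeline is not specified.
# Assumes plain-Python student repos; radon stands in for complexity scoring.
from pathlib import Path

from radon.complexity import cc_visit  # pip install radon


def mean_cyclomatic_complexity(repo_dir: str) -> float:
    """Average cyclomatic complexity over all functions/methods in a repo."""
    scores = []
    for path in Path(repo_dir).rglob("*.py"):
        source = path.read_text(encoding="utf-8", errors="ignore")
        # cc_visit parses the source and yields one block per function/method,
        # each carrying a .complexity score.
        scores.extend(block.complexity for block in cc_visit(source))
    return sum(scores) / len(scores) if scores else 0.0


# Test coverage would typically be collected by the test runner instead, e.g.
#   pytest --cov=src --cov-report=json
# and read from the report's totals (hypothetical project layout).
if __name__ == "__main__":
    print(mean_cyclomatic_complexity("student_repo"))  # hypothetical path
```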
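Likewise, the analysis step can be read as a standard random‑intercept model. A minimal sketch with statsmodels follows; the column names (grade, cohort, prior_gpa, experience, team) and the team‑level grouping are illustrative assumptions, not the authors' exact specification.

```python
# Minimal mixed-effects sketch; column names and the team-level random
# intercept are assumptions, not the paper's exact model specification.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical tidy export of the released per-student dataset.
df = pd.read_csv("student_outcomes.csv")

# Fixed effects: cohort (2024 vs. 2025), prior GPA, programming experience.
# Random intercept per project team absorbs team-level variation.
model = smf.mixedlm(
    "grade ~ C(cohort) + prior_gpa + experience",
    data=df,
    groups=df["team"],
)
result = model.fit()

# The C(cohort)[T.2025] coefficient is the adjusted 2025-vs-2024 effect.
print(result.summary())
```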

Results & Findings

  • Average performance boost: The 2025 cohort’s mean project grade rose by 12 % relative to 2024, with a statistically significant reduction in failure rates among students who scored low on pre‑course programming assessments.
  • Widened variance: The standard deviation of grades increased by 18 %, indicating that top‑performing students benefited disproportionately—some achieving near‑perfect scores.
  • Code quality: Automated metrics showed a 15 % improvement in test coverage and a 10 % reduction in cyclomatic complexity for the 2025 cohort, suggesting more disciplined coding practices.
  • Student perception:
    • Equalizer sentiment was strongest among novices who reported “LLMs helped me get past syntax roadblocks.”
    • Amplifier sentiment emerged from high‑achievers who used LLMs for advanced design suggestions: “I could iterate on architecture faster than peers.”
  • LLM usage patterns: High‑performers made more API calls and crafted more detailed prompts, while weaker students relied on short, “debug‑my‑code” queries.

Practical Implications

  • Tooling for developers: The study provides evidence that integrating powerful LLM assistants into real‑world SE workflows can raise baseline productivity, especially for routine coding and debugging tasks.
  • Team dynamics: In mixed‑skill teams, LLMs may reduce bottlenecks caused by junior members, but managers should monitor that senior members don’t monopolize the “LLM advantage,” which could deepen skill gaps.
  • Curriculum design: Educators (and corporate training programs) can deliberately embed LLM‑augmented assignments to democratize access to advanced SE practices, while also designing counter‑measures (e.g., reflection logs, prompt‑engineering workshops) to ensure learning isn’t outsourced entirely.
  • Product development: Vendors of LLM‑powered IDE plugins can target “equalizer” features—guided scaffolding, error explanation, and test generation—to support less‑experienced developers, while offering “amplifier” capabilities (architecture suggestion, design pattern synthesis) for power users.
  • Policy & licensing: Institutions need to consider cost‑benefit trade‑offs of providing paid LLM access; the paper shows tangible educational gains that may justify institutional subscriptions.

Limitations & Future Work

  • Single‑institution scope: Results stem from one university’s SE course; external validity across different curricula, cultures, or industry settings remains untested.
  • Short‑term focus: The study measures immediate project outcomes; long‑term retention of SE concepts and ability to code without LLM assistance were not evaluated.
  • Prompt quality confound: Differences in how students formulate prompts may drive part of the amplification effect; future work could control for prompt‑engineering skill.
  • Ethical considerations: The authors note the need for deeper investigation into plagiarism detection and intellectual‑property implications when LLMs generate substantial code.

Overall, the paper offers a nuanced view of LLMs as both a democratizing force and a performance magnifier in software‑engineering education—a duality that mirrors the challenges developers will face as these models become standard collaborators in the industry.

Authors

  • Hana Kataoka
  • Jialong Li
  • Yutaka Matsuno

Paper Information

  • arXiv ID: 2511.23157v1
  • Categories: cs.SE, cs.HC
  • Published: November 28, 2025