[Paper] How Software Engineering Students Use LLMs to Write Research Papers: An Experience Report
Source: arXiv - 2606.05114v1
Overview
A recent experience report investigates how third‑year software‑engineering students employ large language models (LLMs) when writing short research papers for an empirical methods assignment. By requiring students to disclose their LLM usage, the authors were able to capture real‑world practices, challenges, and learning outcomes, offering a rare glimpse into AI‑augmented academic work in a software‑engineering curriculum.
Key Contributions
- Empirical data on student LLM usage – 146 disclosure statements were collected and systematically analyzed.
- A mixed categorization pipeline – LLM‑assisted initial tagging combined with manual verification to produce a reliable taxonomy of usage patterns.
- Identification of common LLM‑supported activities – brainstorming, clarifying methodology, structuring findings, and polishing prose.
- Insights into student concerns – misinformation, hallucinations, and the need for verification were repeatedly highlighted.
- Pedagogical recommendations – concrete guidelines for integrating reflective LLM use into empirical software‑engineering courses.
Methodology
- Course context – The study took place in a third‑year software architecture class where each student had to produce a short research paper using either a rapid review or a gray‑literature review approach.
- Reflective disclosure – Students submitted a brief statement describing how they used an LLM (e.g., ChatGPT, Claude) throughout the assignment.
- Data collection – 146 statements were gathered over a semester.
- Cross‑analysis pipeline
  - An LLM first auto‑tagged each statement with preliminary categories (e.g., “idea generation”, “citation checking”).
  - Researchers manually inspected and refined the tags, merging overlapping categories and eliminating noise.
- Thematic synthesis – The refined tags were grouped into higher‑level themes that describe the role of LLMs in the research‑writing workflow.
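The tagging and synthesis steps above can be sketched in a few lines of Python. This is only an illustration, not the authors' tooling: the statement IDs, the preliminary tags, the `MERGE` and `NOISE` reviewer decisions, and the tag-to-theme mapping are all hypothetical placeholders, and the LLM auto-tagging step is replaced by a hard-coded dictionary.

```python
from collections import Counter

# Hypothetical preliminary tags, standing in for the LLM auto-tagging step.
preliminary_tags = {
    "s01": ["idea generation", "brainstorming", "misc"],
    "s02": ["citation checking", "grammar fixes"],
    "s03": ["brainstorming", "grammar fixes"],
}

# Hypothetical reviewer decisions: merge overlapping tags, drop noisy ones.
MERGE = {"brainstorming": "idea generation"}
NOISE = {"misc"}

# Hypothetical mapping from refined tags to higher-level themes.
THEMES = {
    "idea generation": "Idea & Topic Generation",
    "citation checking": "Verification & Validation",
    "grammar fixes": "Writing & Polishing",
}


def refine(tags):
    """Apply the manual merge and noise decisions, keeping order and dropping duplicates."""
    refined = []
    for tag in tags:
        tag = MERGE.get(tag, tag)
        if tag in NOISE or tag in refined:
            continue
        refined.append(tag)
    return refined


def theme_counts(tagged):
    """Count how many statements touch each theme (once per statement per theme)."""
    counts = Counter()
    for tags in tagged.values():
        themes = {THEMES[t] for t in refine(tags) if t in THEMES}
        counts.update(themes)
    return counts


refined = {sid: refine(tags) for sid, tags in preliminary_tags.items()}
counts = theme_counts(preliminary_tags)
for theme, n in sorted(counts.items()):
    print(f"{theme}: {n}")
```

Keeping the reviewer decisions in plain dictionaries makes the manual step auditable: every merge or removal is recorded as data rather than buried in an ad hoc edit.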
Results & Findings
| Theme | Typical Student Activity | Notable Observations |
|---|---|---|
| Idea & Topic Generation | Prompting the LLM for possible research questions or keywords. | Students valued the speed of brainstorming but noted that suggestions often needed domain‑specific filtering. |
| Methodology Clarification | Asking the LLM to explain rapid‑review steps, inclusion criteria, or gray‑literature search strategies. | LLM explanations helped novices grasp concepts, yet some students discovered inaccuracies that required manual correction. |
| Organization & Synthesis | Using the LLM to outline sections, draft tables, or summarize extracted findings. | The AI accelerated structuring, but students reported occasional “hallucinated” data that had to be cross‑checked. |
| Writing & Polishing | Grammar fixes, re‑phrasing sentences, and improving readability. | Most students found this the most reliable use case and reported improvements in perceived writing quality. |
| Verification & Validation | Manually checking facts, citations, and generated code snippets. | A recurring concern: the need for a verification loop to avoid propagating false information. |
Overall, students perceived LLMs as productivity boosters for low‑level writing tasks, while still relying heavily on their own expertise for critical analysis and validation.
Practical Implications
- Tool‑Enhanced Curriculum Design – Instructors can embed structured LLM reflection activities (e.g., mandatory disclosure statements) to teach responsible AI use while harvesting valuable data for continuous improvement.
- Rapid Prototyping of Literature Reviews – Teams can leverage LLMs for initial scoping and outline generation, cutting down the time spent on repetitive drafting.
- Quality Assurance Workflows – The study underscores the necessity of a verification step; developers building LLM‑assisted authoring tools should integrate citation checking, source linking, and hallucination detection mechanisms.
- Skill Development for Future Engineers – Familiarity with prompt engineering and critical evaluation of AI‑generated content becomes a marketable competency in data‑driven software research and documentation.
- Policy & Ethics Guidance – The disclosed concerns provide a baseline for institutional policies on AI‑assisted academic work, balancing innovation with academic integrity.
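As one small example of the verification step mentioned under Quality Assurance Workflows, the sketch below flags numeric citation markers that have no matching entry in a reference list. It rests on assumptions: the `[n]` marker style, the `unresolved_citations` helper, and the placeholder draft and references are all hypothetical. It only checks that markers resolve, not that a cited source actually supports the claim.

```python
import re


def unresolved_citations(text, references):
    """Return numeric citation markers like [3] that have no entry in `references`."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", text)}
    return sorted(cited - set(references))


# Placeholder draft and reference list, for illustration only.
draft = "Claim A [1]. Claim B [4]. Claim C [1]."
references = {1: "placeholder entry one", 2: "placeholder entry two"}

missing = unresolved_citations(draft, references)
print(missing)
```

A check like this catches only one failure mode, a citation that points nowhere. Confirming that a real source says what the text claims still needs a human reader.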
Limitations & Future Work
- Single‑course, single‑institution scope – Findings may not generalize across different curricula, cultural contexts, or experience levels.
- Self‑reported data – Students’ disclosures could be incomplete or biased toward socially desirable usage.
- LLM version dependency – The study was conducted with specific LLMs available at the time; rapid model evolution may change usage patterns.
Future research directions suggested by the authors include longitudinal studies across multiple courses, automated detection of AI‑generated text in student submissions, and the design of scaffolded prompts that explicitly guide students toward verifiable, high‑quality outputs.
Authors
- Ronnie de Souza Santos
- Maria Teresa Baldassarre
- Cleyton Magalhaes
- Italo Santos
Paper Information
- arXiv ID: 2606.05114v1
- Categories: cs.SE
- Published: June 3, 2026