[Paper] Personalized Worked Example Generation from Student Code Submissions using Pattern-based Knowledge Components
Source: arXiv - 2604.24758v1
Overview
A new study shows how to automatically generate personalized worked examples for programming students by mining the patterns hidden in their own code submissions. By extracting “knowledge components” (KCs) directly from student programs and feeding them into a generative AI model, the system produces explanations that target the exact misconceptions a learner is grappling with—without requiring a massive hand‑crafted library of examples.
Key Contributions
- Pattern‑based KC extraction: Introduces an AST‑driven pipeline that discovers recurring structural concepts (e.g., loop patterns, recursion templates) from a batch of student submissions.
- KC‑conditioned generation: Couples the extracted KCs with a large language model (LLM) to steer the generation of worked examples toward the learner’s specific logical errors.
- Empirical validation: Conducts a blind expert evaluation comparing vanilla LLM outputs with KC‑conditioned outputs, demonstrating measurable gains in topical focus and relevance.
- Scalable personalization framework: Provides a reusable architecture that can be plugged into existing programming tutoring platforms, reducing the manual effort needed to maintain example libraries.
Methodology
- Collect student code for a given programming exercise (e.g., implementing a binary search).
- Parse each submission into an Abstract Syntax Tree (AST). The AST makes structural elements such as loops, conditionals, and function calls explicit in a language‑agnostic form.
- Cluster recurring sub‑trees across all submissions. Each cluster represents a knowledge component (KC), such as “off‑by‑one in loop bounds” or “missing base case in recursion” (a minimal sketch of this step appears after this list).
- Annotate the problem statement with the KCs that appear most frequently in a particular student’s code.
- Prompt a generative model (e.g., GPT‑4) with a template that includes:
  - The original problem description
  - The student’s code snippet
  - The list of relevant KCs
  - A request to produce a worked example that explicitly addresses those KCs (a sketch of such a prompt appears after this list).
- Expert evaluation: Two experienced CS educators rate the generated examples on relevance, correctness, and pedagogical clarity, blind to whether the example came from the baseline or KC‑conditioned pipeline.
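As a simplified illustration of the parsing and clustering steps above, the sketch below uses Python’s standard `ast` module to turn each submission into coarse structural signatures and keeps the ones that recur across many submissions. The signature scheme, the `min_support` threshold, and all helper names are assumptions made for illustration; the paper’s actual subtree‑clustering procedure and KC labels may capture finer distinctions.

```python
# Sketch of pattern-based KC extraction from student submissions.
# The depth-2 "signature" and the support threshold are illustrative
# assumptions, not the authors' exact pipeline.
import ast
from collections import Counter, defaultdict


def subtree_signatures(source: str, depth: int = 2):
    """Yield a coarse structural signature for every node in the AST.

    A signature is the node type plus the types of its descendants up to
    `depth` levels, enough to capture recurring shapes such as a for-loop
    wrapping an if-statement.
    """
    tree = ast.parse(source)

    def label(node: ast.AST, d: int) -> str:
        if d == 0:
            return type(node).__name__
        children = ",".join(label(c, d - 1) for c in ast.iter_child_nodes(node))
        return f"{type(node).__name__}({children})"

    for node in ast.walk(tree):
        yield label(node, depth)


def cluster_kcs(submissions: dict, min_support: int = 3) -> dict:
    """Group recurring signatures across submissions into candidate KCs.

    Signatures observed in at least `min_support` different submissions are
    kept; each surviving signature plays the role of one pattern-based KC,
    mapped to the set of students whose code contains it.
    """
    support = Counter()
    owners = defaultdict(set)
    for student, code in submissions.items():
        for sig in set(subtree_signatures(code)):
            support[sig] += 1
            owners[sig].add(student)
    return {sig: owners[sig] for sig, count in support.items() if count >= min_support}


# Usage (illustrative):
# kcs = cluster_kcs(all_submissions)                 # all_submissions: {student_id: code}
# alice_kcs = [sig for sig, who in kcs.items() if "alice" in who]
```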
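The KC‑conditioned prompt in the final generation step might be assembled along these lines. This is a hedged sketch: the template wording, `build_prompt()`, and the `call_llm()` placeholder are assumptions, not the authors’ actual prompt or model interface.

```python
# Sketch of KC-conditioned prompt construction (illustrative template).
PROMPT_TEMPLATE = """You are a programming tutor.

Problem statement:
{problem}

Student submission:
{student_code}

Knowledge components observed in this submission:
{kc_list}

Write a worked example that solves the problem step by step and explicitly
addresses each knowledge component listed above."""


def build_prompt(problem: str, student_code: str, kcs: list) -> str:
    """Fill the template with the problem, the student's code, and their KCs."""
    kc_list = "\n".join(f"- {kc}" for kc in kcs)
    return PROMPT_TEMPLATE.format(problem=problem,
                                  student_code=student_code,
                                  kc_list=kc_list)


# Usage (illustrative):
# prompt = build_prompt(problem_text, submission, ["off-by-one in loop bounds"])
# worked_example = call_llm(prompt)   # call_llm stands in for any LLM API (e.g., GPT-4)
```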
Results & Findings
| Metric (1‑5 scale) | Baseline LLM | KC‑Conditioned LLM |
|---|---|---|
| Topical relevance | 3.2 | 4.1 |
| Alignment with error | 2.9 | 4.0 |
| Overall pedagogical quality | 3.5 | 4.2 |
What it means:
- Higher relevance: The KC‑conditioned examples directly tackled the specific mistake (e.g., “your loop stops one iteration early”), whereas baseline examples often drifted to generic solutions.
- Better error alignment: Reviewers noted that the KC‑steered outputs explicitly named the problematic pattern, making it easier for students to map the explanation to their own code.
- Consistent quality: No drop in correctness or readability was observed, indicating that the added conditioning does not compromise the model’s language abilities.
Practical Implications
- Reduced authoring workload: Instructors no longer need to write dozens of bespoke examples for each common mistake; the system auto‑generates them on demand.
- Real‑time feedback: Integrated into IDE plugins or online judges, the pipeline can produce a tailored worked example instantly after a student’s failed submission (a minimal hook is sketched after this list).
- Scalable tutoring platforms: MOOCs, bootcamps, and corporate training portals can personalize practice at scale, improving learner retention without hiring additional teaching assistants.
- Data‑driven curriculum design: By analyzing which KCs surface most often, educators can identify curriculum gaps and prioritize new instructional material.
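For the real‑time feedback scenario above, a hypothetical online‑judge hook could look like the following, reusing `subtree_signatures()`, `build_prompt()`, and the `call_llm()` placeholder from the earlier sketches. The handler name and its arguments are assumptions for illustration, not part of the paper’s implementation.

```python
# Hypothetical hook for an online judge or IDE plugin.
def on_failed_submission(problem: str, code: str, kc_catalog: set) -> str:
    """Return a personalized worked example for a failed submission.

    kc_catalog holds the pattern signatures mined from earlier cohorts;
    only those actually present in this submission are passed to the LLM.
    """
    observed = set(subtree_signatures(code))
    student_kcs = sorted(observed & kc_catalog)
    prompt = build_prompt(problem, code, student_kcs)
    return call_llm(prompt)  # placeholder for any LLM backend
```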
Limitations & Future Work
- Domain specificity: The current implementation focuses on relatively small, well‑structured assignments (e.g., loops, recursion). Extending to larger projects or multi‑file codebases may require more sophisticated KC hierarchies.
- Model dependence: The quality of generated examples hinges on the underlying LLM; biases or hallucinations in the model could propagate into the tutoring content.
- Evaluation scope: Expert ratings were limited to a handful of problems and reviewers. Larger‑scale user studies (e.g., A/B testing with actual learners) are needed to confirm learning gains.
- Future directions: The authors plan to (1) incorporate dynamic execution traces to enrich KC extraction, (2) explore multimodal explanations (e.g., visualizations, step‑by‑step debuggers), and (3) automate the continual update of KC libraries as new cohorts of students submit code.
Authors
- Griffin Pitts
- Muntasir Hoq
- Peter Brusilovsky
- Narges Norouzi
- Arto Hellas
- Juho Leinonen
- Bita Akram
Paper Information
- arXiv ID: 2604.24758v1
- Categories: cs.HC, cs.AI, cs.CY, cs.ET, cs.LG
- Published: April 27, 2026