A new research paper explores the use of large language models (LLMs) to simulate student programming errors in Java. The study evaluated five LLMs using different prompting strategies on the CodeWorkout dataset, which contains over 74,000 student submissions. Results indicate that while LLMs can generate diverse errors, Claude Sonnet 4 showed the most balanced performance in aligning with authentic student mistakes. Expert annotations confirmed that the synthetic errors were functionally indistinguishable from real student errors.
AI IMPACT LLMs can be used to generate realistic programming errors, aiding the development of educational tools such as intelligent tutoring systems.
AI-generated summary · Google Gemini · from 2 sources.