AI Graders Show Promise in K-12 Assessments, Especially for Math and Science

By PulseAugur Editorial · [1 source] · 2026-06-12 04:00

A new paper explores the use of generative AI models for grading K-12 assessments, focusing on context engineering and prompt design. Researchers evaluated models including Claude Sonnet 4, Haiku 4.5, GPT-5, and GPT-5 Mini on MCAS (Massachusetts Comprehensive Assessment System) data across math, science, and English language arts (ELA). The study found that LLM graders, particularly those with more parameters, showed substantial agreement with human raters in math and science, while performance in ELA varied. AI-generated narrative feedback was well received, but numerical scores met with skepticism, suggesting LLMs are more effective as formative tools.
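The summary does not say which statistic underlies "agreement with human raters." One common choice for ordinal rubric scores is quadratic weighted kappa, which penalizes large score gaps more heavily than near misses. The following is a minimal sketch under that assumption, not the paper's method; the function name, the score range, and the sample scores are hypothetical and for illustration only.

```python
from collections import Counter


def quadratic_weighted_kappa(human, model, min_score, max_score):
    """Agreement between two raters on an ordinal scale (1.0 = perfect).

    Assumption: integer rubric scores in [min_score, max_score].
    This is an illustrative metric, not necessarily the one used in the paper.
    """
    if len(human) != len(model) or not human:
        raise ValueError("score lists must be non-empty and equally long")
    if max_score <= min_score:
        raise ValueError("need at least two score levels")

    n = len(human)
    span = (max_score - min_score) ** 2      # largest possible squared gap
    observed = Counter(zip(human, model))    # joint counts of (human, model)
    human_counts = Counter(human)            # marginal counts per rater
    model_counts = Counter(model)

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(min_score, max_score + 1):
        for j in range(min_score, max_score + 1):
            weight = (i - j) ** 2 / span
            weighted_observed += weight * observed[(i, j)] / n
            # Expected disagreement if the two raters scored independently.
            weighted_expected += weight * human_counts[i] * model_counts[j] / n**2

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when both raters give one constant score")
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical scores on a 0-4 rubric (made-up data, not from the study).
human_scores = [0, 1, 2, 3, 4, 2, 3]
model_scores = [0, 1, 2, 3, 3, 2, 4]
kappa = quadratic_weighted_kappa(human_scores, model_scores, 0, 4)
```

A value of 1.0 means the two raters always match, 0 means agreement no better than chance, and negative values mean systematic disagreement.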

IMPACT Suggests LLMs can effectively assist educators with grading, potentially reducing workload and enhancing feedback quality, particularly in STEM subjects.

RANK_REASON The cluster contains an academic paper detailing research on AI models for educational assessment.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.AI TIER_1 English(EN) · Zewei Tian, Alex Liu, Lief Esbenshade, Michael Xiao, Zachary Zhang, Yulia Lápicus, Thomas Han, Kevin He, Min Sun · 2026-06-12 04:00

Creating and Evaluating K-12 GenAI Assessment Graders Through Context Engineering

arXiv:2606.12422v1 · Abstract: The integration of large language models (LLMs) into educational assessment represents a transformative shift in classroom grading practices. While automated scoring systems and machine learning techniques have existed for decades…

