Creating and Evaluating K-12 GenAI Assessment Graders Through Context Engineering

AI graders show promise in K-12 assessment, especially in math and science

By the PulseAugur editorial team · [1 source] · 2026-06-12 04:00

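A new paper examines the use of generative AI models to grade K-12 assessments, with a focus on context engineering and prompt design. Using MCAS data, the researchers evaluated models including Claude Sonnet 4, Haiku 4.5, GPT-5, and GPT-5 Mini across math, science, and ELA (English Language Arts). The study found that LLM graders, particularly models with more parameters, reached substantial agreement with human raters in math and science, while performance in ELA varied. AI-generated narrative feedback was well received, but the generated numeric scores drew skepticism, which suggests LLMs are more effective as formative tools.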

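Impact: The findings suggest that LLMs can effectively assist educators with grading, potentially reducing workload and improving feedback quality, especially in STEM subjects.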

Ranking rationale: This cluster contains an academic paper detailing research on AI models in educational assessment.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · Zewei Tian, Alex Liu, Lief Esbenshade, Michael Xiao, Zachary Zhang, Yulia Lápicus, Thomas Han, Kevin He, Min Sun · 2026-06-12 04:00

Creating and Evaluating K-12 GenAI Assessment Graders Through Context Engineering

arXiv:2606.12422v1. Abstract: The integration of large language models (LLMs) into educational assessment represents a transformative shift in classroom grading practices. While automated scoring systems and machine learning techniques have existed for decades…
