Does Continued Pretraining on a Learner Corpus Improve Automated Essay Scoring on English Proficiency Tests? Evidence from EFCAMDAT

Large Language Models Show Mixed Results in Automated Essay Scoring

By PulseAugur Editorial Team · [3 sources] · 2026-05-25 15:04

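Two new research papers examine how effective large language models (LLMs) are at automated essay scoring (AES). The first synthesizes 65 studies and finds that agreement between LLMs and human raters on essay scores is highly context-dependent and varies considerably. The second studies domain-adaptive pretraining (DAPT) on a learner corpus for AES and shows that targeted DAPT can improve in-domain scoring but does not consistently improve transfer across datasets.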

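Impact: These studies highlight the nuanced performance of LLMs in educational assessment and point to areas that need further research and development before LLMs can be applied reliably.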

Ranking rationale: This cluster contains two academic papers published on arXiv that discuss research findings on LLMs and automated essay scoring.


AI-generated summary · Google Gemini · from 3 sources.

Sources [3]

arXiv cs.CL TIER_1 English(EN) · Hongli Li, Che Han Chen, Kevin Fan, Chiho Young-Johnson, Soyoung Lim, Yali Feng · 2026-05-27 04:00

Agreement Between Large Language Models and Human Raters in Essay Scoring: A Research Synthesis

arXiv:2512.14561v2 Announce Type: replace Abstract: Despite the growing promise of large language models (LLMs) in automated essay scoring (AES), empirical findings regarding their reliability compared to human raters remain mixed. Following the PRISMA 2020 guidelines, we synthes…
arXiv cs.CL TIER_1 English(EN) · Duy Anh Nguyen · 2026-05-26 04:00

Does Continued Pretraining on a Learner Corpus Improve Automated Essay Scoring on English Proficiency Tests? Evidence from EFCAMDAT

arXiv:2605.25924v1 Announce Type: new Abstract: Recent automated essay scoring (AES) studies increasingly use pretrained transformer models, but these models are usually pretrained on general-domain English and may under-represent second-language learner writing. This study inves…
arXiv cs.LG TIER_1 English(EN) · Duy Anh Nguyen · 2026-05-25 15:04

Does Continued Pretraining on a Learner Corpus Improve Automated Essay Scoring on English Proficiency Tests? Evidence from EFCAMDAT

Recent automated essay scoring (AES) studies increasingly use pretrained transformer models, but these models are usually pretrained on general-domain English and may under-represent second-language learner writing. This study investigates whether domain-adaptive continued pretra…
