LLMs Show Mixed Results in Automated Essay Scoring

By PulseAugur Editorial · [3 sources] · 2026-05-25 15:04

Two new research papers examine how well large language models (LLMs) perform at automated essay scoring (AES). The first synthesizes 65 studies and finds that agreement between LLM and human scores is highly context-dependent. The second investigates domain-adaptive pretraining (DAPT) on learner corpora for AES and suggests that targeted DAPT can improve in-domain scoring but does not consistently improve cross-dataset transferability.

IMPACT Taken together, the studies suggest that LLM-based essay scoring is not yet uniformly reliable for educational assessment: agreement with human raters depends on context, and in-domain gains from DAPT do not consistently carry over to other datasets.

RANK_REASON The cluster contains two academic papers published on arXiv discussing research findings related to LLMs and automated essay scoring.

AI-generated summary · Google Gemini · from 3 sources.

COVERAGE [3]

arXiv cs.CL TIER_1 English(EN) · Hongli Li, Che Han Chen, Kevin Fan, Chiho Young-Johnson, Soyoung Lim, Yali Feng · 2026-05-27 04:00

Agreement Between Large Language Models and Human Raters in Essay Scoring: A Research Synthesis

arXiv:2512.14561v2 · Announce Type: replace · Abstract: Despite the growing promise of large language models (LLMs) in automated essay scoring (AES), empirical findings regarding their reliability compared to human raters remain mixed. Following the PRISMA 2020 guidelines, we synthes…

arXiv cs.CL TIER_1 English(EN) · Duy Anh Nguyen · 2026-05-26 04:00

Does Continued Pretraining on a Learner Corpus Improve Automated Essay Scoring on English Proficiency Tests? Evidence from EFCAMDAT

arXiv:2605.25924v1 · Announce Type: new · Abstract: Recent automated essay scoring (AES) studies increasingly use pretrained transformer models, but these models are usually pretrained on general-domain English and may under-represent second-language learner writing. This study inves…

arXiv cs.LG TIER_1 English(EN) · Duy Anh Nguyen · 2026-05-25 15:04

Does Continued Pretraining on a Learner Corpus Improve Automated Essay Scoring on English Proficiency Tests? Evidence from EFCAMDAT

Recent automated essay scoring (AES) studies increasingly use pretrained transformer models, but these models are usually pretrained on general-domain English and may under-represent second-language learner writing. This study investigates whether domain-adaptive continued pretra…
