Automated Essay Scoring and Language Certification: Assessing Generalizability, Agreement and Validity for French
Researchers have developed an enhanced framework for evaluating Automated Essay Scoring (AES) systems that moves beyond minimalist evaluation practices. The framework incorporates fairness analysis, correlations with linguistic features, error prediction, and agreement with human raters. Applied to French AES, it was used to compare eight model architectures on a large corpus of essays, demonstrating its value for understanding the capabilities and limitations of AES models.
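The summary does not say which statistic the framework uses to measure agreement with human raters. Quadratic weighted kappa (QWK) is a common choice in AES work, so what follows is a minimal Python sketch that assumes QWK on an integer score scale. The function name and the example score range are illustrative assumptions, not details taken from the paper.

```python
def quadratic_weighted_kappa(human, model, min_score, max_score):
    """Quadratic weighted kappa between human and model integer scores.

    Hypothetical helper for illustration; assumes both raters use the
    same integer scale [min_score, max_score].
    """
    if len(human) != len(model) or not human:
        raise ValueError("score lists must be non-empty and of equal length")
    if max_score <= min_score:
        raise ValueError("max_score must be greater than min_score")
    for s in list(human) + list(model):
        if not min_score <= s <= max_score:
            raise ValueError(f"score {s} outside [{min_score}, {max_score}]")

    n = max_score - min_score + 1
    # Observed matrix: observed[i][j] counts essays scored i by the human and j by the model.
    observed = [[0] * n for _ in range(n)]
    for h, m in zip(human, model):
        observed[h - min_score][m - min_score] += 1

    total = len(human)
    # Marginal score distributions for each rater.
    hist_human = [sum(row) for row in observed]
    hist_model = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n):
        for j in range(n):
            # Quadratic penalty: large disagreements cost more than near misses.
            weight = (i - j) ** 2 / (n - 1) ** 2
            # Expected count if the two raters were independent.
            expected = hist_human[i] * hist_model[j] / total
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when expected disagreement is zero")
    return 1.0 - weighted_observed / weighted_expected


# Usage: scores on a hypothetical 1-4 scale.
print(quadratic_weighted_kappa([1, 2, 3, 4], [1, 2, 3, 4], 1, 4))  # perfect agreement
print(quadratic_weighted_kappa([1, 2, 3, 4, 2], [1, 3, 3, 4, 2], 1, 4))
```

A value of 1 indicates perfect agreement, 0 indicates chance-level agreement, and negative values indicate systematic disagreement. The quadratic weights make the metric sensitive to how far apart the scores are, not just whether they match.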
IMPACT: Provides a more robust methodology for assessing the reliability and fairness of AI-driven essay evaluation tools.