FeedEval: Pedagogically Aligned Evaluation of LLM-Generated Essay Feedback
Researchers have developed FeedEval, a framework for evaluating the quality of essay feedback generated by large language models (LLMs). It scores feedback along pedagogically grounded dimensions such as specificity, helpfulness, and validity, using specialized LLM evaluators. In experiments on the ASAP++ benchmark, FeedEval's assessments closely matched human expert judgments, and feedback filtered by FeedEval improved the performance of essay scoring models and led to more effective essay revisions.
IMPACT: Enhances the reliability and effectiveness of LLM-generated feedback in educational contexts, potentially improving automated essay scoring and student revision processes.
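To make the filtering idea concrete, here is a minimal sketch of how feedback might be scored per dimension and then filtered. It is not FeedEval's actual implementation: the summary does not describe the evaluators' interface, so the function names, the 0-to-1 score scale, the threshold, and the keyword-based toy evaluators (stand-ins for the specialized LLM evaluators) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Dimensions named in the summary; everything else below is hypothetical.
DIMENSIONS = ("specificity", "helpfulness", "validity")

# An evaluator maps (essay, feedback) to a score in [0, 1] (assumed scale).
Evaluator = Callable[[str, str], float]


@dataclass
class ScoredFeedback:
    text: str
    scores: Dict[str, float]

    @property
    def mean_score(self) -> float:
        return sum(self.scores.values()) / len(self.scores)


def score_feedback(essay: str, feedback: str,
                   evaluators: Dict[str, Evaluator]) -> ScoredFeedback:
    """Run one evaluator per dimension and collect the scores."""
    scores = {dim: evaluators[dim](essay, feedback) for dim in DIMENSIONS}
    return ScoredFeedback(feedback, scores)


def filter_feedback(essay: str, candidates: List[str],
                    evaluators: Dict[str, Evaluator],
                    threshold: float = 0.5) -> List[ScoredFeedback]:
    """Keep only feedback that meets the threshold on every dimension."""
    scored = [score_feedback(essay, fb, evaluators) for fb in candidates]
    return [s for s in scored
            if all(v >= threshold for v in s.scores.values())]


# Toy stand-ins for LLM evaluators (assumption: real ones would call a model).
def toy_specificity(essay: str, feedback: str) -> float:
    # 1.0 if the feedback mentions any longer word taken from the essay.
    words = {w.strip(".,;:!?").lower() for w in essay.split()}
    return 1.0 if any(len(w) >= 5 and w in feedback.lower() for w in words) else 0.0


def toy_helpfulness(essay: str, feedback: str) -> float:
    # 1.0 if the feedback contains an actionable cue word.
    return 1.0 if any(cue in feedback.lower() for cue in ("consider", "try")) else 0.0


def toy_validity(essay: str, feedback: str) -> float:
    # 1.0 if the feedback is more than a two-word remark.
    return 1.0 if len(feedback.split()) >= 3 else 0.0


TOY_EVALUATORS: Dict[str, Evaluator] = {
    "specificity": toy_specificity,
    "helpfulness": toy_helpfulness,
    "validity": toy_validity,
}

if __name__ == "__main__":
    essay = "Computers help students learn because research is faster."
    candidates = [
        "Good job.",
        "Consider adding evidence for why research is faster with computers.",
    ]
    for item in filter_feedback(essay, candidates, TOY_EVALUATORS):
        print(item.text, item.scores)
```

Requiring every dimension to clear the threshold, rather than averaging, is one possible design choice: it prevents specific but invalid feedback from passing on the strength of a single high score. Whether FeedEval combines its dimensions this way is not stated in the summary.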