PulseAugur

New dataset compares LLM and expert writing feedback

A new dataset called FOXGLOVE has been released, containing feedback on argumentative essays from both human experts and large language models (LLMs). It includes over 2,300 feedback comments; the LLM comments are longer and more complex than those written by human instructors. Human and LLM feedback align on general goals and essay positions but differ in the specific sentences they target for improvement. Human instructors rated LLM feedback higher on quality, though this was largely attributed to the LLMs' tendency to provide lengthier comments.

IMPACT Provides a benchmark for evaluating LLM writing assistance capabilities against human experts.

RANK_REASON The cluster contains an academic paper detailing a new dataset and research findings.


AI-generated summary · Google Gemini · from 2 sources.


COVERAGE [2]

  1. arXiv cs.CL TIER_1 English(EN) · Yijun Liu, Yifan Song, John Gallagher, Sarah Sterman, Tal August

    FOXGLOVE: Understanding Goal-Oriented and Anchored Writing Feedback from Experts and LLMs on Argumentative Essays

    arXiv:2606.06271v1. Abstract: While large language models (LLMs) are increasingly used to generate writing feedback, there remains no systematic comparison of LLM and expert feedback on the dimensions that writing research identifies as central to revision: goal…

  2. arXiv cs.CL TIER_1 English(EN) · Tal August

    FOXGLOVE: Understanding Goal-Oriented and Anchored Writing Feedback from Experts and LLMs on Argumentative Essays

    While large language models (LLMs) are increasingly used to generate writing feedback, there remains no systematic comparison of LLM and expert feedback on the dimensions that writing research identifies as central to revision: goal-orientation, anchoring to specific sentences, a…