Evaluative Judgement in Teaching AI-based Translation: A Classroom Case Study of AI-Mediated Translation and Post-Editing
A classroom study examined how students in a Machine Translation and Post-editing course evaluated general-purpose LLMs and online MT systems. Students translated English Wikipedia texts into Catalan or Spanish, assessed system outputs using automatic metrics and human judgment, and then selected one for post-editing, justifying their choice. The findings indicated that students did not rely solely on automatic metrics: they often departed from the metric rankings, basing their choices on factors such as adequacy, fluency, terminology, naturalness, and anticipated post-editing effort.
AI IMPACT: This research highlights how human evaluators, even in an academic setting, consider factors beyond automated metrics when assessing AI translation quality.