Study: Students prioritize fluency and effort over metrics in AI translation evaluation

By PulseAugur Editorial · [1 source] · 2026-06-16 04:00

A classroom study examined how students in a Machine Translation and Post-editing course evaluated general-purpose LLMs and online MT systems. Students translated English Wikipedia texts into Catalan or Spanish, assessed the system outputs using automatic metrics and human judgment, then selected one output for post-editing and justified their choice. The findings indicate that students did not rely solely on automatic metrics: they often chose outputs that diverged from the metric rankings, citing adequacy, fluency, terminology, naturalness, and anticipated post-editing effort.

IMPACT This research highlights how human evaluators, even in an academic setting, consider factors beyond automated metrics when assessing AI translation quality.

RANK_REASON The cluster contains an academic paper detailing a classroom study on AI-mediated translation evaluation.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CL TIER_1 English(EN) · Gokhan Dogru · 2026-06-16 04:00

Evaluative Judgement in Teaching AI-based Translation: A Classroom Case Study of AI-Mediated Translation and Post-Editing

arXiv:2606.15483v1 Announce Type: new Abstract: Drawing on 23 anonymized student projects from a fourth-year Machine Translation and Post-editing course in a BA-level translation programme, this paper examines how structured comparison of general-purpose LLMs and online MT sy…
