A classroom study examined how students in a Machine Translation and Post-editing course evaluated general-purpose LLMs and online MT systems. Students translated English Wikipedia texts into Catalan or Spanish, assessed the system outputs using both automatic metrics and human judgment, and then selected one output to post-edit, justifying their choice. The findings indicated that students did not rely solely on automatic metrics: they often chose outputs that differed from the metric rankings, weighing factors such as adequacy, fluency, terminology, naturalness, and anticipated post-editing effort.
AI IMPACT This research highlights how human evaluators, even in an academic setting, consider factors beyond automatic metrics when assessing AI translation quality.
RANK_REASON The cluster contains an academic paper detailing a classroom study on AI-mediated translation evaluation.
AI-generated summary · Google Gemini · from 1 source.