Researchers have introduced IFMTBench, a new benchmark for evaluating how well multilingual translation models follow instructions. It addresses the limitations of existing metrics by assessing whether a model adheres to specific constraints beyond semantic equivalence, such as preserving JSON/HTML schemas, using glossaries, and matching prescribed registers. IFMTBench covers seven languages and includes a mix of single-constraint and multi-constraint items. Its results indicate that instruction following scales more sharply with model size than translation quality does.
AI IMPACT This benchmark should help researchers better understand and improve the ability of translation models to follow complex, multilingual instructions.
AI-generated summary · Google Gemini · from 2 sources.