IFMTBench: A Comprehensive Benchmark for Multilingual Translation Instruction Following

New benchmark evaluates instruction following in multilingual translation

By PulseAugur Editorial Team · [2 sources] · 2026-05-27 09:35

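Researchers have introduced IFMTBench, a new benchmark designed to evaluate instruction following in multilingual translation. It addresses the limitations of existing metrics by assessing whether models, beyond preserving semantic equivalence, also comply with specific constraints such as preserving JSON/HTML schemas, using glossaries, and matching prescribed registers. IFMTBench covers seven languages and mixes single-constraint and multi-constraint items. Its results suggest that instruction-following ability improves more markedly with model scale than translation quality alone does.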
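As a rough illustration of the kind of constraint described above, the following Python sketch checks whether a translated JSON payload keeps the source's keys and nesting and whether glossary terms were honored. It is not IFMTBench's scoring method, which the sources here do not describe. The function names (`same_json_structure`, `glossary_respected`, `check_item`) and the simple substring-based glossary rule are assumptions made for this example.

```python
import json


def same_json_structure(source, translated):
    """Return True if both values share keys, nesting, and leaf types."""
    if isinstance(source, dict):
        return (
            isinstance(translated, dict)
            and source.keys() == translated.keys()
            and all(same_json_structure(source[k], translated[k]) for k in source)
        )
    if isinstance(source, list):
        return (
            isinstance(translated, list)
            and len(source) == len(translated)
            and all(same_json_structure(s, t) for s, t in zip(source, translated))
        )
    # Leaves: the text may change, but the type must not (e.g. 3 must not become "3").
    return type(source) is type(translated)


def string_leaves(value):
    """Yield every string value found anywhere in a parsed JSON document."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from string_leaves(item)
    elif isinstance(value, list):
        for item in value:
            yield from string_leaves(item)


def glossary_respected(source, translated, glossary):
    """Assumed rule: each glossary term present in the source must map to its target term."""
    source_text = " ".join(string_leaves(source))
    translated_text = " ".join(string_leaves(translated))
    return all(
        target in translated_text
        for term, target in glossary.items()
        if term in source_text
    )


def check_item(source_json, output_json, glossary):
    """Hypothetical per-item check covering a schema and a glossary constraint."""
    try:
        source = json.loads(source_json)
        output = json.loads(output_json)
    except json.JSONDecodeError:
        # Output that is not valid JSON fails every constraint.
        return {"schema": False, "glossary": False}
    return {
        "schema": same_json_structure(source, output),
        "glossary": glossary_respected(source, output, glossary),
    }
```

A multi-constraint item, in this reading, passes only when every entry in the returned dictionary is true. A real evaluator would likely need more than substring matching for glossary terms, for example to handle inflected forms.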

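Impact: The benchmark will help researchers better understand and improve the ability of translation models to follow complex multilingual instructions.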

Ranking rationale: This cluster contains a research paper introducing a new benchmark for evaluating AI models.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.CL TIER_1 English(EN) · Mingrui Sun, Mao Zheng, Zheng Li, Mingyang Song · 2026-05-28 04:00

IFMTBench: A Comprehensive Benchmark for Multilingual Translation Instruction Following

arXiv:2605.28218v1 Announce Type: new Abstract: Modern translation workflows demand more than semantic equivalence. Users routinely require models to preserve JSON or HTML schemas, honor curated glossaries, disambiguate with provided context, and match prescribed registers, often…
arXiv cs.CL TIER_1 English(EN) · Mingyang Song · 2026-05-27 09:35

IFMTBench: A Comprehensive Benchmark for Multilingual Translation Instruction Following

Modern translation workflows demand more than semantic equivalence. Users routinely require models to preserve JSON or HTML schemas, honor curated glossaries, disambiguate with provided context, and match prescribed registers, often several at once. Conventional metrics such as B…
