G-IdiomAlign: A Gloss-Pivoted Benchmark for Cross-Lingual Idiom Alignment
Researchers have introduced G-IdiomAlign, a new benchmark designed to evaluate how well large language models can align idioms across different languages. The benchmark uses English glosses from Wiktionary as a pivot to anchor idioms, addressing the challenges posed by their non-compositional nature and weak surface-form grounding. Initial tests reveal that LLMs often exhibit a bias toward literal translation, particularly with low-resource languages. Using glosses improves performance in controlled generation tasks, though significant room for improvement remains.
IMPACT: This benchmark could drive improvements in LLMs' ability to handle nuanced linguistic phenomena like idioms, enhancing cross-lingual communication.
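
To make the gloss-pivot idea concrete, here is a minimal sketch of how idioms from different languages could be paired through a shared English gloss. It is not the benchmark's actual construction pipeline: the toy entries, the `normalize` and `align_by_gloss` helpers, and the exact-string matching of glosses are all assumptions made for illustration. Real glosses would likely need fuzzier matching than this.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy entries: (language code, idiom, English gloss).
# These are illustrative only and are not taken from the benchmark.
ENTRIES = [
    ("en", "kick the bucket", "to die"),
    ("fr", "casser sa pipe", "To die"),
    ("de", "den Löffel abgeben", "to die"),
    ("en", "piece of cake", "something very easy"),
    ("fr", "c'est du gâteau", "something very easy"),
]


def normalize(gloss):
    """Lowercase and collapse whitespace so trivially different glosses match."""
    return " ".join(gloss.lower().split())


def align_by_gloss(entries):
    """Pair idioms from different languages that share the same gloss.

    Assumption: two idioms are aligned when their normalized glosses are
    identical strings. This is a simplification chosen for the sketch.
    """
    by_gloss = defaultdict(list)
    for lang, idiom, gloss in entries:
        by_gloss[normalize(gloss)].append((lang, idiom))

    pairs = []
    for gloss, items in by_gloss.items():
        for (lang_a, idiom_a), (lang_b, idiom_b) in combinations(items, 2):
            if lang_a != lang_b:  # only cross-lingual pairs are of interest
                pairs.append((gloss, (lang_a, idiom_a), (lang_b, idiom_b)))
    return pairs


if __name__ == "__main__":
    for gloss, left, right in align_by_gloss(ENTRIES):
        print(f"{gloss}: {left[1]} ({left[0]}) <-> {right[1]} ({right[0]})")
```

Pivoting on the gloss rather than on the surface form is what lets "kick the bucket" pair with "casser sa pipe" even though a word-for-word translation of either one would not produce the other.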