Multilingual Idioms in Sentences and Conversations Across High-, Medium-, and Low-Resource Languages
Researchers have introduced MIDI, a new dataset designed to evaluate how well multilingual NLP models understand idiomatic expressions. The dataset includes idioms in both sentence and conversational contexts across high-, medium-, and low-resource languages. Benchmarking current models revealed significant performance degradation in low-resource languages and a general difficulty with literal interpretations, even when conversational context was provided.
AI IMPACT: Highlights limitations in current AI models' understanding of nuanced language, particularly in low-resource settings.