Researchers have introduced G-IdiomAlign, a new benchmark designed to evaluate how well large language models can align idioms across different languages. The benchmark uses English glosses from Wiktionary as a pivot to anchor idioms, addressing the challenges posed by their non-compositional nature and weak surface-form grounding. Initial tests reveal that LLMs often exhibit a bias towards literal translation, particularly with low-resource languages, and that using glosses improves performance in controlled generation tasks, though significant room for improvement remains.
AI IMPACT This benchmark could drive improvements in LLMs' ability to handle nuanced linguistic phenomena like idioms, enhancing cross-lingual communication.
RANK_REASON The cluster describes a new academic benchmark for evaluating LLM capabilities, presented in a research paper.
- arXiv
- G-IdiomAlign
- Hugging Face
- Qwen3 8B
- Wiktionary
- alphaXiv
- CatalyzeX
- DagsHub
- Gotit.pub
- ScienceCast
AI-generated summary · Google Gemini · from 2 sources.