PulseAugur
When Similar Means Different: Evaluating LLMs on Arabic--Hebrew Cognates

New benchmark tests large language models on Arabic-Hebrew cognate ambiguity

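Researchers have developed SemCog Bench, a benchmark designed to evaluate how well large language models (LLMs) handle Arabic and Hebrew cognates. It comprises 1,858 word pairs together with sentence-level annotations, and tests both cognate identification and semantic disambiguation. The evaluation shows that LLMs perform well on true cognates but struggle markedly with false friends and loanwords, which suggests they rely on surface similarity rather than deeper semantic understanding. Even when contextual cues are provided, performance improves only modestly, highlighting a fundamental limitation of current LLMs in resolving cross-lingual meaning conflicts.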
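Because the reported results differ sharply by word type (strong on true cognates, weak on false friends and loanwords), a benchmark like this is most informative when scored per category rather than as a single average. The snippet below is a minimal sketch of that kind of breakdown. The record format, field names, and category labels (`Item`, `true_cognate`, `false_friend`, `accuracy_by_category`) are hypothetical illustrations, not SemCog Bench's actual schema or evaluation code, and the toy predictions are invented for demonstration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical benchmark record; placeholder pair IDs stand in for real words.
    arabic: str
    hebrew: str
    category: str  # assumed labels, e.g. "true_cognate", "false_friend", "loanword"
    gold: str      # gold answer, e.g. "same" or "different" meaning
    pred: str      # the model's answer


def accuracy_by_category(items):
    """Return {category: accuracy}, so weak categories are not hidden by an overall mean."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for it in items:
        total[it.category] += 1
        correct[it.category] += int(it.pred == it.gold)
    return {cat: correct[cat] / total[cat] for cat in total}


# Toy, made-up predictions: the model treats one false friend as a true cognate.
items = [
    Item("ar_1", "he_1", "true_cognate", "same", "same"),
    Item("ar_2", "he_2", "true_cognate", "same", "same"),
    Item("ar_3", "he_3", "false_friend", "different", "same"),
    Item("ar_4", "he_4", "false_friend", "different", "different"),
]
print(accuracy_by_category(items))  # {'true_cognate': 1.0, 'false_friend': 0.5}
```

Here the overall accuracy would be 0.75, which looks respectable; only the per-category view shows that the errors concentrate on false friends, the pattern the summary describes.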

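Impact: Highlights the limitations of LLMs in cross-lingual understanding and may guide the development of future models toward finer-grained semantic reasoning.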

Ranking rationale: This cluster covers a new academic paper that introduces a benchmark for evaluating LLMs on linguistic tasks.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. arXiv cs.CL TIER_1 · Junhong Liang, Noor Abo Mokh, Bashar Alhafni

    When Similar Means Different: Evaluating LLMs on Arabic--Hebrew Cognates

    arXiv:2606.13218v1 Announce Type: new Abstract: Arabic and Hebrew, as closely related Semitic languages, share a substantial lexicon of true cognates, misleading false friends, and modern loanwords. This overlap poses a challenge for cross-lingual semantic understanding in large …

  2. arXiv cs.CL TIER_1 · Bashar Alhafni

    When Similar Means Different: Evaluating LLMs on Arabic--Hebrew Cognates

    Arabic and Hebrew, as closely related Semitic languages, share a substantial lexicon of true cognates, misleading false friends, and modern loanwords. This overlap poses a challenge for cross-lingual semantic understanding in large language models (LLMs). To evaluate this capabil…