Researchers have developed CHA-Gen, a dataset for evaluating how well large language models understand linguistic ambiguity in Chinese. Grounded in Potential Ambiguity Theory, it contains more than 5,700 sentences and is described as the first scalable dataset for Chinese ambiguity research. Evaluations of models from the Gemma 3 and Qwen 2.5/3 series showed that LLMs struggle to detect ambiguity, although Chain-of-Thought prompting improved their performance. The study also identified common failure modes, such as ambiguity blindness and misattribution, and noted a bias toward dominant interpretations.
Impact: Provides a scalable method for creating Chinese ambiguity datasets, enabling better evaluation and improvement of LLM performance on nuanced language understanding tasks.
Ranking rationale: The cluster contains a new academic paper detailing a novel dataset and evaluation methodology for LLMs.
AI-generated summary · Google Gemini · From 1 source.