Attractor Geometry of Transformer Memory: From Conflict Arbitration to Confident Hallucination

Transformer memory geometry explains confident hallucination in LLMs

By the PulseAugur editorial team · [1 source] · 2026-05-08 04:00

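Researchers have developed a new geometric framework for understanding two failure modes of language models: conflict and hallucination. They propose that learned facts form attractor basins in the model's hidden-state space. Under this view, both conflict (when parametric memory and working memory disagree) and hallucination (when no relevant fact is stored) can produce confident but incorrect outputs. The study reports that geometric margin, a measure of the distance from a hidden state to the nearest attractor basin, separates correct recall from hallucination more effectively than output entropy, and that the problem may worsen as model scale increases.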
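To make the contrast between output entropy and a geometric signal concrete, here is a minimal toy sketch. It is not the paper's method: the summary does not give the exact definition of geometric margin, so the code assumes each attractor basin can be summarized by a single centroid vector and uses the Euclidean distance to the nearest centroid as a stand-in. All names (`output_entropy`, `distance_to_nearest_attractor`) and all numbers are hypothetical.

```python
import math


def output_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def distance_to_nearest_attractor(hidden, centroids):
    """Euclidean distance from a hidden state to the closest centroid.

    ASSUMPTION: each attractor basin is represented by one centroid.
    The paper's actual geometric margin may be defined differently.
    """
    return min(math.dist(hidden, c) for c in centroids)


# Hypothetical 2-D toy data: two stored "facts" as centroids.
centroids = [(0.0, 0.0), (4.0, 0.0)]
recalled_state = (0.1, 0.0)      # sits close to a stored fact
hallucinated_state = (2.0, 3.0)  # sits far from every stored fact

# Both cases are given the same sharply peaked logits, so an
# entropy-based check sees two equally "confident" outputs.
peaked_logits = [8.0, 0.0, 0.0]

entropy = output_entropy(peaked_logits)
d_recalled = distance_to_nearest_attractor(recalled_state, centroids)
d_hallucinated = distance_to_nearest_attractor(hallucinated_state, centroids)

print(f"entropy (both cases): {entropy:.4f}")
print(f"distance, recalled state: {d_recalled:.3f}")
print(f"distance, hallucinated state: {d_hallucinated:.3f}")
```

In this toy setup the entropy is identical for both states by construction, while the distance to the nearest centroid differs sharply. That is only meant to illustrate why a hidden-state signal could separate cases that look the same at the output distribution; it says nothing about how well this works on a real model.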

Impact: Introduces a novel geometric approach to detecting hallucination and conflict in LLMs, which could improve model reliability.

Ranking rationale: Academic paper detailing a new theoretical framework for understanding and detecting model failures.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI TIER_1 English(EN) · Qiyao Liang, Risto Miikkulainen, Ila Fiete · 2026-05-08 04:00

Attractor Geometry of Transformer Memory: From Conflict Arbitration to Confident Hallucination

arXiv:2605.05686v1 Announce Type: new Abstract: Language models draw on two knowledge sources: facts baked into weights (parametric memory, PM) and information in context (working memory, WM). We study two mechanistically distinct failure modes--conflict, when PM and WM disagree …
