All Routes Lead to Collapse

New Research Links Transformer Pathologies to a General Routing Mechanism

By the PulseAugur editorial team · [1 source] · 2026-06-21 03:59

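A new paper on arXiv argues that common Transformer pathologies such as attention sinks and representation collapse are not unique to the attention mechanism but are inherent to content-based routing under a fixed similarity metric. The paper recasts softmax attention as a Boltzmann-weighted aggregation over Euclidean distances and proposes that a router mismatched with the representations it routes drives routing to concentrate and representations to collapse. The same behavior is reported across architectures including Transformers, graph attention, state-space models, and recurrent mixers, which suggests a general mechanism rather than a Transformer-specific problem.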
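The Boltzmann reading of softmax attention can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's own formulation: it relies on the standard identity q·k = -½‖q-k‖² + ½‖q‖² + ½‖k‖², and the function names, the temperature parameter `tau`, and the random test data are all hypothetical choices made for this example.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def dot_product_weights(q, K, tau=1.0):
    # Usual softmax attention weights for one query q over the rows of K.
    return softmax(K @ q / tau)

def boltzmann_distance_weights(q, K, tau=1.0):
    # Energy of key j: half its squared Euclidean distance to q, minus half
    # its squared norm. The query-norm term is the same for every key, so it
    # cancels inside the softmax and is left out.
    sq_dist = ((K - q) ** 2).sum(axis=-1)
    key_norm_sq = (K ** 2).sum(axis=-1)
    energy = 0.5 * sq_dist - 0.5 * key_norm_sq
    return softmax(-energy / tau)

# Hypothetical data: one 8-dimensional query and five keys.
rng = np.random.default_rng(0)
q = rng.normal(size=8)
K = rng.normal(size=(5, 8))

w_dot = dot_product_weights(q, K)
w_boltz = boltzmann_distance_weights(q, K)
print(np.allclose(w_dot, w_boltz))
```

When all keys share the same norm, the key-norm term is also constant across keys, and the weights depend only on Euclidean distance to the query.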

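Impact: This research offers a new theoretical framework for understanding, and potentially mitigating, performance degradation across neural network architectures.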

Ranking rationale: The cluster contains one academic paper published on arXiv.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · K. R. Balasubramanian · 2026-06-21 03:59

All Routes Lead to Collapse

Attention sinks, representation collapse, and norm stratification are treated as transformer-specific pathologies. We show they are not specific to attention: they are what content-based routing does under a fixed similarity metric. We give a reframing identity: softmax attention…
