A Unified Perspective on the Dynamics of Deep Transformers

New framework unifies the analysis of deep Transformer dynamics

By the PulseAugur editorial team · [1 source] · 2026-06-19 04:00

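Researchers have developed a framework for analyzing the dynamics inside deep Transformers, the architecture underlying many machine learning tasks. By modeling the evolution of an input sequence as a Vlasov equation, called the Transformer PDE, they describe how the attention mechanism acts across layers. The approach extends to several attention variants, including multi-head, L2, Sinkhorn, Sigmoid, and masked attention, using a conditional Wasserstein framework. The study also explores initial conditions that are not compactly supported, in particular Gaussian data: it proves that the Transformer PDE preserves Gaussian measures and describes the typical behavior of data anisotropy, including a clustering phenomenon.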
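The particle picture behind this kind of mean-field description can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: it assumes a single attention head with identity query, key, and value matrices, no normalization layer, and explicit Euler time stepping, where each step plays the role of one layer. The function names `attention_velocity` and `euler_step` are hypothetical.

```python
import math

def attention_velocity(tokens, temperature=1.0):
    """Softmax-attention velocity of each token.

    Assumption: Q = K = V = identity, so token x moves toward the
    softmax-weighted average of all tokens y, with weights exp(<x, y>).
    """
    velocities = []
    for x in tokens:
        # Attention scores <x, y> / temperature against every token y.
        scores = [sum(a * b for a, b in zip(x, y)) / temperature for y in tokens]
        # Subtract the max score before exponentiating for numerical stability.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Weighted average of the tokens, one coordinate at a time.
        v = [sum(w * y[k] for w, y in zip(weights, tokens)) / total
             for k in range(len(x))]
        velocities.append(v)
    return velocities

def euler_step(tokens, dt=0.1):
    """One explicit Euler step x <- x + dt * velocity (one 'layer')."""
    vel = attention_velocity(tokens)
    return [[xi + dt * vi for xi, vi in zip(x, v)] for x, v in zip(tokens, vel)]

if __name__ == "__main__":
    tokens = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    for _ in range(5):
        tokens = euler_step(tokens)
    print(tokens)
```

Replacing the finite list of tokens with a probability measure over token positions gives the continuum viewpoint that the summary refers to; the sketch only covers the finite-token case under the simplifications stated above.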

Impact: Provides a theoretical foundation for understanding, and potentially improving, Transformer architectures.

Ranking rationale: Academic paper detailing a new theoretical framework for analyzing Transformer dynamics.

Read on arXiv cs.LG →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.LG · TIER_1 · English (EN) · Valérie Castin, Pierre Ablin, José Antonio Carrillo, Gabriel Peyré · 2026-06-19 04:00

A Unified Perspective on the Dynamics of Deep Transformers

arXiv:2501.18322v2 (replacement). Abstract: Transformers, which are state-of-the-art in most machine learning tasks, represent the data as sequences of vectors called tokens. This representation is then exploited by the attention function, which learns dependencies betwee…

