PulseAugur
Looped Transformers with Layer Normalization Provably Learn the Power Method

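Researchers have shown theoretically how looped Transformers with layer normalization learn the power method for principal component prediction. The study proves that, when trained by gradient descent, such models converge to a solution that effectively performs power iteration, with each attention layer carrying out one iteration. The work highlights an "algorithmic implicit bias", in which the model selects the power method to implement principal component prediction, and it shows a provable performance gap relative to Transformers without layer normalization.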
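
For readers unfamiliar with the power method, the following is a minimal sketch of the textbook algorithm in NumPy. It is not the paper's Transformer construction: the function name `power_method`, the toy data, and the iteration count are all illustrative assumptions. Each pass through the loop is one power iteration (a matrix multiplication followed by renormalization), which is the unit of work the summary says a single attention layer performs.

```python
import numpy as np


def power_method(cov, num_iters=50, seed=0):
    """Estimate the top eigenvector of a symmetric PSD matrix by power iteration.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(cov.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        v = cov @ v              # multiply by the matrix
        v /= np.linalg.norm(v)   # renormalize to unit length
    return v


# Assumed toy data: 200 samples in 3 dimensions with one dominant direction.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3)) * np.array([5.0, 1.0, 0.5])
cov = X.T @ X / X.shape[0]

v_est = power_method(cov)

# Reference: eigh returns eigenvalues in ascending order, so the last
# column is the eigenvector of the largest eigenvalue.
v_ref = np.linalg.eigh(cov)[1][:, -1]

# Absolute inner product, since an eigenvector is defined only up to sign.
alignment = abs(v_est @ v_ref)
print(f"alignment with top principal direction: {alignment:.6f}")
```

The renormalization step keeps the iterate at unit length so that repeated multiplication neither overflows nor vanishes. Reading layer normalization as playing a comparable role is an interpretation of the summary, not a statement confirmed by the text here.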

Impact: Provides theoretical insight into how Transformers learn, which may guide future model architectures and training strategies.

Ranking rationale: This is a theoretical analysis of Transformer training dynamics, published in an academic paper.

Read on arXiv stat.ML →

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. arXiv stat.ML TIER_1 English(EN) · Lyumin Wu, Chenyang Zhang, Yuan Cao

    Looped Transformers with Layer Normalization Provably Learn the Power Method

    arXiv:2606.00605v1 Announce Type: cross Abstract: Transformers have achieved remarkable success across a wide range of applications, and a growing body of work suggests that part of their strength comes from their ability to learn and execute algorithmic procedures. However, our …

  2. arXiv stat.ML TIER_1 English(EN) · Yuan Cao

    Looped Transformers with Layer Normalization Provably Learn the Power Method

    Transformers have achieved remarkable success across a wide range of applications, and a growing body of work suggests that part of their strength comes from their ability to learn and execute algorithmic procedures. However, our understanding of how transformers learn such algor…