A Mechanistic Study of Transformers Training Dynamics

Studying Transformer training dynamics through mechanistic analysis · Tracking 1 source

By the PulseAugur editorial team · [1 source] · 2026-06-30 04:00

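Researchers carried out a mechanistic study of Transformer training dynamics, with a focus on large-scale pretraining. Using a sparse modular addition task, they show that specialized attention circuits, called clustering heads, can emerge during gradient descent to solve the problem. The study observes a two-stage learning process and identifies loss spikes caused by high curvature in the normalization layers, offering insights that can be applied to large language model pretraining.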
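To make the task concrete, here is a minimal sketch of what a sparse modular addition dataset could look like. The summary does not give the paper's exact setup, so everything here is an assumption for illustration: the function name `make_sparse_modular_addition`, the parameters `seq_len`, `p` and `k`, and the choice that the label depends only on the first `k` tokens while the remaining tokens act as distractors.

```python
import random


def make_sparse_modular_addition(n_samples, seq_len=12, p=5, k=3, seed=0):
    """Hypothetical generator for a sparse modular addition task.

    Each input is a sequence of `seq_len` tokens drawn uniformly from
    {0, ..., p-1}. The label is the sum of the first `k` tokens modulo `p`;
    the other tokens do not affect the label (assumed notion of sparsity).
    """
    rng = random.Random(seed)  # local RNG so results are reproducible
    data = []
    for _ in range(n_samples):
        tokens = [rng.randrange(p) for _ in range(seq_len)]
        label = sum(tokens[:k]) % p
        data.append((tokens, label))
    return data


if __name__ == "__main__":
    for tokens, label in make_sparse_modular_addition(4):
        print(tokens, "->", label)
```

Under this assumed setup, a model solves the task only if it learns to attend to the `k` relevant positions and ignore the rest, which is the kind of behavior a specialized attention circuit would implement.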

Impact: Offers insight into the emergent learning mechanisms inside Transformers, which may inform the pretraining of large language models.

Ranking rationale: This cluster contains an academic paper detailing a mechanistic study of Transformer training dynamics. [lever_c_demoted from research: ic=1 ai=1.0]

AI-generated summary · Google Gemini · From 1 source.

Sources [1]

arXiv stat.ML TIER_1 English(EN) · Ambroise Odonnat, Wassim Bouaziz, Vivien Cabannes · 2026-06-30 04:00

A Mechanistic Study of Transformers Training Dynamics

arXiv:2410.24050v3 Announce Type: replace-cross Abstract: Large-scale pretraining of transformers has been central to the success of foundation models. However, the scale of those models limits our understanding of the mechanisms at play during optimization. In this work, we stud…