PulseAugur
Large-Step Training Dynamics of a Two-Factor Linear Transformer Model

Linear Transformer Training Dynamics Reveal Chaotic Attractors at High Learning Rates

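Researchers analyzed the training dynamics of a simplified linear Transformer model, focusing on how large-step learning rates affect convergence. Their study shows that, beyond a certain stability threshold, a high learning rate can drive training toward attractors that are cycles or bounded chaos, or cause it to diverge, rather than converge directly to a solution. The results indicate that a large constant learning rate fundamentally changes the behavior of the learned Transformer and affects the convergence outcome.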
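To make the qualitative claim concrete, here is a minimal sketch on a toy problem, not the model analyzed in the paper. It assumes a scalar two-factor loss L(a, b) = 0.5 (ab - 1)^2 with a balanced start a = b. For this toy, constant-step gradient descent has a local stability threshold of eta = 1 at the minimizer a = b = 1: the top Hessian eigenvalue there is a^2 + b^2 = 2, and gradient descent is locally stable only when eta times that eigenvalue is below 2. The function names `gd_trajectory` and `classify`, the step sizes tried, and the tolerances are all illustrative assumptions.

```python
# Toy illustration only: the loss, initial point, and thresholds below are
# assumptions chosen for clarity, not the model or values from the paper.

def gd_trajectory(eta, a0=0.5, b0=0.5, steps=4000, blowup=1e6):
    """Constant-step gradient descent on L(a, b) = 0.5 * (a*b - 1)**2.

    Returns (states, diverged), where states is a list of (a, b) tuples.
    """
    a, b = a0, b0
    states = [(a, b)]
    for _ in range(steps):
        r = a * b - 1.0  # residual of the two-factor product
        # dL/da = r * b and dL/db = r * a; update both factors simultaneously.
        a, b = a - eta * r * b, b - eta * r * a
        # The negated comparison also catches NaN, which fails every "<" test.
        if not (abs(a) < blowup and abs(b) < blowup):
            return states, True
        states.append((a, b))
    return states, False


def classify(eta, tail=256, max_period=64, tol=1e-9, **kwargs):
    """Label the long-run behavior of the iterates for a given step size."""
    states, diverged = gd_trajectory(eta, **kwargs)
    if diverged:
        return "diverged"
    last = states[-tail:]
    for period in range(1, max_period + 1):
        # A period-p orbit repeats every p steps (up to tol) across the tail.
        repeats = all(
            max(abs(last[i][0] - last[i - period][0]),
                abs(last[i][1] - last[i - period][1])) < tol
            for i in range(period, len(last))
        )
        if repeats:
            return "converged" if period == 1 else f"period-{period} cycle"
    return "bounded, no short cycle detected"


if __name__ == "__main__":
    for eta in (0.5, 1.2, 1.9, 2.5):
        print(f"eta={eta}: {classify(eta)}")
```

In this toy, a step size of 0.5 settles at the minimizer, 1.2 settles into a two-point cycle, 1.9 stays bounded without settling at the minimizer, and 2.5 blows up. The sketch only checks for short cycles, so it does not establish whether the bounded regime is chaotic.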

Impact: Shows how large-step learning rates can destabilize Transformer training, producing chaotic dynamics instead of convergence.

Ranking rationale: This cluster contains one academic paper detailing new research findings on the training dynamics of Transformer models.

Read on arXiv stat.ML →

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. arXiv stat.ML · TIER_1 · English (EN) · Krishnakumar Balasubramanian

    Large-Step Training Dynamics of a Two-Factor Linear Transformer Model

    arXiv:2605.21292v1. Abstract: Gradient-flow analyses show that simplified linear transformers can learn the in-context linear-regression algorithm, but they do not explain the finite-step behavior of gradient descent at large learning rates. Motivated by empiric…

  2. arXiv stat.ML · TIER_1 · English (EN) · Krishnakumar Balasubramanian

    Large-Step Training Dynamics of a Two-Factor Linear Transformer Model

    Gradient-flow analyses show that simplified linear transformers can learn the in-context linear-regression algorithm, but they do not explain the finite-step behavior of gradient descent at large learning rates. Motivated by empirical work on high-learning-rate transformer instab…