PulseAugur
A Mean-Field Analysis of Multi-Head Self-Attention under Cross-Entropy Training

A Mean-Field Theory Analysis of Multi-Head Self-Attention Training

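Researchers have developed a mean-field theory to analyze a simplified single-layer causal multi-head self-attention model trained by cross-entropy minimization. The study treats each attention head as a particle in parameter space and uses the empirical law of the heads as the state variable in the infinite-head limit. The framework establishes a nonlinear Wasserstein gradient flow equation and provides theoretical bounds and convergence rates for the training dynamics, offering a rigorous benchmark for understanding attention mechanisms.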
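As a rough illustration of the objects named in the summary, the block below writes out the empirical law of the heads and the generic form a Wasserstein gradient flow takes in the mean-field literature. The notation is assumed for illustration and is not taken from the paper: $\theta_i$ stands for the parameters of head $i$, $N$ for the number of heads, and $F$ for a population cross-entropy risk viewed as a functional of the head distribution. The paper's own equation may differ in its details.

```latex
% Hypothetical notation (not the paper's): \theta_i = parameters of head i,
% N = number of heads, F = cross-entropy risk as a functional of the law \mu.
\mu^N_t = \frac{1}{N} \sum_{i=1}^{N} \delta_{\theta_i(t)},
\qquad
\partial_t \mu_t
  = \nabla_\theta \cdot \Big( \mu_t \, \nabla_\theta \frac{\delta F}{\delta \mu}(\mu_t)(\theta) \Big).
```

The flow is nonlinear because the velocity field $-\nabla_\theta \frac{\delta F}{\delta \mu}(\mu_t)$ depends on the current distribution $\mu_t$ itself: every head moves in a landscape shaped by all the other heads.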

Impact: Provides a theoretical framework for understanding the training dynamics of attention mechanisms in deep learning models.

Ranking rationale: This cluster contains an academic paper detailing a theoretical analysis of a machine learning model architecture.

Read on arXiv stat.ML →

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. arXiv stat.ML TIER_1 English(EN) · Cheng Huan, Hongfwei Yuan

    A Mean-Field Analysis of Multi-Head Self-Attention under Cross-Entropy Training

    arXiv:2606.10469v1 Announce Type: cross Abstract: This paper develops a mean-field theory for a simplified single-layer causal multi-head self-attention model trained by cross-entropy minimization. Each attention head is treated as a particle in parameter space, and the empirical…

  2. arXiv stat.ML TIER_1 English(EN) · Hongfwei Yuan

    A Mean-Field Analysis of Multi-Head Self-Attention under Cross-Entropy Training

    This paper develops a mean-field theory for a simplified single-layer causal multi-head self-attention model trained by cross-entropy minimization. Each attention head is treated as a particle in parameter space, and the empirical law of the heads is used as the large-head state …