What Does the Weight Norm Control in Grokking? Logit-Scale Mediation under Cross-Entropy

The Role of the Weight Norm in Neural-Network Grokking, Clarified

By the PulseAugur editorial team · [1 source] · 2026-06-18 04:00

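The paper investigates "grokking" in neural networks: the delayed shift of a model from memorization to generalization. Its findings suggest that the weight norm, previously regarded as the main driver of this transition, acts chiefly as an upstream control on the logit scale. By manipulating the logit scale directly, the full range of grokking delays can be reproduced, while the weight norm adds only a small additional effect. The relationship depends on the loss function: under mean squared error (MSE) the mechanism differs from the one observed under cross-entropy.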
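The idea that the weight norm sits upstream of the logit scale has a simple illustration in positively homogeneous networks. The sketch below is a minimal example under stated assumptions, not the paper's model: it uses a hypothetical bias-free two-layer ReLU MLP (`mlp_logits`, toy weights `W1`, `W2`), in which multiplying every weight by c > 0 multiplies the logits by c², so the weight norm directly sets the logit scale. It then sweeps an explicit logit multiplier and compares textbook softmax cross-entropy with MSE against a one-hot target; neither the architecture nor the values come from the paper.

```python
import math

def relu(a):
    return max(0.0, a)

def mlp_logits(w1, w2, x):
    """Hypothetical bias-free two-layer ReLU MLP: logits = W2 @ relu(W1 @ x)."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

def scale_weights(w, c):
    """Multiply every entry of a weight matrix by c."""
    return [[c * v for v in row] for row in w]

def cross_entropy(logits, target):
    """Softmax cross-entropy, using max subtraction for numerical stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def mse(logits, target):
    """Mean squared error between the logits and a one-hot target."""
    one_hot = [1.0 if i == target else 0.0 for i in range(len(logits))]
    return sum((z - y) ** 2 for z, y in zip(logits, one_hot)) / len(logits)

# Toy weights and input, chosen for illustration only (not from the paper).
W1 = [[0.5, -0.2], [0.3, 0.8], [-0.4, 0.1]]
W2 = [[1.0, -0.5, 0.2], [-0.3, 0.7, 0.4]]
x = [1.0, 2.0]
target = 1  # class 1 receives the larger logit for this input

base = mlp_logits(W1, W2, x)

# Positive homogeneity: scaling both layers by c > 0 scales the logits by c**2,
# so in this toy network the weight norm acts as a knob on the logit scale.
homogeneous = all(
    math.isclose(s, c**2 * b)
    for c in (0.5, 2.0, 3.0)
    for s, b in zip(mlp_logits(scale_weights(W1, c), scale_weights(W2, c), x), base)
)

# Sweep an explicit logit multiplier s and record both losses.
scales = [0.25, 0.5, 1.0, 2.0, 4.0]
ce_by_scale = [cross_entropy([s * z for z in base], target) for s in scales]
mse_by_scale = [mse([s * z for z in base], target) for s in scales]
```

On this toy example, cross-entropy keeps falling as the logit scale grows because the correct class is already ranked first, whereas MSE is a quadratic in the scale with a finite minimizer at s = z·y / ‖z‖². That contrast is only one plausible reason the two losses can behave differently; the paper's own account of the MSE case is not reproduced here.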

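Impact: Clarifies a mechanism underlying generalization in neural networks and may inform future model architectures and training strategies.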

Ranking rationale: This entry is an academic paper reporting findings on a machine-learning phenomenon.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · TIER_1 · English (EN) · Truong Xuan Khanh · 2026-06-18 04:00

What Does the Weight Norm Control in Grokking? Logit-Scale Mediation under Cross-Entropy

arXiv:2606.18465v1 · Announce Type: cross

Abstract: Grokking, the delayed jump from memorization to generalization, is usually tied to the weight norm: a smaller norm generalizes sooner. We ask what the norm actually controls. Holding the weight norm fixed by clamping and varying o…
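Holding the weight norm fixed, as the abstract describes, can be pictured as a projection applied after every optimizer step. The sketch below assumes that "clamping" means rescaling all parameters so their global L2 norm equals a chosen target; the paper's exact procedure is truncated in this excerpt and may differ (for example, a per-layer norm, or a cap rather than an equality). `global_l2_norm` and `clamp_global_norm` are hypothetical helpers written over plain nested lists.

```python
import math

def global_l2_norm(params):
    """Global L2 norm over a list of weight matrices (each a list of rows)."""
    return math.sqrt(sum(v * v for mat in params for row in mat for v in row))

def clamp_global_norm(params, target_norm):
    """Rescale all weights so that their global L2 norm equals target_norm.

    Hypothetical helper: assumes 'clamping' projects the parameters onto the
    sphere of radius target_norm; the paper's procedure may differ.
    """
    norm = global_l2_norm(params)
    if norm == 0.0:
        return params  # an all-zero vector has no direction to preserve
    factor = target_norm / norm
    return [[[factor * v for v in row] for row in mat] for mat in params]

# Toy parameters: one 2x2 matrix and one 1x2 matrix (illustrative values only).
params = [[[0.5, -0.2], [0.3, 0.8]], [[1.0, -0.5]]]
target = global_l2_norm(params)  # the norm to hold fixed during training

# A stand-in for an optimizer update that would otherwise change the norm.
updated = [[[1.1 * v + 0.01 for v in row] for row in mat] for mat in params]
clamped = clamp_global_norm(updated, target)
```

Because the rescaling multiplies every parameter by the same positive factor, it changes the norm but not the direction of the parameter vector. In a framework such as PyTorch, the same rescaling would typically be applied in place to the parameters after `optimizer.step()`, inside a `torch.no_grad()` block.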
