The Weight Norm Sets the Grokking Timescale: A Causal Delay Law

Neural network "grokking" linked to weight-norm dynamics

By PulseAugur Editorial · [1 source] · 2026-06-15 04:00

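Researchers investigated "grokking" in neural networks, the phenomenon in which a model begins to generalize only after it has already fit the training data. Their study indicates that the weight norm plays a key role in this delayed generalization. By intervening on and manipulating the weight norm during training, they found a specific critical norm value, Wc, that is consistently reached, and that this value follows a power law in the network's modular base. They also observed that holding the norm at a fixed multiple of Wc produces a grokking delay that grows exponentially with that multiple.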
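The summary does not say how the norm was held at a fixed multiple of Wc. One common way to do this kind of intervention is to rescale all weights by a single factor after each optimizer step, and the sketch below shows only that rescaling. The function names, the toy weights, and the values of `w_c` and `multiple` are hypothetical and chosen for illustration; none of them come from the paper.

```python
import math


def global_norm(params):
    """L2 norm over all weights, treating every tensor as one flat vector."""
    return math.sqrt(sum(w * w for tensor in params for w in tensor))


def project_to_norm(params, target_norm):
    """Rescale all weights by one common factor so the global L2 norm
    equals target_norm. Directions are preserved; only the scale changes."""
    current = global_norm(params)
    if current == 0.0:
        raise ValueError("cannot rescale an all-zero parameter set")
    scale = target_norm / current
    return [[w * scale for w in tensor] for tensor in params]


# Hypothetical values, for illustration only (not taken from the paper).
w_c = 2.0        # assumed critical norm
multiple = 1.5   # assumed fixed multiple of w_c
params = [[3.0, 4.0], [0.0, 12.0]]  # toy "layers" with global norm 13.0

# In a training loop this call would follow each optimizer step.
clamped = project_to_norm(params, multiple * w_c)
```

Because every weight is multiplied by the same factor, the rescaling changes only the overall norm and leaves the relative sizes of the weights untouched.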

Why it ranks: This is a research paper detailing a new finding about neural network behavior.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · Truong Xuan Khanh, Doan Hoang Viet, Luu Duc Trung, Phan Thanh Duc · 2026-06-15 04:00

The Weight Norm Sets the Grokking Timescale: A Causal Delay Law

arXiv:2606.13753v1. Abstract: Grokking is the delayed onset of generalization in neural networks, arising long after they fit the training data. Whether the weight norm causes this delay is disputed: some studies report a critical norm at the transition, other…
