PulseAugur

New theory explains the "grokking" phenomenon in neural networks

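Researchers have developed a new theoretical framework to explain grokking, the phenomenon in which a neural network suddenly generalizes after first memorizing its training data. The theory describes a shell-core topology in solution space that arises from Adam optimization dynamics and weight-shrinkage regularization. This structure accounts for the transition from memorization to generalization and allows scaling laws to be derived with respect to learning rate, batch size, and L2 regularization.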

Impact: Provides a theoretical explanation for grokking that may guide future model training and architecture design.

Ranking rationale: The cluster contains an academic paper that details a new theoretical framework for a machine learning phenomenon.

Read on arXiv stat.ML →

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. arXiv stat.ML TIER_1 English(EN) · Róisín Luo, Christian Gagné, Jonas Ngnawé, Ihsan Ullah, Karyn Morrissey

    A stochastic-geometric theory of scaling laws in grokking

    arXiv:2606.30388v1 Announce Type: new Abstract: Delayed generalization (i.e., grokking) refers to the phenomenon in which a neural network fits its training data early in training but only begins to generalize after a prolonged delay, often through an abrupt transition. Despite ext…

  2. arXiv stat.ML TIER_1 English(EN) · Karyn Morrissey

    A stochastic-geometric theory of scaling laws in grokking

    Delayed generalization (i.e., grokking) refers to the phenomenon in which a neural network fits its training data early in training but only begins to generalize after a prolonged delay, often through an abrupt transition. Despite extensive empirical study, its underlying mechanism…