The Geometric Inductive Bias of Grokking: Bypassing Phase Transitions via Architectural Topology

Topology research reveals a grokking signal in neural networks and an architectural way to bypass it

By PulseAugur Editorial · [3 sources] · 2026-05-05 04:00

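Researchers are examining "grokking" in neural networks, the phenomenon in which a model first memorizes its training data and only later begins to generalize. One study proposes modifying the architectural topology, for example by enforcing spherical constraints or using uniform attention, to bypass the memorization phase and speed up generalization. Another paper uses persistent homology to identify a distinct topological signature, a sharp increase in homology, that marks the transition to generalization, offering a new framework for analyzing representation learning.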
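The summary names two architectural interventions, a spherical constraint and uniform attention, without giving formulas. The sketch below is a minimal illustration under stated assumptions: "spherical constraint" is read as rescaling a vector onto a sphere of fixed radius, and "uniform attention" as giving every position the same weight 1/n instead of softmax scores. Both interpretations and all function names (`project_to_sphere`, `uniform_attention`, `softmax_attention`) are hypothetical and are not taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def project_to_sphere(v: Vector, radius: float = 1.0) -> Vector:
    """Hypothetical spherical constraint: rescale v so its L2 norm equals radius."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot project the zero vector onto a sphere")
    return [radius * x / norm for x in v]


def uniform_attention(values: List[Vector]) -> Vector:
    """Hypothetical uniform attention: every position gets weight 1/n, so the
    output is the plain mean of the value vectors (no query-key scores)."""
    n = len(values)
    dim = len(values[0])
    return [sum(v[i] for v in values) / n for i in range(dim)]


def softmax_attention(scores: Vector, values: List[Vector]) -> Vector:
    """Standard softmax-weighted average of value vectors, for comparison."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


if __name__ == "__main__":
    print(project_to_sphere([3.0, 4.0]))  # [0.6, 0.8]
    vals = [[1.0, 2.0], [3.0, 4.0]]
    print(uniform_attention(vals))  # [2.0, 3.0]
    print(softmax_attention([0.0, 0.0], vals))  # [2.0, 3.0]
```

Uniform attention coincides with softmax attention whenever all scores are equal, which is why the last two calls print the same vector. Fixing the weights removes the model's ability to learn position-dependent attention patterns, which is one plausible reading of how such a constraint changes the inductive bias.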

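Impact: By analyzing architectural topology and representation learning, these studies offer new theoretical frameworks for understanding, and potentially accelerating, generalization in neural networks.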

Ranking rationale: Two arXiv papers use topology and architectural modifications to study the "grokking" phenomenon in neural networks.


AI-generated summary · Google Gemini · from 3 sources.

Coverage sources [3]

arXiv cs.LG · TIER_1 · Yifan Tang, Qiquan Wang, Inés García-Redondo, Anthea Monod · 2026-05-08 04:00

Topological Signature of Grokking

arXiv:2605.06352v1 Announce Type: new Abstract: We study the grokking phenomenon through the lens of topology. Using persistent homology on point clouds derived from the embedding matrices of a range of models trained on modular arithmetic with varying primes, we identify a clear…
arXiv cs.LG · TIER_1 · Alper Yıldırım · 2026-05-05 04:00

The Geometric Inductive Bias of Grokking: Bypassing Phase Transitions via Architectural Topology

arXiv:2603.05228v3 Announce Type: replace Abstract: Mechanistic interpretability typically relies on post-hoc analysis of trained networks. We instead adopt an interventional approach: testing hypotheses a priori by modifying architectural topology to observe training dynamics. W…
arXiv stat.ML · TIER_1 · Anthea Monod · 2026-05-07 14:33

Topological Signature of Grokking

We study the grokking phenomenon through the lens of topology. Using persistent homology on point clouds derived from the embedding matrices of a range of models trained on modular arithmetic with varying primes, we identify a clear and consistent topological signature of grokkin…
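The abstract describes applying persistent homology to point clouds built from a model's embedding matrix. As a minimal sketch, assuming each row of the embedding matrix is one point and distances are Euclidean, the code below computes only 0-dimensional persistence (connected components) of a Vietoris-Rips filtration using union-find. The excerpts do not state the papers' exact pipeline, homology dimensions, or tooling, and a real analysis would typically use a dedicated library and higher-dimensional features as well. The function name `h0_persistence` and the toy points are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Tuple

Point = List[float]


def h0_persistence(points: List[Point]) -> List[Tuple[float, float]]:
    """0-dimensional persistence pairs (birth, death) of the Vietoris-Rips
    filtration, with each edge entering at its Euclidean length.

    Every point is born at 0. Processing edges in increasing length (Kruskal
    order), a component dies when an edge merges it into another one. One
    component never dies and is reported with death = inf.
    """
    n = len(points)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    pairs: List[Tuple[float, float]] = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components: one of them dies
            parent[ri] = rj
            pairs.append((0.0, length))
    if n > 0:
        pairs.append((0.0, math.inf))  # the surviving component
    return pairs


if __name__ == "__main__":
    # Hypothetical "embedding rows": two nearby points and one far away.
    rows = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]]
    print(h0_persistence(rows))  # [(0.0, 1.0), (0.0, 9.0), (0.0, inf)]
```

Long bars (large death value) correspond to well-separated clusters of embedding rows. Recomputing the diagram at successive training checkpoints is one way to look for the kind of abrupt topological change the papers associate with grokking, though detecting the reported signature would likely require the higher-dimensional homology this sketch omits.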
