PulseAugur

New theory explains neural network grokking phenomenon

Researchers have developed a theoretical framework to explain grokking, the phenomenon in which a neural network first memorizes its training data and only later, often abruptly, begins to generalize. The theory characterizes a shell-core topological structure in the solution space, induced by Adam's optimization dynamics and weight-shrinkage regularization. This structure accounts for the transition from memorization to generalization and allows scaling laws to be derived in terms of learning rate, batch size, and L2 regularization.

IMPACT Provides a theoretical explanation for grokking, potentially guiding future model training and architecture design.

RANK_REASON The cluster contains an academic paper detailing a new theoretical framework for a machine learning phenomenon.


AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv stat.ML TIER_1 English(EN) · Róisín Luo, Christian Gagné, Jonas Ngnawé, Ihsan Ullah, Karyn Morrissey

    A Stochastic–Geometric Theory of Scaling Laws in Grokking

    arXiv:2606.30388v1. Abstract: Delayed generalization (i.e., grokking) refers to the phenomenon in which a neural network fits its training data early in training but only begins to generalize after a prolonged delay, often through an abrupt transition. Despite ext…

  2. arXiv stat.ML TIER_1 English(EN) · Karyn Morrissey

    A Stochastic–Geometric Theory of Scaling Laws in Grokking

    Delayed generalization (i.e., grokking) refers to the phenomenon in which a neural network fits its training data early in training but only begins to generalize after a prolonged delay, often through an abrupt transition. Despite extensive empirical study, its underlying mechanism…