Brief · PulseAugur

TOOL · arXiv cs.AI English(EN) · 11h

The Geometry of Grokking: Norm Minimization on the Zero-Loss Manifold

A new arXiv paper proposes a framework for understanding "grokking" in neural networks, the phenomenon in which generalization appears long after the network has memorized its training data. The work argues that this delay arises because, once training loss reaches zero, gradient descent continues to minimize the weight norm while staying on the zero-loss manifold. The paper gives formal proofs of this dynamic under specific conditions and introduces an approximation that decouples parameter learning, which yields a closed-form expression for early-layer dynamics. Experiments are reported to match these predictions, reproducing both the delayed generalization and the representation learning characteristic of grokking.
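The two-phase picture described above can be illustrated with a toy problem. The sketch below is not the paper's setup: it assumes a hypothetical two-parameter model f = w1 * w2 fitted to the target 1, trained by plain gradient descent with L2 weight decay (the weight-decay term, learning rate, and starting point are all illustrative assumptions, not taken from the brief). The zero-loss set is the hyperbola w1 * w2 = 1, and its minimum-norm point is (1, 1).

```python
def simulate(steps=20000, lr=0.05, weight_decay=0.01, w1=3.0, w2=0.1):
    """Gradient descent on 0.5 * (w1*w2 - 1)^2 plus an L2 weight-decay term.

    Toy illustration only: the model, hyperparameters, and starting point
    are assumptions made for this sketch, not values from the paper.
    """
    fit_losses, sq_norms = [], []
    for _ in range(steps):
        residual = w1 * w2 - 1.0
        # Record the data-fit loss and squared weight norm before each update.
        fit_losses.append(0.5 * residual ** 2)
        sq_norms.append(w1 ** 2 + w2 ** 2)
        # Gradient of the fit term plus the weight-decay gradient.
        g1 = residual * w2 + weight_decay * w1
        g2 = residual * w1 + weight_decay * w2
        w1, w2 = w1 - lr * g1, w2 - lr * g2
    return fit_losses, sq_norms, (w1, w2)


fit_losses, sq_norms, final_w = simulate()

# Compare an early snapshot (fit loss already tiny, norm still large)
# with later ones (norm shrinking while the fit loss stays tiny).
for step in (0, 200, 2000, len(fit_losses) - 1):
    print(f"step {step:>5}: fit loss {fit_losses[step]:.2e}, "
          f"squared norm {sq_norms[step]:.3f}")
print("final weights:", final_w)
```

In this sketch the fit loss collapses within the first few hundred steps, while the squared norm keeps falling for thousands of steps afterwards as the weights slide along the hyperbola toward the balanced point. With weight decay the end point sits slightly off the exact zero-loss set, so the final fit loss is small rather than exactly zero.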

IMPACT Provides a theoretical explanation for delayed generalization in neural networks, potentially guiding future model training strategies.

Grokking
Tiberiu Musat