Neural Network Grokking Tied to Weight Norm Dynamics

By PulseAugur Editorial · [1 source] · 2026-06-15 04:00

Researchers have investigated "grokking" in neural networks, the phenomenon in which generalization emerges long after the model has already fit its training data. Their study suggests that the weight norm plays a causal role in this delayed generalization. By directly intervening on the weight norm during training, they found that training consistently reaches a specific critical norm, Wc, and that Wc scales as a power law with the modular base of the arithmetic task the network is trained on. They also observed that holding the norm at a fixed multiple of Wc produces a grokking delay that depends exponentially on that multiple.
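The summary names the two scaling relationships but gives neither their exact functional forms nor any fitted constants. The sketch below is therefore only an illustration under stated assumptions. It reads "power law" as Wc = A·p^α, where p is the modulus, and reads "exponential relationship" as delay = t0·exp(β·k), where k is the fixed multiple of Wc. It also assumes that "holding the norm" means rescaling the global L2 norm of all weights after each optimizer step. The helper names (`project_to_norm`, `fit_power_law`, `fit_exponential`) and the demo data are hypothetical. They are not the authors' code or results.

```python
import numpy as np


def project_to_norm(params, target_norm):
    """Rescale weight arrays so their combined (global) L2 norm equals target_norm.

    Hypothetical helper: calling this after every optimizer step is one simple
    way to hold the norm at k * Wc. It is an assumption, not the paper's procedure.
    """
    total = np.sqrt(sum(float(np.sum(p ** 2)) for p in params))
    if total == 0.0:
        raise ValueError("cannot rescale all-zero weights")
    scale = target_norm / total
    return [p * scale for p in params]


def fit_power_law(moduli, w_c):
    """Fit the assumed form Wc = A * p**alpha as a straight line in log-log space."""
    alpha, log_a = np.polyfit(np.log(moduli), np.log(w_c), 1)
    return np.exp(log_a), alpha


def fit_exponential(k, delay):
    """Fit the assumed form delay = t0 * exp(beta * k) as a straight line in (k, log delay)."""
    beta, log_t0 = np.polyfit(k, np.log(delay), 1)
    return np.exp(log_t0), beta


# Synthetic, noise-free data generated from made-up constants; NOT results from the paper.
moduli = np.array([23.0, 47.0, 97.0, 113.0])
A, alpha = fit_power_law(moduli, 2.0 * moduli ** 0.5)  # should recover A = 2.0, alpha = 0.5

k = np.array([1.0, 1.5, 2.0, 3.0])
t0, beta = fit_exponential(k, 100.0 * np.exp(1.2 * k))  # should recover t0 = 100, beta = 1.2
```

Taking logarithms turns both assumed laws into ordinary linear regression. With noisy measured delays, a weighted or nonlinear least-squares fit may be more appropriate.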



AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.AI · Truong Xuan Khanh, Doan Hoang Viet, Luu Duc Trung, Phan Thanh Duc · 2026-06-15 04:00

The Weight Norm Sets the Grokking Timescale: A Causal Delay Law

arXiv:2606.13753v1 (cross-list). Abstract: Grokking is the delayed onset of generalization in neural networks, arising long after they fit the training data. Whether the weight norm causes this delay is disputed: some studies report a critical norm at the transition, other…
