New theory explains grokking in deep neural networks via L2 phase transitions

By PulseAugur Editorial · [1 source] · 2026-06-17 04:00

Researchers have proposed a theory of "grokking" in deep neural networks, the phenomenon in which a model abruptly begins to generalize after a period of overfitting. The study, posted to arXiv, starts from the observation that deep neural networks undergo first-order phase transitions as the L2 regularization strength is varied, and attributes grokking to the hysteresis that accompanies those transitions. By deliberately trapping models in metastable states, the researchers showed that SGD noise can drive them across energy barriers. The resulting escape times follow Arrhenius scaling, which reproduces the grokking curve.
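To make the Arrhenius claim concrete, here is a minimal toy sketch and not the paper's model or code. It assumes a one-dimensional tilted double-well potential as a stand-in for a metastable state, and an effective "temperature" as a stand-in for SGD noise strength. The potential, the tilt, the temperatures, and every function name are illustrative assumptions. Arrhenius scaling means the mean escape time grows roughly like exp(barrier / temperature), so the logarithm of the escape time should rise approximately linearly with 1/temperature.

```python
import math
import random

TILT = 0.1  # assumed tilt: makes the left well the shallower, metastable one


def grad_potential(x):
    """Gradient of the toy potential U(x) = (x**2 - 1)**2 / 4 - TILT * x."""
    return x**3 - x - TILT


def escape_time(temperature, rng, dt=0.02, threshold=0.5, max_steps=2_000_000):
    """Time for a noisy gradient-descent walker to leave the left well."""
    x = -1.0  # start near the bottom of the left (metastable) well
    noise_scale = math.sqrt(2.0 * temperature * dt)
    for step in range(1, max_steps + 1):
        # Euler-Maruyama step: deterministic descent plus Gaussian noise
        x += -grad_potential(x) * dt + noise_scale * rng.gauss(0.0, 1.0)
        if x > threshold:  # past the barrier and inside the right basin
            return step * dt
    return max_steps * dt  # did not escape within the step budget


def mean_escape_times(temperatures, trials=30, seed=0):
    """Average escape time over several independent walkers per temperature."""
    rng = random.Random(seed)
    return [
        sum(escape_time(t, rng) for _ in range(trials)) / trials
        for t in temperatures
    ]


def arrhenius_slope(temperatures, times):
    """Least-squares slope of log(escape time) against 1/temperature."""
    xs = [1.0 / t for t in temperatures]
    ys = [math.log(tau) for tau in times]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    temperatures = [0.1, 0.05, 0.035]
    times = mean_escape_times(temperatures)
    for t, tau in zip(temperatures, times):
        print(f"T={t:.3f}  mean escape time={tau:.1f}")
    print(f"slope of log(time) vs 1/T: {arrhenius_slope(temperatures, times):.3f}")
```

In this sketch, lowering the noise level lengthens the wait before the walker leaves the shallow well, which is the qualitative picture behind a long plateau followed by a sudden jump. Whether the fitted slope matches the barrier height depends on the toy parameters chosen here and says nothing quantitative about real networks.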

IMPACT Provides a theoretical framework for understanding and potentially improving generalization in deep learning models.

RANK_REASON Academic paper published on arXiv detailing a new theoretical explanation for a phenomenon in deep learning.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.LG TIER_1 English(EN) · Ibrahim Talha Ersoy, Karoline Wiesner · 2026-06-17 04:00

Noise-Driven Escape from Metastable Phases explains Grokking in Deep Neural Networks

arXiv:2606.17120v1. Abstract: Deep neural networks (DNNs) exhibit first-order phase transitions under variations of the L2 regularization strength, with each transition marking the onset of a new learnable feature. Below a critical regularization strength, all f…
