Grokking explained by two training clocks theory

By PulseAugur Editorial · [1 source] · 2026-06-05 04:00

Researchers have proposed a theoretical framework for "grokking," the phenomenon in which a model fits its training data and learns a generalizable rule on different time scales. Their "two training clocks" separate the fast decay of the classification loss from the slower simplification of the model's internal representation. The theory is developed for deep linear networks and then extended to ReLU MLPs through what the paper's title calls a conditional ReLU reduction, suggesting a two-stage learning process in which the classifier adapts first and the representation is refined afterward.
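To make the two quantities concrete, here is a minimal NumPy sketch that trains a two-layer linear network and logs one measurement per "clock": the training classification loss, and a proxy for how simple the first-layer representation is. Everything in it is an assumption for illustration and not the paper's construction: the toy data, the sign-of-one-coordinate rule, the use of stable rank as the simplicity proxy, the weight decay, and the hypothetical names `stable_rank` and `train_two_clocks`.

```python
import numpy as np


def stable_rank(W):
    """Squared Frobenius norm over squared spectral norm; lies in [1, min(W.shape)]."""
    return float(np.linalg.norm(W, "fro") ** 2 / np.linalg.norm(W, 2) ** 2)


def train_two_clocks(n=40, d=20, h=20, steps=2000, lr=0.02,
                     weight_decay=1e-3, seed=0):
    """Train f(x) = a^T W x with logistic loss; record loss and stable rank per step.

    All hyperparameters are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    # Assumed "simple rule": the label is the sign of the first coordinate.
    y = np.where(X[:, 0] >= 0, 1.0, -1.0)
    W = rng.normal(scale=1.0 / np.sqrt(d), size=(h, d))
    a = rng.normal(scale=1.0 / np.sqrt(h), size=h)

    losses, ranks = [], []
    for _ in range(steps):
        H = X @ W.T                      # hidden representation, shape (n, h)
        margins = y * (H @ a)
        # Clock 1: mean logistic loss, computed stably as log(1 + exp(-margin)).
        losses.append(float(np.mean(np.logaddexp(0.0, -margins))))
        # Clock 2 (proxy): stable rank of the first-layer weights.
        ranks.append(stable_rank(W))

        # Derivative of the mean logistic loss with respect to each output.
        g = -y * np.exp(-np.logaddexp(0.0, margins)) / n
        grad_a = H.T @ g + weight_decay * a
        grad_W = np.outer(a, X.T @ g) + weight_decay * W
        a -= lr * grad_a
        W -= lr * grad_W
    return np.array(losses), np.array(ranks)


if __name__ == "__main__":
    losses, ranks = train_two_clocks()
    print("loss: first %.4f, last %.4f" % (losses[0], losses[-1]))
    print("stable rank: first %.3f, last %.3f" % (ranks[0], ranks[-1]))
```

Plotting `losses` and `ranks` against the step index on a log scale is one way to compare how quickly each quantity changes. Whether the two curves separate clearly depends on the assumed initialization scale, learning rate, and weight decay.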

IMPACT Provides a theoretical explanation for a key aspect of model learning, potentially guiding future model development and training strategies.

RANK_REASON The cluster contains an academic paper detailing a new theoretical framework for understanding a machine learning phenomenon.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.LG TIER_1 English(EN) · Hu Tan, Kuo Gai, Shihua Zhang · 2026-06-05 04:00

Deciphering Two Training Clocks in Grokking via Deep Linear Network Theory with Conditional ReLU Reduction

arXiv:2606.05863v1. Abstract: Grokking suggests that fitting the training data and learning a simple underlying rule may occur on different time scales. We formalize this phenomenon by separating the fast decay of the classification loss from the slower simplifi…
