Brief · PulseAugur

TOOL · arXiv cs.LG · English (EN) · 7h

Deciphering Two Training Clocks in Grokking via Deep Linear Network Theory with Conditional ReLU Reduction

Researchers have developed a theoretical framework to explain "grokking," the phenomenon in which a model fits its training data quickly but learns a generalizable rule only much later. They propose "two training clocks" to separate the rapid decrease in classification loss from the slower simplification of the model's internal representation. The theory is first worked out for deep linear networks and then extended, through a conditional ReLU reduction, to similar behavior in ReLU MLPs. It suggests a two-stage learning process: the classifier adapts first, and the representation is refined afterward.
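The "two clocks" idea can be illustrated with a small toy built for this purpose. It is not the paper's model, and the conditional ReLU reduction is not represented. The sketch assumes a two-layer diagonal linear network f(x) = sum_j u_j v_j x_j, trained by full-batch gradient descent on logistic loss with weight decay. It uses a hypothetical four-point dataset in which feature 0 is the rule and features 1 and 2 are nuisance features that start with much larger weights. Training loss plays the fast clock. The share of the end-to-end weight beta_j = u_j v_j that sits on the rule feature is an assumed stand-in for representation simplicity and plays the slow clock. All names (`train`, `X_TRAIN`, `weight_decay`, and so on), the data, and the hyperparameters are hypothetical.

```python
import math

# Hypothetical toy data, not taken from the paper.
# Feature 0 is the "rule": every label equals its sign.
# Features 1 and 2 are nuisance features; each fits only half the points.
X_TRAIN = [(1.0, 1.0, 0.0), (1.0, 0.0, 1.0), (-1.0, -1.0, 0.0), (-1.0, 0.0, -1.0)]
Y_TRAIN = [1.0, 1.0, -1.0, -1.0]
# Held-out point that follows the rule but contradicts both nuisance features.
X_TEST, Y_TEST = (1.0, -1.0, -1.0), 1.0


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(beta, x):
    # Linear prediction using the end-to-end weights beta.
    return sum(b * xi for b, xi in zip(beta, x))


def train(lr=0.1, weight_decay=0.01, steps=10_000, log_every=100):
    """Gradient descent on f(x) = sum_j u[j] * v[j] * x[j] (a two-layer
    diagonal linear network) with mean logistic loss plus
    (weight_decay / 2) * (|u|^2 + |v|^2).
    Returns snapshots (step, train_loss, beta, test_margin)."""
    # Balanced init (u == v): rule weight tiny, nuisance weights large.
    u = [0.03, 0.7, 0.7]
    v = list(u)
    n = len(X_TRAIN)
    history = []
    for step in range(steps + 1):
        beta = [a * b for a, b in zip(u, v)]
        margins = [y * predict(beta, x) for x, y in zip(X_TRAIN, Y_TRAIN)]
        if step % log_every == 0:
            loss = sum(math.log1p(math.exp(-m)) for m in margins) / n
            history.append((step, loss, beta, Y_TEST * predict(beta, X_TEST)))
        if step == steps:
            break
        # Gradient of the mean logistic loss with respect to each beta_j.
        grad_beta = [
            -sum(y * x[j] * sigmoid(-m) for x, y, m in zip(X_TRAIN, Y_TRAIN, margins)) / n
            for j in range(len(u))
        ]
        # Chain rule through beta_j = u_j * v_j, plus weight decay;
        # both factors are updated from their pre-step values.
        u, v = (
            [a - lr * (g * b + weight_decay * a) for a, b, g in zip(u, v, grad_beta)],
            [b - lr * (g * a + weight_decay * b) for a, b, g in zip(u, v, grad_beta)],
        )
    return history


if __name__ == "__main__":
    history = train()
    for step, loss, beta, test_margin in (history[0], history[3], history[-1]):
        print(f"step {step:>6}: train loss {loss:.3f}, "
              f"beta {[round(b, 3) for b in beta]}, test margin {test_margin:+.2f}")
```

Under these assumptions, one would expect the training loss to fall sharply within the first few hundred steps. At that point the nuisance weights still dominate and the held-out point is misclassified. The rule weight overtakes them only much later, at a pace set mainly by `weight_decay`, and after that the held-out margin turns positive. Weight decay drives the slow phase in this toy. On factored weights it acts like an ℓ1-style penalty on beta, which makes concentrating weight on the one feature that fits every training point cheaper than spreading it over two nuisance features.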

IMPACT Provides a theoretical account of why generalization can lag behind training fit, which could inform future model development and training strategies.

Grokking
Deep linear networks
ReLU MLPs