Circuit Synchronization Precedes Generalization: Causal Evidence from Fourier Structure in Grokking Transformers
A new research paper introduces the Frequency Synchronization Degree (FSD), a metric that measures how synchronized the Fourier circuits are in transformers that grok. Grokking is the phenomenon in which a transformer trained on a modular arithmetic task abruptly improves its accuracy on held-out data. According to the paper, FSD consistently predicts grokking: the circuits synchronize hundreds to thousands of training steps before the grokking event itself. The study also provides causal evidence that the timing of grokking can be controlled by adjusting weight decay, with a predictable relationship between the decay rate and how quickly grokking occurs.
IMPACT: Introduces a new metric for predicting, and potentially controlling, the 'grokking' phenomenon in transformers, offering insight into how models generalize.