grokking
PulseAugur coverage of grokking — every cluster mentioning grokking across labs, papers, and developer communities, ranked by signal.
-
Grokking in ML requires breaking data symmetry for generalization
Researchers have investigated the phenomenon of grokking in machine learning, where a model achieves high training accuracy but only generalizes to new data much later. Their study, using the Recursive Feature Machine (…
-
Research: Addressable memory crucial for AI edit propagation, not just learning
A new research paper explores how neural networks learn and retain information, distinguishing between 'grokking' and 'edit propagation'. The study found that repeated shared access, whether through loop recurrence or m…
-
Transformer grokking delay linked to decoder bottleneck, study finds
A new research paper explores the phenomenon of 'grokking' in transformers, where models abruptly generalize after a long delay during training on algorithmic tasks. The study suggests this delay stems from limited acce…
-
Weight norm's role in neural network grokking clarified
Researchers have investigated the phenomenon of 'grokking' in neural networks, where a model transitions from memorization to generalization. Their findings indicate that the weight norm, previously thought to be the pr…
-
New theory explains grokking in deep neural networks via L2 phase transitions
Researchers have developed a new theory explaining the phenomenon of "grokking" in deep neural networks, where a model abruptly begins to generalize after a period of overfitting. The study, published on arXiv, proposes…
-
Neural Network Grokking Tied to Weight Norm Dynamics
Researchers have investigated the phenomenon of "grokking" in neural networks, where generalization occurs significantly after the model has already fit the training data. Their study suggests that the weight norm plays…
-
New metric predicts transformer 'grokking' phenomenon
A new research paper introduces the Frequency Synchronization Degree (FSD), a metric designed to predict the phenomenon of 'grokking' in transformer models. Grokking is characterized by a sudden improvement in a model's…
-
New Metric Predicts Transformer 'Grokking' Phenomenon
A new research paper introduces the Frequency Synchronization Degree (FSD), a metric to measure the synchronization of Fourier circuits in Grokking Transformers. This metric consistently predicts grokking, the phenomeno…
-
Grokking explained by two training clocks theory
Researchers have developed a theoretical framework to explain the phenomenon of "grokking" in machine learning, where a model fits training data and learns a generalizable rule at different rates. They propose the conce…
-
New papers analyze neural network grokking via spectral geometry
Two new arXiv papers explore the phenomenon of 'grokking' in neural networks, where models generalize only after memorizing training data. One paper proposes 'Low-Rank Decay' (LRD) as a spectral regularizer to improve g…
-
Researchers explore grokking phenomenon in ridge regression
Three new research papers explore the concept of "grokking" in machine learning, specifically within the context of ridge regression. One paper presents a numerical procedure to find optimal regularization strength, dem…
-
New Framework Decodes Deep Learning Phenomena: Grokking and Double Descent
Researchers have developed a new framework to analyze and explain complex learning dynamics in deep neural networks, specifically focusing on phenomena like grokking and double descent. This framework decomposes learnin…
-
Weight decay controls transformer training regimes, new diagnostics revealed
Researchers have identified weight decay as a key parameter controlling the training regimes of transformers on modular arithmetic tasks. They introduced two new, low-cost online diagnostics—mean pairwise attention-head…
-
Singular Learning Theory offers new perspective on AI model grokking
Researchers have explored the phenomenon of "grokking," where machine learning models abruptly shift from memorization to generalization after extended training. Using Singular Learning Theory (SLT), they propose that g…
-
Topology research reveals neural network grokking signatures and architectural bypasses
Researchers are exploring the phenomenon of 'grokking' in neural networks, where models initially memorize data before generalizing. One study proposes modifying architectural topology, such as enforcing spherical const…
-
Convergence Rate Analysis of the AdamW-Style Shampoo: Unifying One-sided and Two-Sided Preconditioning
A new theory, the Norm-Separation Delay Law, explains the phenomenon of grokking, where models generalize long after memorizing training data. Researchers demonstrated that grokking is a representational phase transitio…