ENTITY grokking

PulseAugur coverage of grokking — every cluster mentioning grokking across labs, papers, and developer communities, ranked by signal.

Total · 30d: 16 (16 over 90d)

Releases · 30d: 0 (0 over 90d)

Papers · 30d: 16 (16 over 90d)

SENTIMENT · 30D

8 days with sentiment data

RECENT · PAGE 1/1 · 16 TOTAL

TOOL · CL_109879 · Jun 25 · 04:00

Grokking in ML requires breaking data symmetry for generalization

Researchers have investigated the phenomenon of grokking in machine learning, where a model achieves high training accuracy but only generalizes to new data much later. Their study, using the Recursive Feature Machine (…
TOOL · CL_108020 · Jun 24 · 04:00

Research: Addressable memory crucial for AI edit propagation, not just learning

A new research paper explores how neural networks learn and retain information, distinguishing between 'grokking' and 'edit propagation'. The study found that repeated shared access, whether through loop recurrence or m…
TOOL · CL_98084 · Jun 18 · 04:00

Transformer grokking delay linked to decoder bottleneck, study finds

A new research paper explores the phenomenon of 'grokking' in transformers, where models abruptly generalize after a long delay during training on algorithmic tasks. The study suggests this delay stems from limited acce…
TOOL · CL_98023 · Jun 18 · 04:00

Weight norm's role in neural network grokking clarified

Researchers have investigated the phenomenon of 'grokking' in neural networks, where a model transitions from memorization to generalization. Their findings indicate that the weight norm, previously thought to be the pr…
TOOL · CL_96217 · Jun 17 · 04:00

New theory explains grokking in deep neural networks via L2 phase transitions

Researchers have developed a new theory explaining the phenomenon of "grokking" in deep neural networks, where a model abruptly begins to generalize after a period of overfitting. The study, published on arXiv, proposes…
TOOL · CL_91359 · Jun 15 · 04:00

Neural Network Grokking Tied to Weight Norm Dynamics

Researchers have investigated the phenomenon of "grokking" in neural networks, where generalization occurs significantly after the model has already fit the training data. Their study suggests that the weight norm plays…
TOOL · CL_105046 · Jun 11 · 06:52

New metric predicts transformer 'grokking' phenomenon

A new research paper introduces the Frequency Synchronization Degree (FSD), a metric designed to predict the phenomenon of 'grokking' in transformer models. Grokking is characterized by a sudden improvement in a model's…
RESEARCH · CL_86574 · Jun 11 · 06:52

New Metric Predicts Transformer 'Grokking' Phenomenon

A new research paper introduces the Frequency Synchronization Degree (FSD), a metric to measure the synchronization of Fourier circuits in Grokking Transformers. This metric consistently predicts grokking, the phenomeno…
TOOL · CL_72706 · Jun 5 · 04:00

Grokking explained by two training clocks theory

Researchers have developed a theoretical framework to explain the phenomenon of "grokking" in machine learning, where a model fits training data and learns a generalizable rule at different rates. They propose the conce…
RESEARCH · CL_65711 · Jun 2 · 04:00

New papers analyze neural network grokking via spectral geometry

Two new arXiv papers explore the phenomenon of 'grokking' in neural networks, where models generalize only after memorizing training data. One paper proposes 'Low-Rank Decay' (LRD) as a spectral regularizer to improve g…
RESEARCH · CL_58584 · May 27 · 16:12

Researchers explore grokking phenomenon in ridge regression

Three new research papers explore the concept of "grokking" in machine learning, specifically within the context of ridge regression. One paper presents a numerical procedure to find optimal regularization strength, dem…
RESEARCH · CL_53550 · May 26 · 14:26

New Framework Decodes Deep Learning Phenomena: Grokking and Double Descent

Researchers have developed a new framework to analyze and explain complex learning dynamics in deep neural networks, specifically focusing on phenomena like grokking and double descent. This framework decomposes learnin…
RESEARCH · CL_44706 · May 19 · 19:48

Weight decay controls transformer training regimes, new diagnostics revealed

Researchers have identified weight decay as a key parameter controlling the training regimes of transformers on modular arithmetic tasks. They introduced two new, low-cost online diagnostics—mean pairwise attention-head…
TOOL · CL_22149 · May 8 · 04:00

Singular Learning Theory offers new perspective on AI model grokking

Researchers have explored the phenomenon of "grokking," where machine learning models abruptly shift from memorization to generalization after extended training. Using Singular Learning Theory (SLT), they propose that g…
RESEARCH · CL_16242 · May 5 · 04:00

Topology research reveals neural network grokking signatures and architectural bypasses

Researchers are exploring the phenomenon of 'grokking' in neural networks, where models initially memorize data before generalizing. One study proposes modifying architectural topology, such as enforcing spherical const…
RESEARCH · CL_14472 · May 4 · 04:00

Convergence Rate Analysis of the AdamW-Style Shampoo: Unifying One-sided and Two-Sided Preconditioning

A new theory, the Norm-Separation Delay Law, explains the phenomenon of grokking, where models generalize long after memorizing training data. Researchers demonstrated that grokking is a representational phase transitio…