PulseAugur
Entity: grokking

grokking

PulseAugur coverage of grokking — every cluster mentioning grokking across labs, papers, and developer communities, ranked by signal.

Total · 30 days: 4 (90 days: 4)
Releases · 30 days: 0 (90 days: 0)
Papers · 30 days: 4 (90 days: 4)
Tier distribution · 90 days
Sentiment · 30 days

1 day with sentiment data

Recent · Page 1 of 1 · 4 items
  1. RESEARCH · CL_44706 ·

    Weight decay controls transformer training regimes, new diagnostics revealed

    Researchers have identified weight decay as a key parameter controlling the training regimes of transformers on modular arithmetic tasks. They introduced two new, low-cost online diagnostics—mean pairwise attention-head…

  2. TOOL · CL_22149 ·

    Singular Learning Theory offers new perspective on AI model grokking

    Researchers have explored the phenomenon of "grokking," where machine learning models abruptly shift from memorization to generalization after extended training. Using Singular Learning Theory (SLT), they propose that g…

  3. RESEARCH · CL_16242 ·

    Topology research reveals neural network grokking signatures and architectural bypasses

    Researchers are exploring the phenomenon of 'grokking' in neural networks, where models initially memorize data before generalizing. One study proposes modifying architectural topology, such as enforcing spherical const…

  4. RESEARCH · CL_14472 ·

    Convergence Rate Analysis of the AdamW-Style Shampoo: Unifying One-sided and Two-Sided Preconditioning

    A new theory, the Norm-Separation Delay Law, explains the phenomenon of grokking, where models generalize long after memorizing training data. Researchers demonstrated that grokking is a representational phase transitio…