PulseAugur

AI models forget learned rules mid-training, new research finds

Researchers have identified a phenomenon they call "natural ungrokking" in language models, in which a learned rule can disappear mid-training without any change in the loss curve. The forgetting correlates with how often a rule appears in the training data: rules that appear less often are more susceptible to being overwritten by competing patterns. The process is also asymmetric: external intervention can easily destroy a learned rule, but reintroducing supporting data does not reliably restore it.

IMPACT This research highlights a critical vulnerability in current LLM training: models may not reliably retain the rules they learn, which could affect their long-term utility and safety.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.AI · TIER_1 · English (EN) · Diya Sreedhar

    Natural Ungrokking: Asymmetric Control of Which Rules Survive Pretraining

    Midway through an ordinary pretraining run, a small language model learns the pronoun-gender rule: cued with a girl's name ("Sue cried because"), it resolves the next pronoun to she, generalizing to held-out probes (0.94 by step 925). By step 3,500 the same model scores near zero…
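
    The probe score quoted above can be read as an accuracy over held-out prompts. The sketch below is one plausible way to compute such a score, not the paper's method: the excerpt does not describe the probe set or the scoring rule, so the function `pronoun_probe_accuracy`, the `toy_model` stand-in, and all probe sentences other than "Sue cried because" are hypothetical.

    ```python
    from typing import Callable, Dict, Iterable, Tuple

    # Assumed interface: a model is any callable mapping a prompt to
    # next-token probabilities. A real language model would need a wrapper.
    NextTokenProbs = Callable[[str], Dict[str, float]]


    def pronoun_probe_accuracy(
        next_token_probs: NextTokenProbs,
        probes: Iterable[Tuple[str, str]],
    ) -> float:
        """Fraction of probes where the expected pronoun outscores the other one."""
        probe_list = list(probes)
        if not probe_list:
            raise ValueError("at least one probe is required")
        correct = 0
        for prompt, expected in probe_list:
            probs = next_token_probs(prompt)
            other = "he" if expected == "she" else "she"
            # Count a hit only when the expected pronoun is strictly more likely.
            if probs.get(expected, 0.0) > probs.get(other, 0.0):
                correct += 1
        return correct / len(probe_list)


    def toy_model(prompt: str) -> Dict[str, float]:
        """Hypothetical stand-in: knows two names, guesses "he" otherwise."""
        first_word = prompt.split()[0]
        if first_word in {"Sue", "Anna"}:
            return {"she": 0.9, "he": 0.1}
        return {"she": 0.2, "he": 0.8}


    # Illustrative probes; only the first prompt appears in the excerpt.
    probes = [
        ("Sue cried because", "she"),
        ("Anna cried because", "she"),
        ("Tom cried because", "he"),
        ("Mia cried because", "she"),  # name the toy model has not "learned"
    ]
    accuracy = pronoun_probe_accuracy(toy_model, probes)
    print(f"probe accuracy: {accuracy:.2f}")
    ```

    Evaluating the same probes at several training checkpoints would give a curve like the one the abstract describes, rising while the rule is learned and falling if it is later overwritten.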