PulseAugur

New methods like SMF and SAM reduce catastrophic forgetting in LLMs

Two new arXiv papers propose methods to mitigate catastrophic forgetting in language models during fine-tuning. One introduces Sparse Memory Finetuning (SMF), which adds key-value memory layers and updates only the most heavily accessed rows on each step, improving performance on a medical exam task with minimal loss of general capabilities. The other investigates Sharpness-Aware Minimization (SAM) and other pretraining optimization choices, showing that biasing pretraining toward flatter minima can significantly reduce forgetting across model sizes and post-training scenarios.
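The sparse-update idea behind SMF can be sketched in a few lines. Everything here is an illustrative assumption for exposition (the array names, the top-k selection rule, the learning rate), not the paper's exact method: the point is that rows outside the most-accessed set receive no gradient update.

```python
import numpy as np

rng = np.random.default_rng(0)
num_rows, dim, top_k = 8, 4, 2

memory = rng.normal(size=(num_rows, dim))           # key-value memory rows
access_counts = np.array([5, 0, 1, 9, 2, 0, 0, 3])  # per-row accesses this step (toy values)
grad = rng.normal(size=(num_rows, dim))             # gradient w.r.t. the memory

# Update only the top-k most-accessed rows; freeze the rest.
active = np.argsort(access_counts)[-top_k:]         # here: rows 0 and 3
mask = np.zeros((num_rows, 1))
mask[active] = 1.0

lr = 0.1
memory_new = memory - lr * mask * grad

# Rows outside the top-k are untouched, limiting interference with
# previously stored knowledge.
changed = np.where(np.any(memory_new != memory, axis=1))[0]
print(sorted(changed.tolist()))  # → [0, 3]
```

The masked update leaves six of the eight rows bit-identical, which is the mechanism the abstract credits for low forgetting.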

Summary written by gemini-2.5-flash-lite from 4 sources.

IMPACT These techniques could lead to more robust and adaptable language models that retain general knowledge while learning new tasks.

RANK_REASON Two arXiv papers present novel methods for mitigating catastrophic forgetting in language models.

Read on arXiv cs.CL →

COVERAGE [4]

  1. arXiv cs.LG TIER_1 · Prakhar Gupta, Garv Shah, Satyam Goyal, Anirudh Kanchi ·

    Sparse Memory Finetuning as a Low-Forgetting Alternative to LoRA and Full Finetuning

arXiv:2605.03229v1 · Abstract: Adapting a pretrained language model to a new task often hurts the general capabilities it already had, a problem known as catastrophic forgetting. Sparse Memory Finetuning (SMF) tries to avoid this by adding key-value memory laye…

  2. arXiv cs.CL TIER_1 · Ishaan Watts, Catherine Li, Sachin Goyal, Jacob Mitchell Springer, Aditi Raghunathan ·

    Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting

arXiv:2605.02105v1 · Abstract: Pretraining optimizers are tuned to produce the strongest possible base model, on the assumption that a stronger starting point yields a stronger model after subsequent changes like post-training and quantization. This overlooks t…

  3. arXiv cs.CL TIER_1 · Anirudh Kanchi ·

    Sparse Memory Finetuning as a Low-Forgetting Alternative to LoRA and Full Finetuning

    Adapting a pretrained language model to a new task often hurts the general capabilities it already had, a problem known as catastrophic forgetting. Sparse Memory Finetuning (SMF) tries to avoid this by adding key-value memory layers to the model and, on each training step, updati…

  4. arXiv cs.CL TIER_1 · Aditi Raghunathan ·

    Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting

    Pretraining optimizers are tuned to produce the strongest possible base model, on the assumption that a stronger starting point yields a stronger model after subsequent changes like post-training and quantization. This overlooks the geometry of the base model which controls how m…
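The SAM update that the second paper studies at pretraining time is a known two-step procedure (Foret et al.): perturb the weights toward the locally worst direction, then descend using the gradient at the perturbed point. The toy quadratic loss, `rho`, and learning rate below are illustrative; the paper applies this idea at pretraining scale, not on a toy problem.

```python
import numpy as np

def loss(w):
    return 0.5 * np.sum(w ** 2)

def grad(w):
    return w  # gradient of the quadratic loss above

w = np.array([1.0, -2.0])
rho, lr = 0.05, 0.1

for _ in range(100):
    g = grad(w)
    # Step 1: ascend to the worst-case point within a ball of radius rho.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Step 2: descend using the gradient evaluated at the perturbed weights.
    w = w - lr * grad(w + eps)

print(loss(w) < 1e-2)  # → True: converged near the minimum of the toy loss
```

Because the descent direction is measured at the worst nearby point rather than at the current weights, minima that are sharp in some direction are penalized, which is the flat-minima bias the abstract connects to reduced forgetting.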