PulseAugur

New methods like SMF and SAM reduce catastrophic forgetting in LLMs

Two new research papers explore methods to mitigate catastrophic forgetting in language models during fine-tuning. One paper introduces Sparse Memory Finetuning (SMF), which adds memory layers and updates only heavily accessed rows, showing improved performance on a medical exam task with minimal loss of general capabilities. The other paper investigates Sharpness-Aware Minimization (SAM) and other pretraining optimization techniques, demonstrating that biasing towards flatter minima can significantly reduce forgetting across various model sizes and post-training scenarios.
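
To make the flat-minima idea concrete, here is a minimal sketch of a Sharpness-Aware Minimization step on a toy two-parameter quadratic. The loss, the learning rate `lr`, the radius `rho`, and every function name are illustrative assumptions; this is not the paper's pretraining setup, only the generic two-step SAM update (perturb uphill, then descend using the gradient taken at the perturbed point).

```python
import numpy as np

# Hypothetical toy loss: curvature 10 along w[0] (sharp), 0.1 along w[1] (flat).
CURVATURE = np.array([10.0, 0.1])

def loss(w):
    return 0.5 * float(np.sum(CURVATURE * w ** 2))

def grad(w):
    return CURVATURE * w

def sam_perturbation(w, rho=0.05, eps=1e-12):
    """Gradient direction rescaled to an L2 norm of (approximately) rho."""
    g = grad(w)
    return rho * g / (np.linalg.norm(g) + eps)

def sam_step(w, lr=0.05, rho=0.05):
    """One SAM update: perturb uphill, then descend with the perturbed gradient."""
    e = sam_perturbation(w, rho)
    # The gradient is evaluated at w + e but applied to the original weights w.
    return w - lr * grad(w + e)

w_init = np.array([1.0, 1.0])
w_final = w_init.copy()
for _ in range(200):
    w_final = sam_step(w_final)

print("initial loss:", loss(w_init))
print("final loss:  ", loss(w_final))
```

Because the update uses the gradient at a nearby worst-case point rather than at the current weights, sharp directions are penalized more strongly than flat ones, which is the bias toward flatter minima that the summary refers to.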

Impact: These techniques could lead to more robust and adaptable language models that retain general knowledge while learning new tasks.

Ranking rationale: Two arXiv papers present novel methods for mitigating catastrophic forgetting in language models.


AI-generated summary · Google Gemini · from 4 sources.


Sources [4]

  1. arXiv cs.LG TIER_1 English(EN) · Prakhar Gupta, Garv Shah, Satyam Goyal, Anirudh Kanchi

    Sparse Memory Finetuning as a Low-Forgetting Alternative to LoRA and Full Finetuning

    arXiv:2605.03229v1 (announce type: cross). Abstract: Adapting a pretrained language model to a new task often hurts the general capabilities it already had, a problem known as catastrophic forgetting. Sparse Memory Finetuning (SMF) tries to avoid this by adding key-value memory laye…

  2. arXiv cs.CL TIER_1 English(EN) · Ishaan Watts, Catherine Li, Sachin Goyal, Jacob Mitchell Springer, Aditi Raghunathan

    Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting

    arXiv:2605.02105v1 (announce type: cross). Abstract: Pretraining optimizers are tuned to produce the strongest possible base model, on the assumption that a stronger starting point yields a stronger model after subsequent changes like post-training and quantization. This overlooks t…

  3. arXiv cs.CL TIER_1 English(EN) · Anirudh Kanchi

    Sparse Memory Finetuning as a Low-Forgetting Alternative to LoRA and Full Finetuning

    Adapting a pretrained language model to a new task often hurts the general capabilities it already had, a problem known as catastrophic forgetting. Sparse Memory Finetuning (SMF) tries to avoid this by adding key-value memory layers to the model and, on each training step, updati…

  4. arXiv cs.CL TIER_1 English(EN) · Aditi Raghunathan

    Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting

    Pretraining optimizers are tuned to produce the strongest possible base model, on the assumption that a stronger starting point yields a stronger model after subsequent changes like post-training and quantization. This overlooks the geometry of the base model which controls how m…
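
The sparse-update idea in the SMF abstracts above (a key-value memory layer where only heavily accessed rows are trained) can be sketched roughly as follows. This is a toy illustration under loud assumptions: the sizes, the softmax top-k lookup, the squared-error objective, and the "most accessed in this batch" selection rule are all hypothetical stand-ins, since the truncated abstracts do not state the paper's exact selection criterion or architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_SLOTS, DIM, TOP_K, ROWS_TO_UPDATE = 64, 8, 4, 3  # hypothetical sizes

keys = rng.normal(size=(NUM_SLOTS, DIM))    # lookup keys, kept frozen in this sketch
values = rng.normal(size=(NUM_SLOTS, DIM))  # value rows, the only trainable part

def memory_lookup(queries):
    """Return top-k slot indices and softmax weights for each query."""
    scores = queries @ keys.T                                  # (batch, NUM_SLOTS)
    idx = np.argsort(-scores, axis=1)[:, :TOP_K]               # (batch, TOP_K)
    top = np.take_along_axis(scores, idx, axis=1)
    weights = np.exp(top - top.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return idx, weights

def sparse_update(values, queries, targets, lr=0.1):
    """One step that updates only the most-accessed value rows (assumed rule)."""
    idx, weights = memory_lookup(queries)
    out = np.einsum("bk,bkd->bd", weights, values[idx])        # memory read-out
    err = out - targets                                        # grad of 0.5*||out - target||^2
    grad = np.zeros_like(values)
    # Scatter-add each query's contribution onto the rows it accessed.
    np.add.at(grad, idx, weights[:, :, None] * err[:, None, :])
    counts = np.bincount(idx.ravel(), minlength=NUM_SLOTS)     # accesses per row
    chosen = np.argsort(-counts)[:ROWS_TO_UPDATE]
    values[chosen] -= lr * grad[chosen]                        # all other rows stay frozen
    return chosen

queries = rng.normal(size=(16, DIM))
targets = rng.normal(size=(16, DIM))
before = values.copy()
chosen = sparse_update(values, queries, targets)
changed = np.flatnonzero(np.any(values != before, axis=1))
print("rows selected:", sorted(chosen.tolist()))
print("rows changed: ", changed.tolist())
```

Only the selected rows move on each step, so most of the memory table (and, in a real model, the pretrained weights outside it) is left untouched, which is the intuition behind the low-forgetting claim.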