PulseAugur

New methods like SMF and SAM reduce catastrophic forgetting in LLMs

Two new arXiv papers propose methods to mitigate catastrophic forgetting in language models during fine-tuning. One introduces Sparse Memory Finetuning (SMF), which adds key-value memory layers and updates only the most heavily accessed rows on each step, improving performance on a medical exam task with minimal loss of general capabilities. The other investigates Sharpness-Aware Minimization (SAM) and other pretraining optimization choices, showing that biasing pretraining toward flatter minima can significantly reduce forgetting across model sizes and post-training scenarios.
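The sparse-update idea behind SMF can be sketched in a few lines. Everything here is an illustrative assumption for exposition (the array names, the top-k selection rule, the learning rate), not the paper's exact method: the point is that rows outside the most-accessed set receive no gradient update.

```python
import numpy as np

rng = np.random.default_rng(0)
num_rows, dim, top_k = 8, 4, 2

memory = rng.normal(size=(num_rows, dim))           # key-value memory rows
access_counts = np.array([5, 0, 1, 9, 2, 0, 0, 3])  # per-row accesses this step (toy values)
grad = rng.normal(size=(num_rows, dim))             # gradient w.r.t. the memory

# Update only the top-k most-accessed rows; freeze the rest.
active = np.argsort(access_counts)[-top_k:]         # here: rows 0 and 3
mask = np.zeros((num_rows, 1))
mask[active] = 1.0

lr = 0.1
memory_new = memory - lr * mask * grad

# Rows outside the top-k are untouched, limiting interference with
# previously stored knowledge.
changed = np.where(np.any(memory_new != memory, axis=1))[0]
print(sorted(changed.tolist()))  # → [0, 3]
```

The masked update leaves six of the eight rows bit-identical, which is the mechanism the abstract credits for low forgetting.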

Summary written by gemini-2.5-flash-lite from 4 sources.

IMPACT These techniques could lead to more robust and adaptable language models that retain general knowledge while learning new tasks.

RANK_REASON Two arXiv papers present novel methods for mitigating catastrophic forgetting in language models.

Read on arXiv cs.CL →

COVERAGE [4]

  1. arXiv cs.LG TIER_1 · Prakhar Gupta, Garv Shah, Satyam Goyal, Anirudh Kanchi ·

    Sparse Memory Finetuning as a Low-Forgetting Alternative to LoRA and Full Finetuning

arXiv:2605.03229v1 · Abstract: Adapting a pretrained language model to a new task often hurts the general capabilities it already had, a problem known as catastrophic forgetting. Sparse Memory Finetuning (SMF) tries to avoid this by adding key-value memory laye…

  2. arXiv cs.CL TIER_1 · Ishaan Watts, Catherine Li, Sachin Goyal, Jacob Mitchell Springer, Aditi Raghunathan ·

    Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting

arXiv:2605.02105v1 · Abstract: Pretraining optimizers are tuned to produce the strongest possible base model, on the assumption that a stronger starting point yields a stronger model after subsequent changes like post-training and quantization. This overlooks t…

  3. arXiv cs.CL TIER_1 · Anirudh Kanchi ·

    Sparse Memory Finetuning as a Low-Forgetting Alternative to LoRA and Full Finetuning

    Adapting a pretrained language model to a new task often hurts the general capabilities it already had, a problem known as catastrophic forgetting. Sparse Memory Finetuning (SMF) tries to avoid this by adding key-value memory layers to the model and, on each training step, updati…

  4. arXiv cs.CL TIER_1 · Aditi Raghunathan ·

    Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting

    Pretraining optimizers are tuned to produce the strongest possible base model, on the assumption that a stronger starting point yields a stronger model after subsequent changes like post-training and quantization. This overlooks the geometry of the base model which controls how m…
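The SAM update that the second paper studies at pretraining time is a known two-step procedure (Foret et al.): perturb the weights toward the locally worst direction, then descend using the gradient at the perturbed point. The toy quadratic loss, `rho`, and learning rate below are illustrative; the paper applies this idea at pretraining scale, not on a toy problem.

```python
import numpy as np

def loss(w):
    return 0.5 * np.sum(w ** 2)

def grad(w):
    return w  # gradient of the quadratic loss above

w = np.array([1.0, -2.0])
rho, lr = 0.05, 0.1

for _ in range(100):
    g = grad(w)
    # Step 1: ascend to the worst-case point within a ball of radius rho.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Step 2: descend using the gradient evaluated at the perturbed weights.
    w = w - lr * grad(w + eps)

print(loss(w) < 1e-2)  # → True: converged near the minimum of the toy loss
```

Because the descent direction is measured at the worst nearby point rather than at the current weights, minima that are sharp in some direction are penalized, which is the flat-minima bias the abstract connects to reduced forgetting.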