Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting

New methods such as SMF and SAM reduce catastrophic forgetting in LLMs

By PulseAugur Editorial · [4 sources] · 2026-05-04 00:02

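Two new research papers examine ways to mitigate catastrophic forgetting in language models during finetuning. The first introduces Sparse Memory Finetuning (SMF), which adds memory layers to the model and updates only the most heavily accessed rows; on a medical exam task it improves performance with minimal loss of general capabilities. The second studies Sharpness-Aware Minimization (SAM) and other pretraining optimization techniques, and shows that biasing pretraining toward flatter minima can substantially reduce forgetting across model sizes and post-training scenarios.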
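The "update only the heavily accessed rows" idea can be illustrated with a small NumPy sketch. This is not the paper's implementation: the function name `sparse_row_update`, the memory layout, the plain SGD update, and the selection rule (top-k rows by raw access count within a batch) are all assumptions made for illustration, and the actual selection rule in the paper may differ.

```python
import numpy as np

def sparse_row_update(memory, row_ids, row_grads, top_k, lr=0.1):
    """Update only the top_k most frequently accessed memory rows.

    All names and the selection rule are hypothetical, for illustration only.
    memory:    (num_rows, dim) float array of memory values
    row_ids:   (n,) int array, the row hit by each access in the batch
    row_grads: (n, dim) float array, gradient contributed by each access
    Returns the updated memory and the ids of the rows that were changed.
    """
    num_rows = memory.shape[0]
    # Count how often each row was accessed in this batch.
    counts = np.bincount(row_ids, minlength=num_rows)
    # Accumulate per-access gradients into a dense per-row gradient.
    grad = np.zeros_like(memory)
    np.add.at(grad, row_ids, row_grads)
    # Rank rows by access count and keep the top_k that were accessed at all.
    order = np.argsort(-counts, kind="stable")
    chosen = order[:top_k]
    chosen = chosen[counts[chosen] > 0]
    # Apply an SGD step to the chosen rows; every other row stays frozen.
    updated = memory.copy()
    updated[chosen] -= lr * grad[chosen]
    return updated, chosen
```

Because every row outside `chosen` is left untouched, most of the memory table is identical before and after the step, which is the intuition behind the low-forgetting claim in the summary.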

Impact: These techniques could yield more capable and more adaptable language models that retain general knowledge while learning new tasks.

Ranking rationale: Two arXiv papers propose novel methods for mitigating catastrophic forgetting in language models.


AI-generated summary · Google Gemini · based on 4 sources.

Sources [4]

arXiv cs.LG TIER_1 English(EN) · Prakhar Gupta, Garv Shah, Satyam Goyal, Anirudh Kanchi · 2026-05-06 04:00

Sparse Memory Finetuning as a Low-Forgetting Alternative to LoRA and Full Finetuning

arXiv:2605.03229v1 · Abstract: Adapting a pretrained language model to a new task often hurts the general capabilities it already had, a problem known as catastrophic forgetting. Sparse Memory Finetuning (SMF) tries to avoid this by adding key-value memory laye…

arXiv cs.CL TIER_1 English(EN) · Ishaan Watts, Catherine Li, Sachin Goyal, Jacob Mitchell Springer, Aditi Raghunathan · 2026-05-05 04:00

Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting

arXiv:2605.02105v1 · Abstract: Pretraining optimizers are tuned to produce the strongest possible base model, on the assumption that a stronger starting point yields a stronger model after subsequent changes like post-training and quantization. This overlooks t…

arXiv cs.CL TIER_1 English(EN) · Anirudh Kanchi · 2026-05-04 23:46

Sparse Memory Finetuning as a Low-Forgetting Alternative to LoRA and Full Finetuning

Adapting a pretrained language model to a new task often hurts the general capabilities it already had, a problem known as catastrophic forgetting. Sparse Memory Finetuning (SMF) tries to avoid this by adding key-value memory layers to the model and, on each training step, updati…

arXiv cs.CL TIER_1 English(EN) · Aditi Raghunathan · 2026-05-04 00:02

Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting

Pretraining optimizers are tuned to produce the strongest possible base model, on the assumption that a stronger starting point yields a stronger model after subsequent changes like post-training and quantization. This overlooks the geometry of the base model which controls how m…
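
For readers unfamiliar with sharpness-aware minimization, the standard SAM update can be sketched in a few lines of NumPy. This shows the generic two-step SAM rule (perturb the weights toward higher loss, then descend using the gradient measured there), not the pretraining setup studied in the paper; the function name `sam_step`, the callback `grad_fn`, and the default values of `lr` and `rho` are illustrative assumptions.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One generic SAM update (illustrative sketch, hypothetical names).

    w:       parameter vector
    grad_fn: function returning the loss gradient at a given parameter vector
    rho:     radius of the neighborhood probed around w
    """
    g = grad_fn(w)
    # Step 1: move to the approximate worst-case point within radius rho.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Step 2: take the gradient at the perturbed point and apply it at w.
    g_sharp = grad_fn(w + eps)
    return w - lr * g_sharp

# Usage on a toy quadratic loss 0.5 * w^T A w, whose gradient is A @ w.
A = np.diag([10.0, 1.0])
w_next = sam_step(np.array([1.0, 1.0]), lambda w: A @ w)
```

Setting `rho=0` recovers an ordinary gradient step, which makes clear that the extra gradient evaluation at the perturbed point is the only thing SAM adds, and it is what penalizes sharp directions of the loss.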
