New LLM Unlearning Method Targets Minor Components to Improve Security

By the PulseAugur editorial team · [1 source] · 2026-05-12 07:43

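Researchers have identified a critical vulnerability in current large language model (LLM) unlearning techniques: a model can quickly recover "forgotten" information through relearning attacks. The weakness arises because existing methods mainly alter the dominant components of the model's representations and leave the minor components largely untouched, even though changes to minor components are harder to reverse. To address this, the paper proposes Minor Component Unlearning (MCU), which focuses on modifying these robust minor components to strengthen resistance to relearning attacks, and it reports significant improvements in experiments.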
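To make the distinction between "dominant" and "minor" components concrete, here is a minimal sketch that splits a matrix of representations into a high-variance part and a low-variance remainder using an SVD. This is only an illustration of the terminology, not the MCU method: the summary does not describe how the paper defines or modifies these components, so the SVD-based split, the function name `split_components`, the cutoff `k`, and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def split_components(reps, k):
    """Split centered representations into a dominant part (top-k principal
    directions) and a minor part (everything else).

    Hypothetical helper for illustration; not taken from the paper.
    """
    centered = reps - reps.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    dominant_basis = vt[:k]
    # Project onto the top-k directions, then take the residual as "minor".
    dominant = centered @ dominant_basis.T @ dominant_basis
    minor = centered - dominant
    return dominant, minor, singular_values


# Synthetic stand-in for hidden states: 200 samples, 16 dimensions,
# with three high-variance directions and thirteen low-variance ones.
rng = np.random.default_rng(0)
scales = np.array([10.0, 8.0, 6.0] + [0.5] * 13)
reps = rng.normal(size=(200, 16)) * scales

dominant, minor, singular_values = split_components(reps, k=3)
dominant_energy = float((dominant ** 2).sum())
minor_energy = float((minor ** 2).sum())
print(f"dominant energy: {dominant_energy:.1f}, minor energy: {minor_energy:.1f}")
```

In this toy setup most of the variance sits in the dominant part, so an update that only targets high-variance directions would leave the minor part almost unchanged. That is the kind of gap the summary attributes to existing unlearning methods.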

Impact: Makes sensitive data harder to recover after unlearning, which strengthens LLM security and matters for privacy and copyright.

Ranking rationale: An academic paper proposing a new LLM unlearning method.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.CL TIER_1 English(EN) · Guanhua Chen · 2026-05-12 07:43

Robust LLM Unlearning Against Relearning Attacks: The Minor Components in Representations Matter

Large language model (LLM) unlearning aims to remove specific data influences from pre-trained model without costly retraining, addressing privacy, copyright, and safety concerns. However, recent studies reveal a critical vulnerability: unlearned models rapidly recover "forgotten…

