Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

New method restores long-context memory lost during LLM fine-tuning

By the PulseAugur editorial team · [4 sources] · 2026-06-09 00:00

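Researchers have found that chain-of-thought (CoT) fine-tuning, while it improves reasoning, significantly degrades long-context recall in hybrid linear-attention models. The problem, which they call "attention amnesia," lowers performance on tasks such as Needle-In-A-Haystack. To address it, they propose QK-Restore, a training-free method that restores specific query-key projection weights from the pre-fine-tuning checkpoint. The method recovers long-context capability without sacrificing reasoning performance.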
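The weight-restoration step described above can be illustrated with a minimal sketch. It assumes that checkpoints are plain name-to-weights mappings and that query and key projections can be recognized by the substrings `q_proj` and `k_proj` in the parameter name. Both are illustrative assumptions, not details from the paper, and the summary does not say which "specific" projections QK-Restore selects, so the sketch simply reverts every match.

```python
from typing import Dict, Iterable, List

# Stand-in for a real state dict; a real checkpoint would hold tensors.
Weights = Dict[str, List[float]]


def restore_qk(base: Weights, tuned: Weights,
               markers: Iterable[str] = ("q_proj", "k_proj")) -> Weights:
    """Return a copy of `tuned` whose query/key projections come from `base`.

    `markers` are hypothetical name fragments used to spot Q/K projections.
    """
    markers = tuple(markers)
    merged: Weights = {}
    for name, value in tuned.items():
        # Revert only parameters that look like Q/K projections and that
        # also exist in the pre-fine-tuning checkpoint.
        use_base = name in base and any(m in name for m in markers)
        merged[name] = list(base[name] if use_base else value)
    return merged


# Toy checkpoints: `base` is before CoT fine-tuning, `tuned` is after.
base = {
    "layers.0.attn.q_proj.weight": [1.0, 2.0],
    "layers.0.attn.k_proj.weight": [3.0, 4.0],
    "layers.0.attn.v_proj.weight": [5.0, 6.0],
    "layers.0.mlp.up_proj.weight": [7.0, 8.0],
}
tuned = {name: [w + 0.5 for w in ws] for name, ws in base.items()}

restored = restore_qk(base, tuned)
```

In this toy run, only the two Q/K entries revert to their pre-fine-tuning values; the value projection and MLP weights keep their fine-tuned values. No gradient step is involved, which is what "training-free" means here.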

Impact: Addresses a key problem in LLM fine-tuning and could bring stronger long-context capability to advanced reasoning tasks.

Ranking rationale: This cluster contains a research paper detailing a new method for a specific problem in LLM fine-tuning.


AI-generated summary · Google Gemini · from 4 sources.

Coverage sources [4]

arXiv cs.CL TIER_1 English(EN) · Xinyu Zhou, Boyu Zhu, Yi Xu, Zhiwei Li, Yingfa Chen, Huiming Wang, Zhijiang Guo · 2026-06-10 04:00

Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

arXiv:2606.11052v1. Abstract: Chain-of-thought (CoT) supervised fine-tuning (SFT) is widely adopted to improve reasoning ability, yet we find that it systematically degrades long-context recall in hybrid linear-attention models. Across architectures including Hy…
arXiv cs.CL TIER_1 English(EN) · Zhijiang Guo · 2026-06-09 16:17

Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

Chain-of-thought (CoT) supervised fine-tuning (SFT) is widely adopted to improve reasoning ability, yet we find that it systematically degrades long-context recall in hybrid linear-attention models. Across architectures including HypeNet and Jet-Nemotron, retrieval performance on…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-09 00:00

Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

Chain-of-thought supervised fine-tuning degrades long-context recall in hybrid linear-attention models by biasing attention gradients toward short-range patterns, but a training-free method called QK-Restore can restore long-context capabilities by reverting query-key projections…
r/MachineLearning TIER_1 English(EN) · /u/Level_Frosting_7950 · 2026-06-10 22:49

Pyrecall: an open-source tool for detecting catastrophic forgetting during LLM fine-tuning [P]

Surprised there's no real tooling for this given how much research exists on continual learning. Built pyrecall to fill the gap. Snapshots skill scores before/after fine-tuning, flags regressions, rolls back LoRA adapters by name. …
