PulseAugur
Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

CoT fine-tuning degrades long-context recall in hybrid LLMs; QK-Restore repairs it

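Researchers found that chain-of-thought (CoT) fine-tuning, which is intended to improve reasoning, inadvertently impairs long-context recall in hybrid linear-attention models. The degradation is especially pronounced in models such as HypeNet and Jet-Nemotron, whose retrieval accuracy drops sharply after fine-tuning. To address this, the researchers developed a training-free method called QK-Restore. It selectively reverts the query-key projection parameters to their pre-fine-tuning state, restoring long-context recall without hurting reasoning performance.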
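The restore step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes checkpoints are plain dictionaries mapping parameter names to weights, that query and key projections can be recognised by the substrings `q_proj` and `k_proj` (a common naming convention, assumed here), and it reverts every matching parameter, whereas the paper describes the restoration as selective. The function name `qk_restore` and all parameter names are hypothetical.

```python
from typing import Dict, Iterable, List

# A toy "checkpoint": parameter name -> flat list of weights.
Weights = Dict[str, List[float]]


def qk_restore(
    base: Weights,
    tuned: Weights,
    qk_markers: Iterable[str] = ("q_proj", "k_proj"),  # assumed naming convention
) -> Weights:
    """Return a copy of `tuned` whose query/key projections come from `base`.

    All other parameters keep their fine-tuned values. No training is involved:
    the function only copies weights between two checkpoints.
    """
    markers = tuple(qk_markers)
    restored: Weights = {}
    for name, value in tuned.items():
        is_qk = any(marker in name for marker in markers)
        if is_qk and name in base:
            # Revert this query/key projection to its pre-fine-tuning state.
            restored[name] = list(base[name])
        else:
            # Keep the fine-tuned value (value projections, MLPs, and so on).
            restored[name] = list(value)
    return restored


# Hypothetical parameter names and values, for illustration only.
base = {
    "layers.0.attn.q_proj.weight": [0.1, 0.2],
    "layers.0.attn.k_proj.weight": [0.3, 0.4],
    "layers.0.attn.v_proj.weight": [0.5, 0.6],
    "layers.0.mlp.up_proj.weight": [0.7, 0.8],
}
tuned = {
    "layers.0.attn.q_proj.weight": [1.1, 1.2],
    "layers.0.attn.k_proj.weight": [1.3, 1.4],
    "layers.0.attn.v_proj.weight": [1.5, 1.6],
    "layers.0.mlp.up_proj.weight": [1.7, 1.8],
}

merged = qk_restore(base, tuned)
```

In this sketch `merged` carries the base model's query and key projections together with the fine-tuned values for everything else. With a real framework the same idea would apply to the model's saved parameter dictionary, using whatever names that model actually gives its query and key projections.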

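Impact: This research offers a key solution for preserving LLMs' long-context capabilities after reasoning-focused fine-tuning, which could make them more useful on complex, long-document tasks.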

Ranking rationale: An academic paper detailing a new method for solving a specific LLM training problem.

Read on Hugging Face Daily Papers →

AI-generated summary · Google Gemini · from 3 sources. How we write summaries →

Sources [3]

  1. arXiv cs.CL TIER_1 English(EN) · Xinyu Zhou, Boyu Zhu, Yi Xu, Zhiwei Li, Yingfa Chen, Huiming Wang, Zhijiang Guo ·

    Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

    arXiv:2606.11052v1 Announce Type: new Abstract: Chain-of-thought (CoT) supervised fine-tuning (SFT) is widely adopted to improve reasoning ability, yet we find that it systematically degrades long-context recall in hybrid linear-attention models. Across architectures including Hy…

  2. arXiv cs.CL TIER_1 English(EN) · Zhijiang Guo ·

    Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

    Chain-of-thought (CoT) supervised fine-tuning (SFT) is widely adopted to improve reasoning ability, yet we find that it systematically degrades long-context recall in hybrid linear-attention models. Across architectures including HypeNet and Jet-Nemotron, retrieval performance on…

  3. Hugging Face Daily Papers TIER_1 English(EN) ·

    Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

    Chain-of-thought supervised fine-tuning degrades long-context recall in hybrid linear-attention models by biasing attention gradients toward short-range patterns, but a training-free method called QK-Restore can restore long-context capabilities by reverting query-key projections…