Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It
Researchers have found that Chain-of-Thought (CoT) fine-tuning, while it improves reasoning, significantly degrades long-context recall in hybrid linear-attention models. The effect, termed "attention amnesia," shows up as performance drops on recall tasks such as Needle-In-A-Haystack. To fix it, they propose QK-Restore, a training-free method that restores specific query-key projection weights from a pre-fine-tuning checkpoint. According to the authors, this recovers long-context capabilities without sacrificing reasoning performance.
Impact: Addresses a long-context regression introduced by CoT fine-tuning, potentially enabling more robust long-context capabilities for advanced reasoning tasks.
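
The weight-restoration idea described above can be illustrated with a minimal sketch. It treats each checkpoint as a plain mapping from parameter name to weights and copies query/key projection entries from the pre-fine-tuning checkpoint into the fine-tuned one. The function name `restore_qk`, the name fragments `q_proj` and `k_proj`, and the choice to restore every matching parameter are assumptions made for illustration. The summary does not say which weights QK-Restore selects or how the parameters are named, so this is not the authors' implementation.

```python
from typing import Dict, Iterable

# Hypothetical name fragments marking query/key projection weights.
# Real checkpoints may use different names; adjust to the model at hand.
QK_PATTERNS = ("q_proj", "k_proj")


def restore_qk(
    finetuned: Dict[str, object],
    base: Dict[str, object],
    patterns: Iterable[str] = QK_PATTERNS,
) -> Dict[str, object]:
    """Return a copy of `finetuned` with matching Q/K weights taken from `base`."""
    patterns = tuple(patterns)
    merged = dict(finetuned)  # shallow copy; the input mapping is not modified
    for name in finetuned:
        if any(p in name for p in patterns):
            if name not in base:
                raise KeyError(f"{name} missing from base checkpoint")
            merged[name] = base[name]  # restore the pre-fine-tuning weight
    return merged


# Toy checkpoints: lists stand in for weight tensors.
base = {
    "layers.0.attn.q_proj.weight": [1.0],
    "layers.0.attn.k_proj.weight": [2.0],
    "layers.0.attn.v_proj.weight": [3.0],
}
tuned = {
    "layers.0.attn.q_proj.weight": [1.5],
    "layers.0.attn.k_proj.weight": [2.5],
    "layers.0.attn.v_proj.weight": [3.5],
}

merged = restore_qk(tuned, base)
print(merged)
```

In this toy run the query and key entries revert to their base values while the value projection keeps its fine-tuned weights. No gradient step is involved, which is the sense in which such a procedure is training-free. With a real framework the same loop would operate on the model's parameter dictionary rather than on plain lists.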