Brief · PulseAugur

RESEARCH · Hugging Face Daily Papers English (EN) · 1d · [3 sources]

Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

Researchers report that Chain-of-Thought (CoT) fine-tuning, intended to strengthen reasoning, can degrade the long-context recall of hybrid linear-attention models. The effect is most pronounced in models such as HypeNet and Jet-Nemotron, whose retrieval accuracy drops sharply after fine-tuning. As a remedy, the authors propose QK-Restore, a training-free method that selectively reverts the query-key (QK) projection parameters to their pre-fine-tuning values. This restores long-context recall without compromising reasoning performance.
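To make the core idea concrete, here is a minimal Python sketch of a QK-Restore-style parameter swap. It assumes both checkpoints are available as name-to-weights dictionaries, with plain lists standing in for tensors. The function name `qk_restore`, the `q_proj`/`k_proj` name patterns, and the toy weights are hypothetical. The brief does not say which layers the method restores (for example, only certain attention layers in a hybrid stack), so this sketch simply reverts every tensor whose name matches.

```python
from typing import Dict, Iterable, List

# Hypothetical name fragments marking query/key projection weights.
# Real parameter names depend on the model implementation (assumption).
QK_PATTERNS = ("q_proj", "k_proj")

StateDict = Dict[str, List[float]]


def qk_restore(base: StateDict, finetuned: StateDict,
               patterns: Iterable[str] = QK_PATTERNS) -> StateDict:
    """Copy `finetuned`, but reset every query/key projection tensor to its
    pre-fine-tuning value from `base`; all other parameters stay fine-tuned."""
    pats = tuple(patterns)
    restored: StateDict = {}
    for name, weights in finetuned.items():
        if any(p in name for p in pats):
            if name not in base:
                raise KeyError(f"no pre-fine-tuning weights for {name!r}")
            restored[name] = list(base[name])  # revert to base QK weights
        else:
            restored[name] = list(weights)  # keep fine-tuned weights
    return restored


# Toy usage: one attention block with flattened (hypothetical) weights.
base = {
    "layers.0.attn.q_proj.weight": [0.10, 0.20],
    "layers.0.attn.k_proj.weight": [0.30, 0.40],
    "layers.0.attn.v_proj.weight": [0.50, 0.60],
}
finetuned = {
    "layers.0.attn.q_proj.weight": [0.11, 0.25],
    "layers.0.attn.k_proj.weight": [0.29, 0.47],
    "layers.0.attn.v_proj.weight": [0.52, 0.61],
}
merged = qk_restore(base, finetuned)
# q_proj/k_proj now hold the base values; v_proj keeps its fine-tuned values.
print(merged)
```

Because the step only copies existing weights, no gradient updates are involved, which is consistent with the "training-free" description. In a real model the same swap would be applied to the saved parameter dictionary before reloading it.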

IMPACT This research offers a practical fix for preserving long-context capabilities in LLMs after reasoning-focused fine-tuning, which could make them more useful for complex, long-document tasks.
