New method restores long-context recall lost during LLM fine-tuning

By PulseAugur Editorial · [4 sources] · 2026-06-09 00:00

Researchers report that chain-of-thought (CoT) supervised fine-tuning, while improving reasoning, systematically degrades long-context recall in hybrid linear-attention models, including HypeNet and Jet-Nemotron. The effect, termed "attention amnesia," shows up as performance drops on retrieval tasks such as Needle-In-A-Haystack. The authors propose a training-free fix called QK-Restore, which restores specific query-key projection weights from a pre-fine-tuning checkpoint and is reported to recover long-context capabilities without sacrificing reasoning performance.
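
The restore step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the summaries do not say which layers or projections are restored, so this version simply copies every parameter whose name contains an assumed marker (`q_proj` or `k_proj`, a common but here hypothetical naming scheme) from the base checkpoint into the fine-tuned one. Plain dictionaries of lists stand in for real weight tensors, and the function name `restore_qk` is invented for illustration.

```python
from typing import Any, Dict, Iterable, List, Tuple


def restore_qk(
    base_state: Dict[str, Any],
    tuned_state: Dict[str, Any],
    qk_markers: Iterable[str] = ("q_proj", "k_proj"),  # assumed parameter naming
) -> Tuple[Dict[str, Any], List[str]]:
    """Return a copy of tuned_state with query/key entries taken from base_state."""
    markers = tuple(qk_markers)
    merged = dict(tuned_state)  # shallow copy; the input dicts are not modified
    restored: List[str] = []
    for name in tuned_state:
        is_qk = any(marker in name for marker in markers)
        # Only restore parameters that exist in both checkpoints.
        if is_qk and name in base_state:
            merged[name] = base_state[name]
            restored.append(name)
    return merged, restored


# Toy checkpoints (hypothetical names and values).
base = {
    "layers.0.attn.q_proj.weight": [1.0, 2.0],
    "layers.0.attn.k_proj.weight": [3.0, 4.0],
    "layers.0.attn.v_proj.weight": [5.0, 6.0],
    "layers.0.mlp.up_proj.weight": [7.0, 8.0],
}
tuned = {
    "layers.0.attn.q_proj.weight": [1.5, 2.5],
    "layers.0.attn.k_proj.weight": [3.5, 4.5],
    "layers.0.attn.v_proj.weight": [5.5, 6.5],
    "layers.0.mlp.up_proj.weight": [7.5, 8.5],
}

merged, restored = restore_qk(base, tuned)
print(restored)
```

In this sketch the value and MLP weights keep their fine-tuned values, which mirrors the stated goal of retaining reasoning gains while reverting only the query-key projections. With a real framework, the same loop would run over the model's parameter dictionaries; restricting it to a subset of layers would be a matter of tightening the name filter.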

IMPACT Addresses a critical issue in LLM fine-tuning, potentially enabling more robust long-context capabilities for advanced reasoning tasks.

AI-generated summary · Google Gemini · from 4 sources.

COVERAGE [4]

arXiv cs.CL TIER_1 English(EN) · Xinyu Zhou, Boyu Zhu, Yi Xu, Zhiwei Li, Yingfa Chen, Huiming Wang, Zhijiang Guo · 2026-06-10 04:00

Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

arXiv:2606.11052v1. Abstract: Chain-of-thought (CoT) supervised fine-tuning (SFT) is widely adopted to improve reasoning ability, yet we find that it systematically degrades long-context recall in hybrid linear-attention models. Across architectures including Hy…

arXiv cs.CL TIER_1 English(EN) · Zhijiang Guo · 2026-06-09 16:17

Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

Chain-of-thought (CoT) supervised fine-tuning (SFT) is widely adopted to improve reasoning ability, yet we find that it systematically degrades long-context recall in hybrid linear-attention models. Across architectures including HypeNet and Jet-Nemotron, retrieval performance on…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-09 00:00

Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

Chain-of-thought supervised fine-tuning degrades long-context recall in hybrid linear-attention models by biasing attention gradients toward short-range patterns, but a training-free method called QK-Restore can restore long-context capabilities by reverting query-key projections…

r/MachineLearning TIER_1 English(EN) · /u/Level_Frosting_7950 · 2026-06-10 22:49

Pyrecall open source tool for detecting catastrophic forgetting during LLM fine-tuning[P]

Surprised there's no real tooling for this given how much research exists on continual learning. Built pyrecall to fill the gap. Snapshots skill scores before/after fine-tuning, flags regressions, rolls back LoRA adapters by name. …
