Researchers have introduced OPSDL, a novel on-policy self-distillation method designed to improve the long-context capabilities of large language models. The approach uses the model's existing short-context proficiency to supervise its own long-context generation, providing dense, token-level feedback. Evaluations on models ranging from 7B to 32B parameters show significant and consistent improvements on extended contexts, with better sample efficiency than existing methods such as SFT and DPO and no loss of short-context performance.
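The summary does not give the paper's exact training objective. As a minimal sketch of what dense, token-level self-distillation feedback can look like, the snippet below computes a per-token KL divergence between a teacher distribution (the model conditioned on a short context) and a student distribution (the same model conditioned on the long context), evaluated over tokens the student itself generated, which is what makes the setup on-policy. The function names, array shapes, and the choice of KL as the divergence are illustrative assumptions, not details from the paper.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax over the vocabulary axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def token_level_kl(teacher_logits, student_logits):
    """Per-token KL(teacher || student): one scalar of supervision
    for every position in the student's sampled continuation.

    Both inputs have shape (seq_len, vocab_size); the result has
    shape (seq_len,). Shapes and the KL choice are assumptions for
    illustration, not the paper's stated loss.
    """
    p = softmax(teacher_logits)                       # short-context teacher
    log_p = np.log(p + 1e-12)
    log_q = np.log(softmax(student_logits) + 1e-12)   # long-context student
    return (p * (log_p - log_q)).sum(axis=-1)

# Toy example: 4 generated tokens over a vocabulary of 8.
rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(4, 8))
student_logits = rng.normal(size=(4, 8))
loss = token_level_kl(teacher_logits, student_logits).mean()
```

Compared with a single sequence-level reward (as in DPO-style preference training), this kind of per-token signal gives the model gradient information at every generated position, which is one plausible reading of the "dense, token-level feedback" claimed above.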
Summary written by gemini-2.5-flash-lite from 1 source.