Long-Context Reasoning Through Proxy-Based Chain-of-Thought Tuning
Researchers have developed a new training framework called ProxyCoT to improve the long-context reasoning abilities of large language models. This method transfers reasoning capabilities from shorter "proxy" contexts to full, extended contexts. By first generating high-quality reasoning traces on proxy contexts and then fine-tuning on full contexts, ProxyCoT has demonstrated consistent performance improvements over existing baselines with lower computational costs. The models trained using this approach also show better generalization to out-of-domain tasks.
AI IMPACT: Enhances LLM performance on complex, long-context tasks, potentially improving applications requiring deep understanding of extensive data.
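
The two-stage recipe summarized above (generate a reasoning trace on a short proxy context, then fine-tune on the full context) can be illustrated with a minimal data-construction sketch. This is not the authors' implementation: the summary does not say how proxy contexts are built, so the keyword-overlap selector `select_proxy`, the `TrainingExample` container, and the `generate_trace` callback are all hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TrainingExample:
    prompt: str  # full context plus the question (what the model is tuned on)
    target: str  # reasoning trace that was generated from the proxy context


def select_proxy(passages: List[str], question: str, k: int = 2) -> List[str]:
    """Hypothetical proxy builder (an assumption, not from the source):
    keep the k passages sharing the most whitespace-split words with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        enumerate(passages),
        # Sort by descending overlap; break ties by original position.
        key=lambda item: (-len(q_words & set(item[1].lower().split())), item[0]),
    )
    keep = sorted(idx for idx, _ in ranked[:k])  # restore document order
    return [passages[i] for i in keep]


def build_example(
    passages: List[str],
    question: str,
    generate_trace: Callable[[str, str], str],
) -> TrainingExample:
    """Stage 1: get a trace from the short proxy context.
    Stage 2: pair that trace with the FULL context as a fine-tuning example."""
    proxy_context = "\n".join(select_proxy(passages, question))
    trace = generate_trace(proxy_context, question)  # e.g. a call to a strong LLM
    full_prompt = "\n".join(passages) + "\n\nQuestion: " + question
    return TrainingExample(prompt=full_prompt, target=trace)
```

In this sketch the expensive trace generation only ever sees the short proxy, which is one plausible reading of where the reported cost savings would come from; the fine-tuning step itself (feeding `prompt` and `target` pairs to a trainer) is left out because the summary gives no details about it.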