CoT-Space: A Theoretical Framework for Internal Slow-Thinking via Reinforcement Learning
Researchers have introduced CoT-Space, a theoretical framework for understanding the internal reasoning processes of large language models (LLMs). The framework recasts multi-step Chain-of-Thought (CoT) reasoning, typically enhanced by Reinforcement Learning (RL), from a token-prediction task into an optimization problem in a continuous semantic space. In this view, the optimal CoT length emerges from a trade-off between underfitting and overfitting, which gives a mechanistic explanation for internal test-time scaling.
AI IMPACT: Provides a theoretical foundation for optimizing LLM reasoning trajectories, potentially improving performance on complex tasks.