Train at Moving Edge: Online-Verified Prompt Selection for Efficient RL Training of Large Reasoning Model
Researchers have developed HIVE, a framework designed to make reinforcement learning (RL) more efficient when training large language models on reasoning tasks. HIVE addresses the high computational cost of current RL methods by selecting high-utility prompts before the expensive rollout phase. It targets prompts at the "learning edge", meaning those of intermediate difficulty and high uncertainty. Because this edge shifts as training progresses, selection has to follow it, which reduces wasted computation without sacrificing performance.
AI IMPACT: HIVE's efficient prompt selection could significantly reduce the computational cost of training LLMs for reasoning tasks.
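To make the "learning edge" idea concrete, here is a minimal Python sketch of selecting prompts before rollout. It is an illustration only, not HIVE's actual scoring rule: the `PromptStats` fields, the `edge_score` formula, and the `select_prompts` helper are all hypothetical names and assumptions. The sketch assumes each prompt carries a running estimate of its pass rate and some uncertainty score, both in [0, 1].

```python
from dataclasses import dataclass


@dataclass
class PromptStats:
    prompt_id: str
    est_pass_rate: float  # assumed: running estimate of success rate, in [0, 1]
    uncertainty: float    # assumed: some uncertainty score, in [0, 1]


def edge_score(s: PromptStats) -> float:
    """Hypothetical utility: high for mid-difficulty, high-uncertainty prompts."""
    # 4 * p * (1 - p) equals 1 at p = 0.5 and falls to 0 at p = 0 or p = 1,
    # so prompts that are nearly always solved or nearly never solved score low.
    difficulty_term = 4.0 * s.est_pass_rate * (1.0 - s.est_pass_rate)
    return difficulty_term * s.uncertainty


def select_prompts(pool: list[PromptStats], k: int) -> list[PromptStats]:
    """Keep the k highest-scoring prompts; only these would go to rollout."""
    return sorted(pool, key=edge_score, reverse=True)[:k]


pool = [
    PromptStats("easy", est_pass_rate=0.95, uncertainty=0.2),
    PromptStats("edge", est_pass_rate=0.50, uncertainty=0.9),
    PromptStats("hard", est_pass_rate=0.05, uncertainty=0.3),
    PromptStats("mid", est_pass_rate=0.70, uncertainty=0.6),
]
selected = [s.prompt_id for s in select_prompts(pool, k=2)]
print(selected)
```

In a real training loop the pass-rate estimates would be refreshed as the model improves, so the set of selected prompts would drift over time, which is the "moving edge" in the title.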