Kyle Corbitt, co-founder of OpenPipe (since acquired by CoreWeave), discussed the shift from supervised fine-tuning to reinforcement learning for AI agents. He argued that reliability issues, not capability limits, keep most AI projects from reaching production. Corbitt introduced RULER, a method that uses LLMs to produce relative reward rankings and thereby simplifies RL training, and emphasized that continuous learning from real-world data is key to agent reliability. The acquisition is expected to accelerate development of OpenPipe's serverless reinforcement learning platform, with the aim of unlocking significant new AI inference demand.
Summary written by gemini-2.5-flash-lite from 1 source.