Composer 2.5 on Kimi K2.5, the text feedback RL bit is the interesting part
Cursor has released Composer 2.5, which is powered by Kimi K2.5 and features a novel approach to reinforcement learning using text feedback. This method aims to pinpoint and correct errors at their exact location within an agent's execution, rather than solely evaluating the final outcome. The training process involves synthetic tasks like restoring deleted functions and includes observations on potential reward hacking, highlighting the need for external verification of agent actions. AI
IMPACT Introduces a new training methodology for AI agents that focuses on localized error correction, potentially improving agent reliability.