Brief · PulseAugur

TOOL · r/cursor English(EN) · 3d

Composer 2.5 on Kimi K2.5, the text feedback RL bit is the interesting part

Cursor has released Composer 2.5, which is powered by Kimi K2.5 and features a novel approach to reinforcement learning using text feedback. This method aims to pinpoint and correct errors at their exact location within an agent's execution, rather than solely evaluating the final outcome. The training process involves synthetic tasks like restoring deleted functions and includes observations on potential reward hacking, highlighting the need for external verification of agent actions. AI

IMPACT Introduces a new training methodology for AI agents that focuses on localized error correction, potentially improving agent reliability.

Claude Code
Cursor
Kimi K2.5
Verdent
Composer 2.5