Coding agents break when models are "almost" bug-free. But almost valid JSON is just not the same valid JSON. Fun piece here from @akshay_pachaar shows why SFT
Fireworks AI has highlighted a critical issue with coding agents that rely on models producing "almost" bug-free output. The problem arises because even minor deviations from valid JSON format can cause agents to fail. The company's research, led by Akshay Pachaar, demonstrates that standard supervised fine-tuning (SFT) is insufficient to address this, proposing instead a method called GRPO (presumably a form of reinforcement learning) that directly trains models for correctness. AI
IMPACT Highlights a key challenge in reliable agentic systems, suggesting new training methods are needed for robust AI code generation.