Brief · PulseAugur

TOOL · arXiv cs.AI English(EN) · 7h

Auditing Reward Hackability in Code RL Training Environments

A new arXiv paper audits how easily current code reinforcement learning (RL) training environments can be reward-hacked. The researchers found that a significant percentage of tasks in SWE-bench Verified and R2E-Gym accepted incorrect solutions because their test suites were too weak. Frontier models also performed notably better on these hackable tasks, suggesting a weakness in how solutions in these environments are assessed.
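To make the idea of a "hackable" task concrete, here is a minimal sketch. It is not the paper's method. The toy task (return the sorted unique items of a list), the two test suites, and every function name are hypothetical, chosen only to show how a weak test suite can accept an incorrect solution that a stricter suite rejects.

```python
from typing import Callable, Iterable, List

Solution = Callable[[List[int]], List[int]]
Suite = Callable[[Solution], bool]


def correct_solution(xs: List[int]) -> List[int]:
    # Intended behavior for the toy task: sorted, duplicates removed.
    return sorted(set(xs))


def incorrect_solution(xs: List[int]) -> List[int]:
    # Wrong on purpose: sorts but never removes duplicates.
    return sorted(xs)


def weak_suite(solve: Solution) -> bool:
    # Hypothetical weak suite: its only input has no duplicates,
    # so the missing deduplication step is never exercised.
    return solve([3, 1, 2]) == [1, 2, 3]


def strong_suite(solve: Solution) -> bool:
    # Hypothetical stricter suite: adds a duplicate case and an empty case.
    return (
        weak_suite(solve)
        and solve([2, 2, 1]) == [1, 2]
        and solve([]) == []
    )


def is_hackable(weak: Suite, strong: Suite, candidates: Iterable[Solution]) -> bool:
    # A task counts as hackable here if some candidate passes the weak
    # suite while failing the stricter one.
    return any(weak(c) and not strong(c) for c in candidates)
```

Calling `is_hackable(weak_suite, strong_suite, [correct_solution, incorrect_solution])` returns `True` for this toy task, because `incorrect_solution` passes the weak suite but fails the stricter one. In an RL setting, a reward computed from the weak suite alone would score both solutions the same.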

arXiv
SWE-bench Verified
Docker
R2E-Gym