Researchers have developed a new benchmark called Auditing Sabotage Bench to test how well AI models and humans can detect subtle sabotage in machine learning research codebases. The benchmark includes nine ML codebases with intentionally flawed variants designed to produce misleading results. When tested, even advanced models like Gemini 3.1 Pro struggled to reliably identify these sabotages, achieving only 77% accuracy in detection and a 42% success rate in fixing them.
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT This benchmark highlights the risk of undetected sabotage in AI-driven research and the need for robust auditing tools to ensure AI safety.
RANK_REASON The cluster describes a new academic benchmark and paper released on arXiv.