OpenAI has introduced EVMbench, a benchmark that evaluates how well AI agents can detect, patch, and exploit vulnerabilities in smart contracts. The benchmark uses a curated set of 117 vulnerabilities drawn from audits and aims to improve the security of blockchain environments, which handle over $100 billion in assets. Early results show GPT-5.3-Codex scoring 71.0% in exploit mode, a significant improvement over previous models, though detection and patching capabilities still require further development.
Summary written by gemini-2.5-flash-lite from 1 source.