OpenAI | 18:46 | Press Releases | Official Blog
OpenAI Releases EVMbench to Evaluate AI Agent Vulnerability Detection
Automates contract audits with AI to strengthen asset protection.
Key Points
- Benchmarks 120 vulnerabilities across detection, repair, and exploitation tasks
- GPT-5.3-Codex achieved 72.2% in exploit mode, a major improvement
- Promotes AI usage in security audits
- Available as a practical tool for developers
OpenAI and Paradigm jointly launched EVMbench, a benchmark that measures how well AI agents detect, exploit, and fix smart contract vulnerabilities. GPT-5.3-Codex scored over 72% in exploit mode, a step forward for blockchain security. The benchmark also gives developers a practical tool for AI-assisted auditing.