Official sources only. Rumors, leaks, and get-rich schemes are excluded.

AI Briefing / OpenAI / Press Releases / 18:46

AI summarized from verified sources

OpenAI Releases EVMbench to Evaluate AI Agent Vulnerability Detection

Automates contract audits with AI to strengthen asset protection.

SOURCE CHECK

2 sources

VERIFIED

Sources

Primary / x.com

Official Blog

Supporting / openai.com

Official Blog

Key Points

1. Benchmarks 120 vulnerabilities, evaluating detection, repair, and exploitation
2. GPT-5.3-Codex achieved 72.2% in exploit mode, a major improvement
3. Promotes AI usage in security audits
4. Available as a practical tool for developers

Key point

OpenAI and Paradigm jointly launched EVMbench, a benchmark of AI agents' ability to detect, exploit, and fix smart contract vulnerabilities. GPT-5.3-Codex scored 72.2% in exploit mode. The benchmark is also available to developers as a practical tool for AI-assisted auditing.

Impact

Automates contract audits with AI to strengthen asset protection. Key checks: benchmarks 120 vulnerabilities, evaluating detection, repair, and exploitation / GPT-5.3-Codex achieved 72.2% in exploit mode, a major improvement / promotes AI usage in security audits.