🔒 Official announcements only. We do not publish rumors, leaks, or paid info products.
OpenAI | 20:07 | Policy | Official Blog

OpenAI Releases CoT Controllability Eval Suite

Supports safer AI operations by showing that a model's internal reasoning can still be monitored

Key Points

  • 1. CoT-Control: an open-source eval suite with more than 13,000 tasks
  • 2. Controllability scores range from 0.1% to 15.4%, so CoT monitoring remains viable
  • 3. Controllability is higher in larger models and lower when reasoning runs long

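OpenAI has released CoT-Control, an open-source evaluation suite of more than 13,000 tasks that tests how well 13 models can control, and therefore obscure, their chains of thought (CoT). Controllability scores ranged from 0.1% to 15.4%. Larger models were more controllable, but controllability dropped as reasoning grew longer. Because models struggle to reshape their reasoning on demand, CoT monitoring remains a viable safety measure, and businesses can reasonably use it as one layer of agent oversight.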
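To make the percentage figures concrete, here is a minimal sketch of how a controllability score could be computed, assuming it is simply the share of tasks on which a model's reasoning followed a control instruction. The `TaskResult` record, the `controllability_rate` function, and the sample data are all hypothetical illustrations and are not taken from the CoT-Control suite itself.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical record: did the model's reasoning obey the control instruction?"""
    task_id: str
    followed_instruction: bool


def controllability_rate(results: list[TaskResult]) -> float:
    """Return the percentage of tasks where the reasoning followed the instruction.

    Assumption: controllability is a plain compliance rate over tasks.
    """
    if not results:
        raise ValueError("need at least one result")
    followed = sum(r.followed_instruction for r in results)
    return 100.0 * followed / len(results)


# Made-up data for illustration only: 3 of 20 tasks comply.
sample = [TaskResult(f"task-{i}", i < 3) for i in range(20)]
print(f"{controllability_rate(sample):.1f}%")  # prints 15.0%
```

Under this reading, a low score means the model rarely manages to reshape its reasoning on request, which is why low controllability is good news for monitoring.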

hayami

Stay on top of OpenAI, Google & Anthropic updates. An AI digest for business professionals.

Source Policy

We use only official sources. Each article links to the original announcement so you can verify it yourself.

© 2026 hayami. All rights reserved.