
Official sources only. Rumors, leaks, and get-rich schemes are excluded.

AI Briefing / OpenAI / Feature Updates

AI summarized from verified sources

Measure how well AI agents handle ambiguous biology research judgments

Delegate biology data analysis judgments to AI, improving research efficiency.

SOURCE CHECK

1 source

VERIFIED

Sources

Primary / openai.com

Official Blog

Key Points

1. 129 research-level benchmark questions
2. Synthetic data for rigorous evaluation
3. GPT-5.6 Sol reaches 31.5%

OpenAI introduced GeneBench-Pro, a benchmark of 129 problems across genomics and clinical genetics. It evaluates AI agents on data exploration, choice of analysis path, and judgment calls. GPT-5.6 Sol reaches a pass rate of up to 31.5% on tasks that take human experts 20-40 hours, at a cost of just a few dollars.

What happened

OpenAI announced GeneBench-Pro on June 30. It benchmarks AI judgment and iterative analysis in computational biology.

Impact

Advances practical AI agent use, making analysis support more accessible for researchers.
