AI summarized from verified sources
Measure how well AI agents handle ambiguous biology research judgments
Delegate biology data analysis judgments to AI, improving research efficiency.
Key Points
- 129 research-level benchmark questions
- Synthetic data for rigorous evaluation
- GPT-5.6 Sol reaches a pass rate of up to 31.5%
OpenAI introduced GeneBench-Pro, a benchmark of 129 problems across genomics and clinical genetics. It evaluates AI agents on data exploration, choice of analysis path, and judgment calls. GPT-5.6 Sol achieves a pass rate of up to 31.5%, assisting with tasks that take human experts 20 to 40 hours at a cost of just a few dollars.
What happened
OpenAI announced GeneBench-Pro on June 30. It benchmarks AI judgment and iterative analysis in computational biology.
Impact
Advances practical AI agent use, making analysis support more accessible for researchers.