OpenAI · 20:19 · Policy · Official Blog
OpenAI Discloses Accidental CoT Grading in RL Training
Ensures monitorable reasoning, easing safe agent development.
Key Points
1. Impact limited to fewer than 0.6% of samples
2. Validated by third-party organizations
3. Detection and prevention improved
4. Chain-of-thought maintained as a safety layer
OpenAI disclosed that, during RL training of some GPT-5 models, the models' own chain-of-thought was accidentally evaluated. In-depth analysis confirmed no impact on monitorability, and detection systems have been strengthened to prevent recurrence. Developers can continue to rely on the transparency of model reasoning.