OpenAI | 13:59 | Feature Updates | Official Blog
OpenAI Releases MRC Networking Protocol for AI Training
Improves reliability and speed in large-scale AI training to maximize GPU efficiency.
Key Points
- Adaptive packet spraying cuts congestion
- SRv6 bypasses failures in microseconds
- 2-tier switches connect 100k+ GPUs
- Deployed in OpenAI's supercomputers
OpenAI, together with AMD, Broadcom, Intel, Microsoft, and NVIDIA, has released the MRC networking protocol. It reduces congestion and improves fault tolerance in clusters of 100k+ GPUs. Developers can reduce wasted GPU time in large training runs, which lowers costs. The open specification is available now through the Open Compute Project (OCP).
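
To give a feel for why spraying packets across paths can reduce congestion, here is a toy Python sketch. It is not the MRC algorithm: the function names, the path count, and the "pick the least-loaded path" rule are all illustrative assumptions. It only contrasts pinning a whole flow to one path with spreading its packets across several.

```python
# Toy model only. Not taken from the MRC spec: names, numbers, and the
# least-loaded selection rule are hypothetical, chosen for illustration.


def single_path(num_packets: int, num_paths: int, flow_id: int) -> list[int]:
    """Pin every packet of a flow to one path picked from the flow id."""
    load = [0] * num_paths
    path = flow_id % num_paths  # stand-in for a per-flow hash
    load[path] += num_packets
    return load


def adaptive_spray(num_packets: int, initial_load: list[int]) -> list[int]:
    """Send each packet on whichever path currently carries the least load."""
    load = list(initial_load)
    for _ in range(num_packets):
        # min() returns the first index among ties, so ties resolve left to right.
        path = min(range(len(load)), key=load.__getitem__)
        load[path] += 1
    return load


if __name__ == "__main__":
    pinned = single_path(1000, 4, flow_id=7)
    sprayed = adaptive_spray(1000, [0, 0, 0, 0])
    print("busiest path, single-path flow:", max(pinned))
    print("busiest path, sprayed flow:   ", max(sprayed))
```

In this model the pinned flow puts all of its packets on one path, while the sprayed flow evens out the load, including when one path starts out busier than the others. Real networks must also handle packet reordering and stale load information, which this sketch ignores.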