AI summarized from verified sources
OpenAI Releases MRC Networking Protocol for AI Training
Improves reliability and speed in large AI training, maximizing GPU efficiency.
SOURCE CHECK
2 sources
Sources
Key Points
- 1Adaptive packet spraying cuts congestion
- 2SRv6 bypasses failures in microseconds
- 32-tier switches connect 100k+ GPUs
- 4Deployed in OpenAI's supercomputers
OpenAI, with AMD, Broadcom, Intel, Microsoft, and NVIDIA, released MRC protocol. It reduces congestion and boosts fault tolerance in 100k+ GPU clusters. Developers can cut GPU waste in large trainings, lowering costs. Open spec available now via OCP.
What changed
OpenAI, with AMD, Broadcom, Intel, Microsoft, and NVIDIA, released MRC protocol. It reduces congestion and boosts fault tolerance in 100k+ GPU clusters. Developers can cut GPU waste in large trainings, lowering costs. Open spec available now via OCP.
Why it matters
Improves reliability and speed in large AI training, maximizing GPU efficiency.
What to watch
Improves reliability and speed in large AI training, maximizing GPU efficiency. Key checks: Adaptive packet spraying cuts congestion / SRv6 bypasses failures in microseconds / 2-tier switches connect 100k+ GPUs.
Briefs that include this news
Use daily, weekly, and monthly briefs to understand the surrounding context.