Google | 15:05 | Feature Updates | Official Blog
Google DeepMind Announces Decoupled DiLoCo for Resilient Training
Boosts training reliability, cuts development costs.
Key Points
- No stops on chip failures
- Low-bandwidth, multi-region support
- Evolves Pathways and DiLoCo
- Proven on Gemma 12B
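Google DeepMind launched Decoupled DiLoCo, a method that keeps AI training running across data centers even when hardware fails. The team used it to train a 12B-parameter Gemma model over low-bandwidth links and mixed hardware. The approach is meant to overcome geographic and capacity limits on training.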
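To make the idea concrete, here is a minimal toy sketch of low-communication training in the DiLoCo style: each worker takes many local steps, the workers synchronize only occasionally, and a worker that fails is skipped for that round instead of stalling the job. This is not Google DeepMind's implementation. The function names (`local_steps`, `outer_round`, `train`), the quadratic toy loss, the plain gradient-descent inner loop, the simple averaged outer step, and the skip-on-failure rule are all assumptions made for illustration; the announcement summarized here does not describe the internals of Decoupled DiLoCo.

```python
import random


def local_steps(params, target, steps, lr):
    """Run `steps` of gradient descent on the toy loss 0.5 * (p - t)^2 per coordinate."""
    p = list(params)
    for _ in range(steps):
        # Gradient of 0.5 * (p - t)^2 with respect to p is (p - t).
        p = [pi - lr * (pi - ti) for pi, ti in zip(p, target)]
    return p


def outer_round(global_params, worker_targets, alive,
                inner_steps=20, inner_lr=0.1, outer_lr=1.0):
    """One sync round: average parameter deltas from the workers that responded."""
    deltas = []
    for target, ok in zip(worker_targets, alive):
        if not ok:
            continue  # assumed failure handling: skip this worker, do not stall
        local = local_steps(global_params, target, inner_steps, inner_lr)
        deltas.append([g - l for g, l in zip(global_params, local)])
    if not deltas:
        return list(global_params)  # no worker responded; keep current parameters
    avg = [sum(col) / len(deltas) for col in zip(*deltas)]
    # Treat the averaged delta as a gradient-like signal for the outer update.
    return [g - outer_lr * a for g, a in zip(global_params, avg)]


def train(rounds=30, num_workers=4, fail_prob=0.25, seed=0):
    """Simulate training where each worker may be unavailable in any given round."""
    rng = random.Random(seed)
    # Each worker sees a different data shard, modeled here as a different target.
    targets = [[1.0 + 0.1 * w, -2.0 + 0.1 * w] for w in range(num_workers)]
    params = [0.0, 0.0]
    for _ in range(rounds):
        alive = [rng.random() >= fail_prob for _ in range(num_workers)]
        params = outer_round(params, targets, alive)
    return params


if __name__ == "__main__":
    print(train())
```

Only one parameter delta per worker crosses the network per round, rather than one gradient per step, which is what makes this pattern tolerant of low-bandwidth links in the sketch. The parameters still settle near the workers' targets even though a quarter of the worker-rounds are dropped on average.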