System for scaling LLM training to over 10,000 GPUs.

Paper

arXiv: 2402.15627

infrastructure, training, scaling