11. Distributed Training

This chapter covers 3D parallelism, the combination of data, tensor, and pipeline parallelism, and the systems engineering needed to train at large scale.

Focus Points

  • data, tensor, and pipeline parallelism
  • overlap between communication and compute
  • training stability and throughput optimization
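
As a rough illustration of how the three parallelism dimensions fit together, the sketch below maps a flat process rank onto (data, pipeline, tensor) coordinates and lists the ranks that would share a communication group. The rank ordering (tensor index varying fastest, then pipeline, then data) and all names here (Coord, rank_to_coord, tensor_parallel_group, data_parallel_group) are assumptions made for this example, not the layout of any particular framework.

```python
from typing import List, NamedTuple


class Coord(NamedTuple):
    dp: int  # data-parallel replica index
    pp: int  # pipeline stage index
    tp: int  # tensor-parallel shard index


def rank_to_coord(rank: int, dp_size: int, pp_size: int, tp_size: int) -> Coord:
    """Map a flat rank to 3D coordinates (assumed order: tp fastest, dp slowest)."""
    world_size = dp_size * pp_size * tp_size
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} is outside world size {world_size}")
    tp = rank % tp_size
    pp = (rank // tp_size) % pp_size
    dp = rank // (tp_size * pp_size)
    return Coord(dp, pp, tp)


def tensor_parallel_group(rank: int, dp_size: int, pp_size: int, tp_size: int) -> List[int]:
    """Ranks that hold shards of the same layers (same dp and pp index)."""
    c = rank_to_coord(rank, dp_size, pp_size, tp_size)
    base = (c.dp * pp_size + c.pp) * tp_size
    return list(range(base, base + tp_size))


def data_parallel_group(rank: int, dp_size: int, pp_size: int, tp_size: int) -> List[int]:
    """Ranks that hold the same model shard and all-reduce gradients together."""
    c = rank_to_coord(rank, dp_size, pp_size, tp_size)
    stride = pp_size * tp_size          # distance between consecutive replicas
    offset = c.pp * tp_size + c.tp      # position inside one replica
    return [offset + d * stride for d in range(dp_size)]


if __name__ == "__main__":
    # 8 processes split as 2 (data) x 2 (pipeline) x 2 (tensor).
    print(rank_to_coord(5, 2, 2, 2))
    print(tensor_parallel_group(5, 2, 2, 2))
    print(data_parallel_group(5, 2, 2, 2))
```

Placing the tensor index in the fastest-varying position keeps tensor-parallel peers on adjacent ranks, which on many clusters means the same node; that is a common motivation for this ordering, though a real deployment should follow the layout its framework documents.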

AI-HPC Organization · Contact: openaihpc@gmail.com