Training & Optimization

Sharded Data Parallelism

Distributing model states across devices to train models larger than single-device memory.

In sharded data parallelism, each data-parallel rank keeps only a partition (shard) of the model states — parameters, gradients, and optimizer states — instead of a full replica, and gathers what it needs on demand through collective communication. With ZeRO-style full partitioning across N devices, the roughly 16 bytes per parameter required for mixed-precision Adam training drop to about 16/N bytes per device, which is what makes it possible to train models that do not fit in a single device's memory.
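As a concrete illustration, here is a minimal sketch using PyTorch's FullyShardedDataParallel (FSDP) wrapper, one common implementation of this idea. The toy model, hyperparameters, and launch setup are illustrative placeholders rather than part of the definition, and the script assumes it is started with a multi-process launcher such as torchrun.

```python
# Minimal sharded-data-parallel sketch (assumes PyTorch with FSDP, launched via
# e.g. `torchrun --nproc_per_node=8 train.py`; model and sizes are placeholders).
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group(backend="nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

# Toy model; in practice this would be a network too large for one device.
model = nn.Sequential(
    nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096)
).cuda()

# Wrapping shards the parameters across ranks: each rank stores only its slice
# and all-gathers the full weights just in time for forward/backward.
model = FSDP(model)

# The optimizer only sees this rank's shard, so its Adam moment buffers
# are sharded as well.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for step in range(3):
    x = torch.randn(8, 4096, device="cuda")
    loss = model(x).sum()
    loss.backward()        # gradients are reduce-scattered to their owning ranks
    optimizer.step()       # each rank updates only its own parameter shard
    optimizer.zero_grad()
```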

Related Terms

  • Distributed Training
  • Data Parallelism
  • ZeRO

Tags

training-optimization distributed-training data-parallelism zero

Added: November 18, 2025