Training & Optimization
Sharded Data Parallelism
Distributing model states across devices to train models larger than single-device memory.
Instead of replicating the full set of parameters, gradients, and optimizer states on every data-parallel worker, each worker keeps only a shard of these model states and gathers full parameters on demand during the forward and backward passes (the approach popularized by ZeRO and used by frameworks such as DeepSpeed and PyTorch FSDP). Per-device memory therefore shrinks roughly in proportion to the number of workers, while each worker still trains on its own slice of the data batch; see the sketch below.
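A minimal sketch of this pattern using PyTorch's FullyShardedDataParallel wrapper, assuming one process per GPU launched with torchrun; the toy model, learning rate, and synthetic data are illustrative assumptions, not part of the original text:

```python
# Minimal sketch: ZeRO-3-style sharded data parallelism via PyTorch FSDP.
# Launch (assumption): torchrun --nproc_per_node=8 train_fsdp.py
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group(backend="nccl")          # one process per GPU
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)

    # Stand-in for a model too large to replicate on every device.
    model = nn.Sequential(
        nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096)
    ).cuda()

    # Wrapping with FSDP shards parameters, gradients, and optimizer state
    # across all data-parallel ranks; full parameters are gathered only
    # transiently around each layer's forward/backward computation.
    model = FSDP(model)

    # Construct the optimizer *after* wrapping so it holds only this
    # rank's parameter shard (and hence only a shard of optimizer state).
    optim = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):                           # toy loop on synthetic data
        x = torch.randn(8, 4096, device="cuda")
        loss = model(x).square().mean()
        loss.backward()                              # grads reduce-scattered into shards
        optim.step()
        optim.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

The key design choice is that sharding keeps the usual data-parallel training loop intact: only the wrapping step and the placement of optimizer construction change, while communication (gather parameters, reduce-scatter gradients) happens inside the wrapper.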
Related Concepts
- Distributed Training
- Data Parallelism
- ZeRO
Tags
training-optimization distributed-training data-parallelism zero