Training & Optimization
AdamW
A variant of Adam with decoupled weight decay: rather than adding an L2 penalty to the gradient, where Adam's adaptive rescaling would dilute it, AdamW shrinks the weights directly at each update step. This typically gives better regularization and often better final performance, and it is widely used for training modern deep learning systems.
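A minimal sketch of a single AdamW parameter update, assuming NumPy arrays for the weights and gradient; the function name, defaults, and variable names are illustrative, not any particular library's API:

```python
import numpy as np

def adamw_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """Return updated (w, m, v) after one AdamW step at timestep t (t >= 1)."""
    # First and second moment estimates, same as plain Adam.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction for the moment estimates.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Adaptive gradient step.
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    # Decoupled weight decay: shrink weights directly, outside the adaptive step.
    w = w - lr * weight_decay * w
    return w, m, v
```

For comparison, Adam with an ordinary L2 penalty would instead add weight_decay * w to grad before the moment updates, so the penalty gets divided by the adaptive denominator; the decoupled form keeps the decay strength uniform across parameters.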
Related Concepts
- Adam Optimizer
- Weight Decay
- Optimizer
Tags
training-optimization adam-optimizer weight-decay optimizer
Related Terms
Adam Optimizer
An adaptive learning rate optimization algorithm combining momentum and RMSprop, widely used for training neural networks.
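For reference, one common way to write the Adam update (gradient g_t, step size alpha, decay rates beta_1 and beta_2, following the notation of Kingma & Ba, 2015):

```latex
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t      && \text{(momentum-style first moment)} \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2    && \text{(RMSprop-style second moment)} \\
\hat m_t &= \frac{m_t}{1-\beta_1^t}, \qquad
\hat v_t = \frac{v_t}{1-\beta_2^t}              && \text{(bias correction)} \\
\theta_t &= \theta_{t-1} - \alpha\, \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon}
\end{aligned}
```

The first moment plays the role of momentum and the second moment provides the RMSprop-style per-parameter scaling.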
Weight Decay
A regularization technique that shrinks weights toward zero during optimization. Equivalent to L2 regularization under standard SGD, but the two differ under adaptive optimizers such as Adam, which rescale the penalty along with the gradient.
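A quick sketch of the SGD equivalence (lambda is the decay coefficient, eta the learning rate): adding an L2 penalty (lambda/2)||w||^2 to the loss contributes lambda * w to the gradient, so the SGD step becomes

```latex
w \;\leftarrow\; w - \eta\,\bigl(\nabla L(w) + \lambda w\bigr)
  \;=\; (1 - \eta\lambda)\, w - \eta\, \nabla L(w),
```

which is exactly the "shrink weights toward zero" form of weight decay. Under Adam, the lambda * w term is divided by the adaptive denominator along with the rest of the gradient, so the effective decay varies per parameter; AdamW avoids this by applying the decay outside the adaptive step.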