Large Language Models
Scaled Dot-Product Attention
An attention computation that scores each query against every key with a dot product, divides the scores by the square root of the key dimension to stabilize gradients, and applies a softmax to turn the scores into weights over the values.
Scaled dot-product attention is the core operation inside every Transformer layer, making it essential for understanding modern large language models.
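In symbols, Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, where d_k is the key dimension. Below is a minimal NumPy sketch of this computation; the function and variable names are illustrative, not from any particular library.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q: (n_q, d_k) queries, K: (n_k, d_k) keys, V: (n_k, d_v) values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # dot-product similarities, scaled by sqrt(d_k)
    # Numerically stable row-wise softmax turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V               # each output row is a weighted average of values

Without the 1/sqrt(d_k) factor, the dot products grow in magnitude with d_k and push the softmax into regions with vanishingly small gradients; the scaling keeps the score variance roughly constant.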
Related Concepts
- Attention Mechanism
- Transformer
- Query-Key-Value
Tags
large-language-models attention-mechanism transformer query-key-value
Related Terms
Attention Mechanism
A technique that allows neural networks to focus on relevant parts of the input when producing each output, assigning different weights to different input elements.
Query-Key-Value
The three learned linear projections of the input in attention mechanisms: queries and keys are compared to compute the attention weights, and values are combined by those weights to form the output.
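A hypothetical sketch of how the three projections are applied to a sequence of token embeddings; the dimensions and the random weight values are illustrative only, since real projections are learned during training.

import numpy as np

rng = np.random.default_rng(0)
d_model, d_k = 16, 8
X = rng.standard_normal((5, d_model))     # embeddings for 5 input tokens

# In practice W_q, W_k, W_v are learned parameters; random values stand in here
W_q = rng.standard_normal((d_model, d_k))
W_k = rng.standard_normal((d_model, d_k))
W_v = rng.standard_normal((d_model, d_k))

Q, K, V = X @ W_q, X @ W_k, X @ W_v       # per-token queries, keys, values

These Q, K, V matrices are exactly the inputs to the scaled dot-product attention sketched above.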
Transformer
A neural network architecture introduced in 'Attention is All You Need' (2017) that dispenses with recurrence and convolutions in favor of self-attention; it has become the foundation for modern LLMs.