Large Language Models
Scaled Dot-Product Attention
An attention computation that scores each query against every key with a dot product, divides the scores by the square root of the key dimension to stabilize gradients, and applies a softmax to turn the scores into weights over the values.
Scaled dot-product attention is the core operation inside every Transformer layer, making it essential for understanding modern large language models.
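In symbols, Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, where d_k is the key dimension. Below is a minimal NumPy sketch of this computation; the function and variable names are illustrative, not from any particular library.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q: (n_q, d_k) queries, K: (n_k, d_k) keys, V: (n_k, d_v) values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # dot-product similarities, scaled by sqrt(d_k)
    # Numerically stable row-wise softmax turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V               # each output row is a weighted average of values

Without the 1/sqrt(d_k) factor, the dot products grow in magnitude with d_k and push the softmax into regions with vanishingly small gradients; the scaling keeps the score variance roughly constant.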
Related Concepts
- Attention Mechanism
- Transformer
- Query-Key-Value
Tags
large-language-models attention-mechanism transformer query-key-value
Related Terms
Attention Mechanism
A technique that allows neural networks to focus on relevant parts of the input when producing each output, assigning different weights to different input elements.
Query-Key-Value
The three learned linear projections of the input in attention mechanisms: queries and keys are compared to compute the attention weights, and values are combined by those weights to form the output.
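A hypothetical sketch of how the three projections are applied to a sequence of token embeddings; the dimensions and the random weight values are illustrative only, since real projections are learned during training.

import numpy as np

rng = np.random.default_rng(0)
d_model, d_k = 16, 8
X = rng.standard_normal((5, d_model))     # embeddings for 5 input tokens

# In practice W_q, W_k, W_v are learned parameters; random values stand in here
W_q = rng.standard_normal((d_model, d_k))
W_k = rng.standard_normal((d_model, d_k))
W_v = rng.standard_normal((d_model, d_k))

Q, K, V = X @ W_q, X @ W_k, X @ W_v       # per-token queries, keys, values

These Q, K, V matrices are exactly the inputs to the scaled dot-product attention sketched above.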
Transformer
A neural network architecture introduced in 'Attention is All You Need' (2017) that dispenses with recurrence and convolutions in favor of self-attention; it has become the foundation for modern LLMs.