Large Language Models

Scaled Dot-Product Attention

Scaled dot-product attention computes attention weights from the dot products of queries with keys, scaled by the square root of the key dimension to stabilize gradients, and uses those weights to form a weighted sum of values: Attention(Q, K, V) = softmax(QKᵀ / √d_k) V.
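The computation can be sketched in a few lines of NumPy. This is a minimal illustrative implementation (single head, no masking, no batching), not a production one; the function name and shapes are assumptions for the example.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal sketch: Q, K, V are (seq_len, d_k) / (seq_len, d_v) arrays."""
    d_k = K.shape[-1]
    # Similarity of each query with each key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the key axis -> attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Output is the weight-averaged values
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape)                          # (3, 4)
print(np.allclose(w.sum(axis=-1), 1.0))   # True: each row of weights sums to 1
```

Without the 1/√d_k factor, the variance of the scores grows with d_k, pushing the softmax into regions with vanishingly small gradients; the scaling keeps the scores at unit variance when queries and keys are roughly unit-variance.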

Scaled dot-product attention is the core operation of the Transformer architecture and underlies modern large language models.

Related Terms

  • Attention Mechanism
  • Transformer
  • Query-Key-Value

Tags

large-language-models attention-mechanism transformer query-key-value

Added: November 18, 2025