Model Architectures

Introduced in "Attention Is All You Need" (Vaswani et al., 2017), the Transformer architecture revolutionized NLP and became the foundation for virtually all modern large language models. It replaced recurrent architectures with self-attention, enabling parallel processing of whole sequences and better capture of long-range dependencies.
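
As a rough illustration, here is a minimal NumPy sketch of scaled dot-product self-attention, the core operation described above. The function name, shapes, and toy inputs are illustrative assumptions, not any particular library's API:

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
        d_k = Q.shape[-1]
        # Similarity of every query position with every key position.
        scores = Q @ K.T / np.sqrt(d_k)           # (seq_len, seq_len)
        # Softmax over the key dimension, with max-subtraction for stability.
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        # Each output position is a weighted mix of all value vectors.
        return weights @ V                        # (seq_len, d_v)

    # Toy example: 4 tokens with model dimension 8. In a real model, Q, K,
    # and V come from learned linear projections of the token embeddings.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))
    out = scaled_dot_product_attention(x, x, x)
    print(out.shape)   # (4, 8)

Because every position attends to every other position through a single matrix multiplication, the whole sequence is processed in parallel, unlike a recurrent network that must step through tokens one at a time.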

Key Innovations

  • Self-Attention: Lets each position attend to every position in the sequence, gathering context in a single step
  • Positional Encoding: Injects sequence order information, since self-attention by itself is order-agnostic (see the sketch after this list)
  • Multi-Head Attention: Runs several attention heads in parallel to learn multiple attention patterns simultaneously
  • Feed-Forward Networks: Position-wise MLPs that transform the attended representations in each layer
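
The sinusoidal positional encoding from the original Transformer paper can be sketched in a few lines of NumPy. The function name and dimensions below are illustrative, and many later models use learned positional embeddings instead:

    import numpy as np

    def sinusoidal_positional_encoding(seq_len, d_model):
        """Sinusoidal encoding from the original Transformer:
        PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
        PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
        """
        positions = np.arange(seq_len)[:, None]                 # (seq_len, 1)
        dims = np.arange(0, d_model, 2)[None, :]                 # (1, d_model/2)
        angles = positions / np.power(10000.0, dims / d_model)   # (seq_len, d_model/2)
        pe = np.zeros((seq_len, d_model))
        pe[:, 0::2] = np.sin(angles)   # even dimensions
        pe[:, 1::2] = np.cos(angles)   # odd dimensions
        return pe

    # The encoding is simply added to the token embeddings, so the otherwise
    # order-agnostic attention layers can distinguish positions.
    pe = sinusoidal_positional_encoding(seq_len=16, d_model=64)
    print(pe.shape)   # (16, 64)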

Impact

Transformers power GPT, BERT, T5, Claude, and most state-of-the-art language models. They’ve also been adapted for computer vision (ViT), speech, and multimodal tasks.

Tags

architecture llm attention breakthrough

Added: January 15, 2025