The Transformer architecture revolutionized NLP and became the foundation for virtually all modern large language models. It replaced recurrent architectures with self-attention, enabling parallel processing and better capture of long-range dependencies.
Key Innovations
- Self-Attention: Allows each position to attend to every other position in the sequence (see the sketch after this list)
- Positional Encoding: Injects sequence order information that attention alone would discard
- Multi-Head Attention: Learns multiple attention patterns simultaneously
- Feed-Forward Networks: Process the attended information at each position independently
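These ideas fit in a few lines of NumPy. The sketch below is a minimal, single-head illustration of scaled dot-product self-attention combined with sinusoidal positional encoding; the function names, dimensions, and random weights are illustrative assumptions, not any particular library's API.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encodings: even dims use sine, odd dims use cosine."""
    positions = np.arange(seq_len)[:, None]                 # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                      # (1, d_model)
    angle_rates = 1.0 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates                        # (seq_len, d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence x."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v                     # project to queries, keys, values
    scores = q @ k.T / np.sqrt(q.shape[-1])                 # (seq_len, seq_len) similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)          # softmax over keys
    return weights @ v                                      # each position mixes all positions

# Toy usage: 5 tokens, model dimension 16 (sizes are illustrative)
seq_len, d_model = 5, 16
rng = np.random.default_rng(0)
x = rng.normal(size=(seq_len, d_model)) + positional_encoding(seq_len, d_model)
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)               # (5, 16)
```

Because every position is computed from a matrix product over the whole sequence at once, nothing here is sequential; that is what lets Transformers train in parallel where recurrent models could not.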
Impact
Transformers power GPT, BERT, T5, Claude, and most state-of-the-art language models. They’ve also been adapted for computer vision (ViT), speech, and multimodal tasks.
Related Terms
BERT
Bidirectional Encoder Representations from Transformers - a model that understands context by looking at text from both directions.
Encoder-Decoder
An architecture where the encoder processes the input and the decoder generates the output, used in translation and other sequence-to-sequence tasks.
GPT
Generative Pre-trained Transformer - an autoregressive language model architecture that predicts the next token given previous context.
Multi-Head Attention
Running multiple attention operations in parallel with different learned projections, capturing diverse relational patterns (see the sketch after these terms).
Self-Attention
A mechanism where each token attends to all other tokens in the sequence to understand contextual relationships.
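As a companion to the self-attention sketch above, the following is a minimal NumPy illustration of multi-head attention: each head applies its own learned projections, the heads' outputs are concatenated, and an output matrix mixes them back to the model dimension. The function name, parameter layout, and sizes are assumptions for illustration; real implementations typically batch all heads into single matrix multiplications.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, params):
    """params["heads"] holds per-head (w_q, w_k, w_v) triples; params["w_o"] is the output projection."""
    head_outputs = []
    for w_q, w_k, w_v in params["heads"]:
        q, k, v = x @ w_q, x @ w_k, x @ w_v
        scores = q @ k.T / np.sqrt(q.shape[-1])      # scaled dot-product, per head
        head_outputs.append(softmax(scores) @ v)     # (seq_len, d_head)
    concat = np.concatenate(head_outputs, axis=-1)   # (seq_len, n_heads * d_head)
    return concat @ params["w_o"]                    # mix heads back to d_model

# Toy usage: 4 heads of size 8 over a model dimension of 32 (illustrative sizes)
seq_len, d_model, n_heads = 6, 32, 4
d_head = d_model // n_heads
rng = np.random.default_rng(1)
x = rng.normal(size=(seq_len, d_model))
params = {
    "heads": [tuple(rng.normal(size=(d_model, d_head)) for _ in range(3))
              for _ in range(n_heads)],
    "w_o": rng.normal(size=(n_heads * d_head, d_model)),
}
print(multi_head_attention(x, params).shape)         # (6, 32)
```

Each head sees the same tokens through different projections, so one head can track, say, syntactic relations while another tracks positional proximity; concatenating and projecting lets the model combine these views.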