Large Language Models
WordPiece
A subword tokenization algorithm used by BERT. Like BPE, it builds its vocabulary by iteratively merging subword units, but the merging criterion differs: BPE merges the most frequent adjacent pair, while WordPiece merges the pair that most increases the likelihood of the training data (favoring pairs that co-occur more often than their parts would suggest). In BERT's tokenizer, word-internal pieces are marked with a ## prefix.
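At inference time, WordPiece segments each word greedily, always taking the longest vocabulary match from the current position. A minimal sketch, with a tiny made-up vocabulary for illustration:

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first segmentation, WordPiece-style."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # shrink the window until a vocabulary entry matches
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # word-internal pieces carry ##
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no valid segmentation for this word
        tokens.append(match)
        start = end
    return tokens

vocab = {"un", "##aff", "##able"}
print(wordpiece_tokenize("unaffable", vocab))  # ['un', '##aff', '##able']
```

Because matching is greedy from the left, a word falls back to a single unknown token only when some position cannot be covered by any vocabulary piece.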
Related Concepts
- Tokenization
- BPE
- BERT
Tags
large-language-models tokenization bpe bert
Related Terms
BERT
Bidirectional Encoder Representations from Transformers - a model that understands context by looking at text from both directions.
BPE
Byte Pair Encoding - a subword tokenization algorithm that iteratively merges frequent character pairs to create a vocabulary.
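The BPE training loop can be sketched as follows: start from characters, count adjacent symbol pairs across the corpus, and repeatedly fuse the most frequent pair. This is a minimal illustration on a toy corpus, not a production implementation:

```python
from collections import Counter

def bpe_merges(corpus, num_merges):
    """Learn BPE merges by repeatedly fusing the most frequent adjacent pair."""
    # each word starts as a sequence of characters
    words = Counter(tuple(w) for w in corpus)
    merges = []
    for _ in range(num_merges):
        # count adjacent symbol pairs, weighted by word frequency
        pairs = Counter()
        for word, freq in words.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        fused = best[0] + best[1]
        # rewrite every word with the new merged symbol
        new_words = Counter()
        for word, freq in words.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(fused)
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_words[tuple(out)] += freq
        words = new_words
    return merges

print(bpe_merges(["low", "low", "lower", "lowest"], 2))
# [('l', 'o'), ('lo', 'w')]
```

The learned merge list, applied in order, is what later tokenizes unseen words into subwords.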
Tokenization
The process of breaking text into smaller units (tokens) that language models can process, using algorithms like BPE or WordPiece.