Tokens are how language models read and generate text. A single token can be a complete word, part of a word, or even a single character, depending on the tokenization scheme used.
Examples
- “Hello” might be 1 token
- “unhappiness” might be split into [“un”, “happiness”] = 2 tokens
- “ChatGPT” might be [“Chat”, “G”, “PT”] = 3 tokens
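The splits above can be sketched with a greedy longest-match subword tokenizer. The tiny vocabulary below is hypothetical, chosen just to reproduce these examples; real tokenizers learn vocabularies of tens of thousands of entries from data.

```python
# Toy vocabulary (hypothetical) mirroring the examples above.
VOCAB = {"Hello", "un", "happiness", "Chat", "G", "PT"}

def tokenize(text: str) -> list[str]:
    """Greedily split text into the longest vocabulary entries,
    falling back to single characters for unknown spans."""
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):   # try the longest match first
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])          # unknown character = 1 token
            i += 1
    return tokens

print(tokenize("Hello"))        # 1 token
print(tokenize("unhappiness"))  # 2 tokens
print(tokenize("ChatGPT"))      # 3 tokens
```

Production tokenizers use learned merge rules rather than greedy matching, but the output shape (a list of subword strings) is the same.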
Importance
Token limits define how much text a model can process at once (context window). Understanding tokenization is crucial for prompt engineering and cost estimation, as most LLM APIs charge per token.
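Because APIs bill input and output tokens separately, a cost estimate is simple arithmetic. The per-token prices below are hypothetical placeholders; check your provider's pricing page for real figures.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_1k: float = 0.0005,
                  output_price_per_1k: float = 0.0015) -> float:
    """Return the estimated cost in dollars for one API call.
    Prices are illustrative, quoted per 1,000 tokens."""
    return (input_tokens / 1000) * input_price_per_1k \
         + (output_tokens / 1000) * output_price_per_1k

# e.g. a 2,000-token prompt producing a 500-token reply
print(f"${estimate_cost(2000, 500):.5f}")
```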
Common Tokenizers
- BPE (Byte Pair Encoding): Used by GPT-family models, typically as byte-level BPE
- WordPiece: Used by BERT and related models
- SentencePiece: Language-agnostic tokenization, used by models such as T5 and Llama
Related Terms
BPE
Byte Pair Encoding - a subword tokenization algorithm that iteratively merges frequent character pairs to create a vocabulary.
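The merge loop can be sketched in a few lines. This is a minimal training sketch, not a production tokenizer: the corpus and merge count are illustrative, and tie-breaking between equally frequent pairs follows dictionary insertion order.

```python
from collections import Counter

def bpe_merges(words: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn BPE merge rules: repeatedly merge the most frequent
    adjacent symbol pair across the corpus."""
    # Start with each word as a sequence of single characters.
    corpus = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in corpus.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair wins
        merges.append(best)
        # Rewrite the corpus with the chosen pair fused into one symbol.
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_corpus[tuple(out)] += freq
        corpus = new_corpus
    return merges

print(bpe_merges(["low", "lower", "lowest", "low"], 2))
```

On this toy corpus the first two merges fuse `l`+`o` and then `lo`+`w`, so frequent words collapse into whole-word tokens while rarer words stay split into subwords.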
Context Window
The maximum number of tokens an LLM can process at once, including both input prompt and generated output. Also called context length.
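Since the window covers both input and output, applications typically reserve part of the budget for the reply and trim the prompt to fit. A minimal sketch, using word counts as a stand-in for real token counts (actual APIs count model tokens, not words):

```python
def fit_to_context(history: list[str], limit: int,
                   reserve_for_output: int) -> list[str]:
    """Drop the oldest messages until the input fits in the part of
    the context window not reserved for the model's output."""
    budget = limit - reserve_for_output
    kept = list(history)
    # Word count stands in for a real tokenizer here (assumption).
    while kept and sum(len(m.split()) for m in kept) > budget:
        kept.pop(0)  # evict the oldest message first
    return kept

print(fit_to_context(["a b c", "d e", "f"], limit=5, reserve_for_output=2))
```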
Tokenization
The process of breaking text into smaller units (tokens) that language models can process, using algorithms like BPE or WordPiece.