Large Language Models
SentencePiece
A language-agnostic tokenization library that treats text as a raw stream of Unicode characters, including whitespace, which it encodes with the meta symbol "▁" so that no language-specific pre-tokenization is required and the original text can be reconstructed losslessly from the tokens. It supports both BPE and unigram language-model subword algorithms.
Because it needs no language-specific preprocessing, SentencePiece is widely used to build the subword tokenizers of large language models, including T5 and LLaMA.
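The core idea can be illustrated with a minimal sketch: mark word boundaries with "▁" instead of discarding whitespace, then learn BPE merges over the resulting character stream. This is a simplified, pure-Python illustration of the technique, not the SentencePiece library itself (which is trained via `spm.SentencePieceTrainer.train` and used through `spm.SentencePieceProcessor`); the function names here are hypothetical.

```python
from collections import Counter

META = "\u2581"  # "▁", SentencePiece's whitespace meta symbol

def train_bpe(corpus, num_merges):
    """Learn BPE merges over a whitespace-marked character stream (sketch)."""
    # Represent each word as a tuple of characters, prefixed with the
    # boundary marker so whitespace information is never thrown away.
    words = Counter(tuple(META + w) for line in corpus for w in line.split())
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in words.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the winning merge to every word.
        merged = Counter()
        for word, freq in words.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged[tuple(out)] += freq
        words = merged
    return merges

def encode(text, merges):
    """Tokenize text by replaying the learned merges in order (sketch)."""
    tokens = list(META + META.join(text.split()))
    for a, b in merges:
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
    return tokens
```

Because every merge only concatenates adjacent symbols, concatenating the output tokens and mapping "▁" back to a space recovers the input exactly, which is the lossless property the description refers to.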
Related Concepts
- Tokenization
- BPE
- Subword
Tags
large-language-models tokenization bpe subword