Large Language Models

SentencePiece

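A language-agnostic tokenization library that treats input text as a raw sequence of Unicode characters. Whitespace is encoded as an ordinary symbol (▁, U+2581) rather than used to pre-split the text into words. The same pipeline therefore works for languages written without spaces, and tokens can be decoded back into the (normalized) text without language-specific rules. Subword vocabularies are trained with either byte-pair encoding (BPE) or a unigram language model.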
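The following is a minimal pure-Python sketch of the idea above. It is not the sentencepiece library's API, and the helpers `to_symbols`, `learn_bpe`, `merge_pair`, `encode` and `decode` are hypothetical. The sketch assumes two of the library's defaults: a dummy ▁ prefix is added to the input, and ▁ may only start a piece, never appear inside it. It omits input normalization and the unigram algorithm, and shows only how BPE merges are learned over raw characters once spaces have become ▁.

```python
from collections import Counter

# Meta symbol SentencePiece uses in place of a space: U+2581.
SPACE = "\u2581"


def to_symbols(text: str) -> list[str]:
    """Split raw text into Unicode characters, turning spaces into the meta symbol.

    A leading meta symbol is added (assumed to mimic SentencePiece's default
    dummy prefix), so every word begins with one and no pre-tokenizer is needed.
    """
    return list(SPACE + text.replace(" ", SPACE))


def merge_pair(symbols: list[str], pair: tuple[str, str]) -> list[str]:
    """Replace every adjacent occurrence of `pair` with the concatenated symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged


def learn_bpe(text: str, num_merges: int) -> list[tuple[str, str]]:
    """Greedily learn up to `num_merges` BPE merges, most frequent pair first."""
    symbols = to_symbols(text)
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        # Skip pairs whose right side starts with the meta symbol, so pieces
        # never span a word boundary (the meta symbol only begins a piece).
        pairs = Counter(
            (a, b) for a, b in zip(symbols, symbols[1:]) if not b.startswith(SPACE)
        )
        if not pairs:
            break
        best, count = pairs.most_common(1)[0]
        if count < 2:  # stop once no pair repeats
            break
        merges.append(best)
        symbols = merge_pair(symbols, best)
    return merges


def encode(text: str, merges: list[tuple[str, str]]) -> list[str]:
    """Apply the learned merges, in order, to the character sequence of `text`."""
    symbols = to_symbols(text)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


def decode(pieces: list[str]) -> str:
    """Join pieces, map the meta symbol back to spaces, and drop the dummy prefix."""
    return "".join(pieces).replace(SPACE, " ").removeprefix(" ")


if __name__ == "__main__":
    merges = learn_bpe("low lower lowest", num_merges=10)
    pieces = encode("low lower lowest", merges)
    print(pieces)          # ['▁low', '▁lowe', 'r', '▁lowe', 's', 't']
    print(decode(pieces))  # low lower lowest
```

Because spaces survive as ▁ inside the pieces, `decode` needs no knowledge of the language. It only concatenates pieces and swaps the meta symbol back, which is the property the definition describes.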

SentencePiece is widely used to build the subword vocabularies of large language models. T5 and the original LLaMA models, for example, use SentencePiece tokenizers.

  • Tokenization
  • BPE
  • Subword

Tags

large-language-models tokenization bpe subword


Added: November 18, 2025