Tokenization

The process of breaking text into smaller units (tokens) that a language model can process. Tokenizers may split text into words, subwords, or characters. Byte-pair encoding (BPE) is a widely used tokenization scheme.
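To make the idea concrete, here is a minimal sketch of how BPE applies learned merge rules to a word. The merge table below is hypothetical; real tokenizers learn thousands of merges from a training corpus.

```python
# Minimal BPE sketch, for illustration only. The merge rules are
# hypothetical; real tokenizers learn them from a training corpus.

def bpe_tokenize(word, merges):
    """Split a word into characters, then apply merge rules in priority order."""
    tokens = list(word)
    while True:
        # Find the highest-priority (lowest-rank) adjacent pair in `tokens`.
        best = None
        for i in range(len(tokens) - 1):
            pair = (tokens[i], tokens[i + 1])
            if pair in merges and (best is None or merges[pair] < merges[best]):
                best = pair
        if best is None:
            return tokens
        # Merge every occurrence of the best pair into a single token.
        merged = []
        i = 0
        while i < len(tokens):
            if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == best:
                merged.append(tokens[i] + tokens[i + 1])
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged

# Hypothetical merge table: lower rank = learned earlier = higher priority.
merges = {("l", "o"): 0, ("lo", "w"): 1, ("e", "r"): 2}
print(bpe_tokenize("lower", merges))  # ['low', 'er']
```

Starting from single characters, the word is progressively merged into larger subword units, so frequent words become one token while rare words fall back to smaller pieces.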

Related terms

Token
Large Language Model (LLM)