Tokens are how LLMs see text: tokenization breaks text into pieces the model can process. Common tokenizers, such as Byte Pair Encoding (BPE), build vocabularies of roughly 30K-100K tokens covering whole words, subwords, and individual characters. "Hello world" might be 2 tokens, while "tokenization" might split into 3 or more. Token counts matter because they determine context-window usage, API pricing is charged per token, and generation speed is measured in tokens per second. Different models use different tokenizers, so the same text may have different token counts across models. Understanding tokenization helps you optimize prompts and estimate costs.
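The merging idea behind BPE vocabularies can be sketched in a few lines of Python (a toy illustration under simplifying assumptions, not any model's actual tokenizer): repeatedly find the most frequent adjacent symbol pair in the corpus and merge it into a new token.

```python
from collections import Counter

def bpe_merges(corpus, num_merges):
    """Toy BPE trainer: learn `num_merges` merge rules from a whitespace-split corpus."""
    # Start with each word as a tuple of single characters.
    words = Counter(tuple(w) for w in corpus.split())
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in words.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the best pair.
        merged = {}
        for word, freq in words.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged[tuple(out)] = merged.get(tuple(out), 0) + freq
        words = merged
    return merges, words
```

On a toy corpus like `"low low low lower lowest"`, the first merges are `('l', 'o')` then `('lo', 'w')`, producing a `low` token; the same mechanism, applied to billions of characters, yields the 30K-100K vocabularies described above.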
🧠 AI & LLMs · Beginner
Token
The basic unit of text that LLMs process - typically a word, subword, or character.
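Since both pricing and context limits are counted in tokens, a rough estimate is often enough for budgeting. A minimal sketch, assuming the common heuristic of about 4 characters per token for English text (actual counts depend on the tokenizer) and a hypothetical price per million tokens:

```python
def estimate_cost(text, price_per_million_tokens):
    """Rough token count and cost estimate for English text.

    Assumes ~4 characters per token, a heuristic only;
    real counts vary by tokenizer and language.
    """
    est_tokens = max(1, len(text) // 4)
    est_cost = est_tokens * price_per_million_tokens / 1_000_000
    return est_tokens, est_cost
```

For precise numbers, count tokens with the target model's own tokenizer rather than a heuristic.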
Related Terms
LLM (Large Language Model)
AI models trained on massive text datasets to understand and generate human-like text.
Context Window
The maximum amount of text (measured in tokens) that an LLM can process in a single interaction.
Transformer
Neural network architecture using self-attention mechanisms, the foundation of modern LLMs like GPT and Claude.