🧠 AI & LLMs beginner

Token

The basic unit of text that LLMs process - typically a word, subword, or character.

Tokens are how LLMs see text: tokenization breaks text into pieces the model can process. Common tokenizers (like BPE, Byte Pair Encoding) build vocabularies of roughly 30K-100K tokens spanning whole words, subwords, and individual characters. "Hello world" might be 2 tokens, while a rarer word like "tokenization" might split into 3+ tokens. Token counts matter because they determine how much of the context window you use, API pricing is per token, and generation speed is measured in tokens per second. Different models use different tokenizers, so the same text may have a different token count depending on the model. Understanding tokenization helps you optimize prompts and estimate costs.
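To make the BPE idea concrete, here is a toy sketch (not any real model's tokenizer): start from individual characters and repeatedly merge the most frequent adjacent pair into a new token. Real tokenizers learn these merges once from a huge corpus and then reuse them; this simplified version learns merges from the input string itself.

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent token pairs and return the most common one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0]

def bpe_merge(tokens, pair):
    """Replace each occurrence of `pair` with a single merged token."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

def toy_bpe(text, num_merges):
    """Start from characters; greedily merge the most frequent pair."""
    tokens = list(text)
    for _ in range(num_merges):
        if len(tokens) < 2:
            break
        tokens = bpe_merge(tokens, most_frequent_pair(tokens))
    return tokens

# Frequent substrings collapse into single tokens after a few merges:
print(toy_bpe("aaabdaaabac", 2))  # ['aaa', 'b', 'd', 'aaa', 'b', 'a', 'c']
```

Each merge shrinks the token count for common patterns, which is why frequent words end up as one token while rare words stay split into several.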