Transformers revolutionized natural language processing when introduced in the 2017 paper "Attention Is All You Need." Unlike recurrent neural networks (RNNs), which process tokens one at a time, transformers process entire sequences in parallel using self-attention, enabling them to capture long-range dependencies efficiently. Key components include multi-head attention (letting the model focus on different parts of the input simultaneously), positional encoding (needed because self-attention has no inherent notion of sequence order), and layer normalization. The architecture consists of encoder and decoder stacks, though many modern models use only one: BERT is encoder-only, GPT is decoder-only. Transformers enabled the scaling laws that led to today's powerful LLMs.
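A minimal sketch of the scaled dot-product self-attention at the heart of the architecture, written with NumPy. The weight-matrix names (Wq, Wk, Wv) and the dimensions are illustrative assumptions, not taken from any particular model, and real implementations add multiple heads, masking, and learned parameters:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Project the input sequence into queries, keys, and values.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Every position attends to every other position in parallel;
    # scaling by sqrt(d_k) keeps the dot products in a stable range.
    scores = Q @ K.T / np.sqrt(d_k)      # shape: (seq_len, seq_len)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # weighted mix of value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                  # toy sizes for illustration
X = rng.normal(size=(seq_len, d_model))
Wq = rng.normal(size=(d_model, d_model))
Wk = rng.normal(size=(d_model, d_model))
Wv = rng.normal(size=(d_model, d_model))

out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                         # (4, 8)
```

Because the attention scores are computed for all position pairs at once, the whole sequence is processed in a few matrix multiplications rather than a step-by-step recurrence, which is what makes the architecture so parallelizable.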
🧠 AI & LLMs · Intermediate
Transformer
Neural network architecture using self-attention mechanisms, the foundation of modern LLMs like GPT and Claude.
Related Terms
Token
The basic unit of text that LLMs process: typically a word, subword, or character.
LLM (Large Language Model)
AI models trained on massive text datasets to understand and generate human-like text.
Context Window
The maximum amount of text (measured in tokens) that an LLM can process in a single interaction.
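Because the context window caps how many tokens fit in a single interaction, applications commonly trim older conversation turns to stay within the budget. A minimal sketch of that trimming logic, assuming a hypothetical count_tokens callback in place of a real tokenizer (such as the model's own tokenizer library):

```python
def trim_to_context_window(messages, max_tokens, count_tokens):
    # Walk the history newest-first, keeping messages until the budget is spent.
    kept, total = [], 0
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if total + cost > max_tokens:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))  # restore chronological order

# Crude stand-in tokenizer: one token per whitespace-separated word.
def naive_count(text):
    return len(text.split())

history = [
    "hello there",
    "how are you today",
    "fine thanks",
    "tell me about transformers",
]
print(trim_to_context_window(history, max_tokens=8, count_tokens=naive_count))
# ['fine thanks', 'tell me about transformers']
```

Dropping the oldest turns first is the simplest policy; real systems often also preserve a system prompt or summarize the discarded history instead of deleting it outright.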