🧠 AI & LLMs intermediate

Transformer

Neural network architecture using self-attention mechanisms, the foundation of modern LLMs like GPT and Claude.

Transformers revolutionized natural language processing when introduced in the 2017 paper "Attention Is All You Need." Unlike recurrent neural networks (RNNs), which process tokens one at a time, transformers process entire sequences in parallel using self-attention, enabling them to capture long-range dependencies efficiently. Key components include:

- **Multi-head attention:** lets the model attend to different parts of the input simultaneously.
- **Positional encoding:** injects sequence order, since attention itself is order-agnostic.
- **Layer normalization** (with residual connections): stabilizes training in deep stacks.

The original architecture pairs an encoder stack with a decoder stack, though many modern models keep only one: BERT is encoder-only, GPT is decoder-only. Transformers enabled the scaling laws behind today's most capable LLMs.
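The heart of the mechanism described above can be sketched in a few lines. This is a minimal illustration, not a full transformer layer: it shows scaled dot-product self-attention, where each token's output is a weighted mixture of all tokens' values. The matrix sizes, weight names (`Wq`, `Wk`, `Wv`), and single-head setup are illustrative assumptions.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for one attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (seq, seq) pairwise similarities
    # Numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each row: a context-aware blend of all value vectors

# Toy example: 4 tokens, model dimension 8 (sizes chosen for illustration)
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))

# In *self*-attention, Q, K, and V are all projections of the same input
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
print(out.shape)  # (4, 8)
```

Multi-head attention simply runs several such heads in parallel on lower-dimensional projections and concatenates the results; because nothing here depends on token position, positional encodings must be added to `x` beforehand.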