Here is the wild journey your text takes, moving from human English down to the bare metal. It is like sending a letter that gets translated into math, then electricity.
The Browser: High-Level Abstraction
Right now, you are typing in a browser (Chrome, Edge, etc.).
- The Code: JavaScript / TypeScript
- The Transformation: Your keystrokes are captured via the Document Object Model (DOM). The browser encodes your English text as UTF-8 (the standard binary representation for text) and wraps it in JSON (a data format)
- The Transport: The request is encrypted (TLS/SSL) so eavesdroppers cannot read it, broken into packets, and sent over the internet via HTTP/2 or HTTP/3
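The encode-and-wrap step can be sketched in Python for illustration (the field name `prompt` is invented; real chat APIs define their own schemas):

```python
import json

# A hypothetical chat message, as the browser might package it.
prompt = "Hello, how do computers work?"

# Step 1: wrap the text in a JSON envelope (field names are made up).
payload = json.dumps({"prompt": prompt})

# Step 2: encode the JSON string as UTF-8 bytes for the wire.
# These bytes are what gets encrypted by TLS and split into packets.
wire_bytes = payload.encode("utf-8")
```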
The Google Front-End: The Gatekeeper
Your packets hit Google's load balancers and API gateways.
- The Code: Likely Go, C++, or Java
- The Transformation: The server decrypts your request, parses the JSON back into a string, and authenticates you. It then prepares the text for the AI model
- Reality Check: This is still "human-readable" code territory; the logic is conventional:
if user == cool, then chat
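That gatekeeper logic can be sketched in Python (the field names and the toy auth check are invented; TLS decryption is assumed to have happened upstream):

```python
import json

# Hypothetical request body, already decrypted by the load balancer.
raw_body = b'{"user": "cool", "prompt": "Hello"}'

# Parse the JSON bytes back into native objects.
request = json.loads(raw_body.decode("utf-8"))

# Authenticate (toy check), then hand the text to the model layer.
text_for_model = None
if request["user"] == "cool":
    text_for_model = request["prompt"]
```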
The AI Model: Text to Math
Here is where traditional programming stops and "Software 2.0" begins.
- Tokenization: Your prompt is chopped into pieces called tokens
  - Output: [15496] (an arbitrary integer ID for one token)
- The Code: Python is usually the interface layer here (using frameworks like JAX or TensorFlow)
- The Transformation: Those integers are converted into massive vectors (lists of numbers). The AI does not read "English"; it processes multi-dimensional floating-point math
The Compiler: Deep Learning Compilers
The Python code describes what to do (matrix multiplication), but not exactly how the hardware should do it.
- The Tools: XLA (Accelerated Linear Algebra) or similar compilers
- The Transformation: The framework converts your Python/JAX graph into an Intermediate Representation (IR), a sort of halfway house between code and machine instructions. It optimizes memory layout and fuses operations together
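Operator fusion is easiest to see with a toy example. Below is a plain-Python illustration of the idea (real compilers like XLA fuse GPU/TPU kernels, not Python loops): an unfused pipeline allocates an intermediate buffer, while the fused version does both operations in a single pass.

```python
def unfused(xs):
    # Two passes: multiply, then add. Allocates an intermediate list.
    doubled = [x * 2 for x in xs]
    return [d + 1 for d in doubled]

def fused(xs):
    # One pass: both ops "fused" into a single loop,
    # skipping the intermediate buffer entirely.
    return [x * 2 + 1 for x in xs]

# Same math, but the fused version touches memory half as often.
assert unfused([1, 2, 3]) == fused([1, 2, 3]) == [3, 5, 7]
```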
The Hardware Driver: Low-Level Control
Now we are talking to the chips directly.
- The Code: C++, CUDA (for NVIDIA GPUs), or custom TPU instruction sets
- The Transformation: The driver translates the optimized graph into assembly language (specific instructions like `LOAD`, `ADD`, and `STORE`, tailored to the specific chip architecture)
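To make those three instructions concrete, here is a toy interpreter in Python. This is purely illustrative; real drivers emit chip-specific machine code, not Python:

```python
def run(program):
    memory = {"a": 2, "b": 3}   # pretend RAM with two values preloaded
    acc = 0                      # the accumulator register
    for op, arg in program:
        if op == "LOAD":
            acc = memory[arg]    # copy a value from memory into the register
        elif op == "ADD":
            acc += memory[arg]   # add a memory value to the register
        elif op == "STORE":
            memory[arg] = acc    # write the register back to memory
    return memory

# "result = a + b", expressed as three instructions.
mem = run([("LOAD", "a"), ("ADD", "b"), ("STORE", "result")])
```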
The Metal: Machine Code
We have arrived at the bottom.
- The Code: Binary (machine code)
- The Transformation: The assembly is assembled into pure 0s and 1s
- The Hardware: Google's TPUs (Tensor Processing Units) or GPUs. These binary signals trigger billions of microscopic transistors to open or close. Electricity flows, performing matrix multiplications at lightning speed
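The matrix multiplication the hardware is racing through is, at its core, just multiply-and-accumulate. A minimal 2x2 version in plain Python (the chips do this on enormous matrices, in parallel, in silicon):

```python
def matmul(A, B):
    # Each output cell is a dot product of a row of A with a column of B:
    # billions of multiply-and-accumulate operations on real hardware.
    n, m, p = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

C = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
# C == [[19, 22], [43, 50]]
```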
Summary: The Descent
| Layer | Language/Format | Human Readable? | What it does |
|---|---|---|---|
| Browser | JavaScript / English | Yes | Captures input and sends it |
| Server | Go / C++ | Yes | Routing and logic |
| AI Layer | Python / JAX | Kinda | Defines the math structure |
| Compiler | XLA / IR | Barely | Optimizes math for chips |
| Binary | 0s and 1s | No | Toggles transistors |
Tech Wisdom: "Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination."
How Tokenization Actually Works
Let us dive deeper into the tokenization process.
Here is the breakdown of how the AI "reads." It does not see words; it sees a sequence of numbers from a giant lookup table (the Vocabulary).
We will use a specific example string that shows how the tokenizer handles common words vs. complex ones.
The String: "Unstoppable!"
Step 1: The Sub-Word Split
Modern models (like Gemini or GPT) do not just split by spaces (words) or by letters (characters). They use Sub-word Tokenization (like BPE or SentencePiece).
- •Why? "Unstoppable" might not be in the dictionary, but "stop" and "able" definitely are. This keeps the dictionary size manageable while allowing the AI to understand new words by breaking them down
- `Un` (prefix)
- `stopp` (root: notice the double 'p')
- `able` (suffix)
- `!` (punctuation is its own token)
Note: Many tokenizers include spaces as part of the token. So if the sentence was " Unstoppable!", the first token might be _Un.
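A toy version of the split can be written as a greedy longest-match lookup. Real BPE and SentencePiece tokenizers learn their vocabularies from data; the tiny vocabulary below is invented to reproduce the example:

```python
# Hypothetical vocabulary fragment; real vocabularies hold 50k-100k entries.
VOCAB = {"Un", "stopp", "able", "!", "stop", "U", "n"}

def tokenize(text):
    tokens, i = [], 0
    while i < len(text):
        # Try the longest matching vocabulary entry first.
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # unknown character: emit it alone
            i += 1
    return tokens

pieces = tokenize("Unstoppable!")
# pieces == ['Un', 'stopp', 'able', '!']
```

Note how `stopp` wins over `stop` because the greedy pass prefers the longest match at each position.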
Step 2: The Integer Mapping
The model looks up these chunks in its massive static dictionary (usually 50k–100k entries) and swaps them for IDs.
| Token Fragment | Token ID (Hypothetical) |
|---|---|
| `Un` | 851 |
| `stopp` | 12044 |
| `able` | 492 |
| `!` | 0 |
Result: [851, 12044, 492, 0]
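The mapping itself is a plain dictionary lookup. Using the hypothetical IDs from the table above:

```python
# Hypothetical fragment-to-ID table matching the table above.
TOKEN_IDS = {"Un": 851, "stopp": 12044, "able": 492, "!": 0}

fragments = ["Un", "stopp", "able", "!"]
ids = [TOKEN_IDS[f] for f in fragments]
# ids == [851, 12044, 492, 0] -- the sequence the model actually receives
```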
Step 3: What happens next?
The AI does not know what "Un" means just by the number 851.
- It passes that number to an Embedding Layer
- The embedding layer looks up ID `851` and retrieves a massive list of coordinates (a vector), something like `[0.02, -0.55, 0.91, ...]`
- That vector represents the "meaning" of the prefix "Un-" in multi-dimensional space
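The embedding lookup is, mechanically, another table lookup. A toy sketch (the vector values below are invented; real tables are learned during training and use thousands of dimensions, not four):

```python
# Hypothetical 4-dimensional embedding table.
EMBEDDINGS = {
    851:   [0.02, -0.55, 0.91, 0.13],   # "Un"
    12044: [0.40,  0.08, -0.33, 0.76],  # "stopp"
}

def embed(token_id):
    # Swap an integer ID for its coordinates in "meaning space".
    return EMBEDDINGS[token_id]

vector = embed(851)   # the "meaning" of the prefix "Un-"
```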
Why do we do this?
- Efficiency: Character-level is too slow (too many tokens to process)
- Flexibility: Word-level is too rigid (cannot handle typos or new slang like "rizz")
- Balance: Sub-word tokenization is the "Goldilocks" zone: efficient but flexible
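The trade-off is easy to count for our example string:

```python
text = "Unstoppable!"

char_tokens = list(text)                       # character-level: 12 tokens
subword_tokens = ["Un", "stopp", "able", "!"]  # sub-word: 4 tokens
word_tokens = ["Unstoppable!"]                 # word-level: 1 token, but it
                                               # breaks on any unseen word

# Sub-word processing is 3x cheaper than characters here, while still
# able to decompose words it has never seen before.
```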
Tech Wisdom: "To an AI, the works of Shakespeare are just a very, very statistically probable sequence of integers."