Gemma 3 1B (llama.cpp)

Google

Production AI assistant served via llama.cpp: ~114 ms first-token latency, reported as 50-100x faster than the same model running under Ollama. Optimized for real-time chat interactions.
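Since the specs below list llama.cpp as the runtime with a 4K context window, a minimal way to serve this model is llama.cpp's built-in OpenAI-compatible HTTP server. This is a sketch under assumptions: the GGUF filename is illustrative, and any Q4_K-quantized Gemma 3 1B GGUF will do.

```shell
# Serve Gemma 3 1B over llama.cpp's OpenAI-compatible HTTP server.
# The GGUF filename is an assumption; -c 4096 matches the 4K context window listed below.
llama-server -m gemma-3-1b-it-Q4_K_M.gguf -c 4096 --port 8080
```

Once running, chat requests go to `http://localhost:8080/v1/chat/completions` in the standard OpenAI request format.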

Tags: Text Generation · Local · Gemma Family · 1B
Parameters: 1B params
Context Window: 4K tokens
Max Output: - tokens
Input Price: - per 1M tokens
Output Price: - per 1M tokens

Capabilities

👁️ Vision
Function Calling
📋 JSON Mode
🌊 Streaming
💬 System Prompt
🖥️ Code Execution
🔍 Web Search
🔌 MCP Support
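Streaming is listed among the capabilities above. When the model is served through llama.cpp's OpenAI-compatible endpoint with `"stream": true`, tokens arrive as server-sent-event lines of the form `data: {...}`, terminated by a `data: [DONE]` sentinel. A minimal sketch of accumulating the streamed text (the sample chunks in the usage example are illustrative, not captured output):

```python
import json

def collect_stream_text(sse_lines):
    """Accumulate assistant text from OpenAI-style streaming SSE chunks.

    Each chunk is a `data: {...}` line whose JSON carries an incremental
    `delta`; the stream ends with the `data: [DONE]` sentinel.
    """
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip keep-alives and blank separator lines
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        parts.append(delta.get("content", ""))
    return "".join(parts)

# Illustrative chunks in the shape llama.cpp's server emits:
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    'data: [DONE]',
]
print(collect_stream_text(sample))  # Hello
```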

Local Model Specs

Quantization: Q4_K
Architecture: Gemma
Runtime: llama.cpp
VRAM Usage: 0.23 GB
Disk Size: 0.78 GB

Details

Release Date: March 12, 2025
Knowledge Cutoff: September 1, 2024
Source: Local
License: Open Source
Model ID: gemma3-1b-llama-cpp
Last updated: November 15, 2025