Gemma 3 1B (llama.cpp)
Production AI assistant served via llama.cpp: 50-100x faster than Ollama, with 114 ms first-token latency. Optimized for real-time chat interactions.
Text Generation · Local · Gemma Family · 1B
Parameters: 1B
Context Window: 4K tokens
Max Output: - (tokens)
Input Price: - (per 1M tokens)
Output Price: - (per 1M tokens)
Gemma Family Timeline (7 versions)
Gemma 3 1B (llama.cpp) - Current, Local
Mar 2025
Capabilities
👁️ Vision
⚡ Function Calling
📋 JSON Mode
🌊 Streaming
💬 System Prompt
🖥️ Code Execution
🔍 Web Search
🔌 MCP Support
Local Model Specs
Quantization: Q4_K
Architecture: Gemma
Runtime: llama.cpp
VRAM Usage: 0.23 GB
Disk Size: 0.78 GB
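A rough way to sanity-check these figures at load time, again with the llama-cpp-python bindings; the filename is a placeholder, and the 0.78 GB disk and 0.23 GB VRAM numbers come from this page, not from the script.

```python
import os
from llama_cpp import Llama

model_path = "gemma-3-1b-it-Q4_K.gguf"  # placeholder filename

# The Q4_K file should be roughly the 0.78 GB listed above.
size_gb = os.path.getsize(model_path) / 1e9
print(f"On-disk size: {size_gb:.2f} GB")

# Load with the 4K context window from the spec card; n_gpu_layers=-1
# offloads every layer to the GPU when one is available, otherwise CPU.
llm = Llama(model_path=model_path, n_ctx=4096, n_gpu_layers=-1, verbose=False)

# Token count of a prompt against the 4K window.
prompt = "Gemma 3 1B is a small instruction-tuned model."
tokens = llm.tokenize(prompt.encode("utf-8"))
print(f"Prompt uses {len(tokens)} of 4096 context tokens")

# Quick completion to confirm the runtime is wired up.
out = llm(prompt + " In one sentence, what is it good for?", max_tokens=48)
print(out["choices"][0]["text"].strip())
```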
Details
Release Date: March 12, 2025
Knowledge Cutoff: September 1, 2024
Source: Local
License: Open Source
Model ID: gemma3-1b-llama-cpp
Last updated: November 15, 2025