Gemma 3 1B (llama.cpp)
Production AI assistant via llama.cpp: 50-100x faster than Ollama with 114ms first token latency. Optimized for real-time chat interactions.
Text Generation Local Gemma Family v1B
Parameters
1B
params
Context Window
4K
tokens
Max Output
-
tokens
Input Price
-
per 1M tokens
Output Price
-
per 1M tokens
Gemma Family 7 models
The full Gemma line by generation — pricing and capabilities vary across the family.
Google
FunctionGemma
Native function calling for on-device agents. Routes complex tasks to larger models. Optimized for edge deployment.
8K
context
2 9B
General purpose, balanced
8K
context
2 2B
Edge devices, fast inference
8K
context
2 27B
High quality generation
8K
context
3 1B (llama.cpp) Current
Fast AI assistant for chat, code generation, and reasoning tasks
4K
context
Capabilities
👁️
Vision
⚡
Function Calling
📋
JSON Mode
🌊
Streaming
💬
System Prompt
🖥️
Code Execution
🔍
Web Search
🔌
MCP Support
Local Model Specs
Quantization
Q4_K
Architecture
Gemma
Runtime
llama.cpp
VRAM Usage
0.23 GB
Disk Size
0.78 GB
Details
- Release Date
- February 21, 2024
- Knowledge Cutoff
- September 1, 2024
- Source
- Local
- License
- Open Source
- Model ID
- gemma3-1b-llama-cpp
Last updated: November 15, 2025