Gemma 3 1B (llama.cpp)

Google

Production AI assistant served via llama.cpp: ~114 ms first-token latency, reported as 50-100x faster than the same model running under Ollama. Optimized for real-time chat interactions.
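Since the specs below list llama.cpp as the runtime with a 4K context window, a minimal way to serve this model is llama.cpp's built-in OpenAI-compatible HTTP server. This is a sketch under assumptions: the GGUF filename is illustrative, and any Q4_K-quantized Gemma 3 1B GGUF will do.

```shell
# Serve Gemma 3 1B over llama.cpp's OpenAI-compatible HTTP server.
# The GGUF filename is an assumption; -c 4096 matches the 4K context window listed below.
llama-server -m gemma-3-1b-it-Q4_K_M.gguf -c 4096 --port 8080
```

Once running, chat requests go to `http://localhost:8080/v1/chat/completions` in the standard OpenAI request format.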

Tags: Text Generation · Local · Gemma Family · 1B
Parameters: 1B params
Context Window: 4K tokens
Max Output: - tokens
Input Price: - per 1M tokens
Output Price: - per 1M tokens

Capabilities

👁️ Vision
Function Calling
📋 JSON Mode
🌊 Streaming
💬 System Prompt
🖥️ Code Execution
🔍 Web Search
🔌 MCP Support
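Streaming is listed among the capabilities above. When the model is served through llama.cpp's OpenAI-compatible endpoint with `"stream": true`, tokens arrive as server-sent-event lines of the form `data: {...}`, terminated by a `data: [DONE]` sentinel. A minimal sketch of accumulating the streamed text (the sample chunks in the usage example are illustrative, not captured output):

```python
import json

def collect_stream_text(sse_lines):
    """Accumulate assistant text from OpenAI-style streaming SSE chunks.

    Each chunk is a `data: {...}` line whose JSON carries an incremental
    `delta`; the stream ends with the `data: [DONE]` sentinel.
    """
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip keep-alives and blank separator lines
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        parts.append(delta.get("content", ""))
    return "".join(parts)

# Illustrative chunks in the shape llama.cpp's server emits:
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    'data: [DONE]',
]
print(collect_stream_text(sample))  # Hello
```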

Local Model Specs

Quantization: Q4_K
Architecture: Gemma
Runtime: llama.cpp
VRAM Usage: 0.23 GB
Disk Size: 0.78 GB

Details

Release Date: March 12, 2025
Knowledge Cutoff: September 1, 2024
Source: Local
License: Open Source
Model ID: gemma3-1b-llama-cpp
Last updated: November 15, 2025