Llama 4 Scout

Meta

An efficient mixture-of-experts (MoE) multimodal model with an industry-leading 10M-token context window. 109B total parameters, 17B active per token, across 16 experts. Native text and image understanding. Pretrained on ~40T tokens; supports 12 languages.
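The "17B active of 109B total" figure comes from MoE routing: for each token, a small router picks one of the 16 experts, so only that expert's weights (plus the shared layers) run. The sketch below is a toy top-1 router in plain Python; the real Llama 4 routing (including its shared expert and load-balancing losses) is more involved, and the logits here are made-up inputs.

```python
import math

NUM_EXPERTS = 16  # matches Llama 4 Scout's expert count

def route(router_logits):
    """Toy top-1 MoE router: softmax over per-expert logits,
    then pick the highest-probability expert. Illustrates why
    only a fraction of total parameters is active per token."""
    m = max(router_logits)
    exps = [math.exp(x - m) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best, probs[best]

# Hypothetical router logits for one token: expert 15 scores highest.
expert, weight = route([0.1 * i for i in range(NUM_EXPERTS)])
```

Per token, only the chosen expert's feed-forward weights execute, which is how a 109B-parameter model can run with roughly 17B parameters' worth of compute per token.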

Tags: Text Generation, Local, Latest, Llama Family, v4
Parameters: 109B
Context Window: 10.0M tokens
Max Output: - tokens
Input Price: - per 1M tokens
Output Price: - per 1M tokens

Capabilities

👁️ Vision
Function Calling
📋 JSON Mode
🌊 Streaming
💬 System Prompt
🖥️ Code Execution
🔍 Web Search
🔌 MCP Support

Local Model Specs

Quantization: Q4_K_M
Architecture: MoE (16 experts, 17B active)
Runtime: Ollama / llama.cpp
Disk Size: 67 GB
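The 67 GB disk figure is consistent with Q4_K_M quantization applied to all 109B parameters. The sketch below does the arithmetic; the ~4.85 bits-per-weight value for Q4_K_M is an approximation (llama.cpp mixes 4- and 6-bit blocks plus scales), not an exact constant.

```python
# Rough GGUF disk-size estimate for a quantized model.
TOTAL_PARAMS = 109e9          # Llama 4 Scout total parameter count
BITS_PER_WEIGHT_Q4_K_M = 4.85  # approximate average for Q4_K_M

def gguf_size_gb(params, bits_per_weight):
    """Approximate model file size in GB (10^9 bytes):
    params * bits / 8 bits-per-byte / 1e9 bytes-per-GB."""
    return params * bits_per_weight / 8 / 1e9

size = gguf_size_gb(TOTAL_PARAMS, BITS_PER_WEIGHT_Q4_K_M)
# ~66 GB, in line with the 67 GB listed above (metadata and
# tokenizer tables add a little on top)
```

Note that because routing happens per token across the whole model, all 109B parameters must fit on disk (and ideally in RAM/VRAM), even though only 17B are active at a time.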

Details

Release Date: April 5, 2025
Knowledge Cutoff: -
Source: Local
License: Llama 4 Community License
Model ID: llama4-scout-local
Last updated: March 13, 2026