173 models across text, image, audio & embedding
Flagship Qwen model with 397B total parameters, of which 17B are active per token via mixture-of-experts (MoE) routing. Hybrid architecture with Gated Delta Networks and 512 experts. Supports 201 languages and both thinking and non-thinking modes.
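To put the 397B total and 17B active figures in proportion, here is a minimal sketch of the arithmetic. It uses only the two numbers quoted above. The function name `active_fraction` is a hypothetical helper introduced for illustration and is not part of any Qwen tooling.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Return the share of parameters that are active, given both counts in billions."""
    if total_params_b <= 0:
        raise ValueError("total_params_b must be positive")
    if not 0 <= active_params_b <= total_params_b:
        raise ValueError("active_params_b must lie between 0 and total_params_b")
    return active_params_b / total_params_b


# Figures taken from the model description: 397B total, 17B active via MoE.
TOTAL_B = 397.0
ACTIVE_B = 17.0

fraction = active_fraction(TOTAL_B, ACTIVE_B)
print(f"{fraction:.1%} of parameters active")  # prints "4.3% of parameters active"
```

Roughly one parameter in 23 is active at a time. As a rule of thumb, per-token compute in an MoE model scales with the active count, while memory for holding the weights scales with the total count.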