Qwen3
Qwen3
model · Dense checkpoints at 0.6B, 1.7B, 4B, 8B, 14B, and 32B parameters; MoE checkpoints at 30B-A3B and the flagship 235B-A22B (the A-number is the activated parameter count). A loading sketch follows the variants table.
Variants
| Name | Parameters | Notes |
|---|---|---|
| Qwen3-0.6B | 0.6B | — |
| Qwen3-1.7B | 1.7B | — |
| Qwen3-4B | 4B | — |
| Qwen3-8B | 8B | — |
| Qwen3-14B | 14B | — |
| Qwen3-32B | 32B | — |
| Qwen3-30B-A3B | 30B total, 3B active | MoE |
| Qwen3-235B-A22B | 235B total, 22B active | MoE flagship |
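A minimal loading sketch using Hugging Face transformers; the Qwen/Qwen3-0.6B repo id is an assumption based on the family's naming, and any checkpoint from the table can be substituted:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-0.6B"  # assumed repo id; swap in any variant above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # place on GPU if available, else CPU
)

inputs = tokenizer("Briefly explain mixture-of-experts.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
# Decode only the newly generated tokens, not the echoed prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```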
Qwen3 Technical Report
paper · Hybrid reasoning architecture: a single checkpoint supports both a thinking mode and a direct-answer mode. Six dense models (0.6B, 1.7B, 4B, 8B, 14B, 32B) and two MoE models (30B-A3B, 235B-A22B).
arXiv: 2505.09388
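The report's hybrid design is surfaced in the released checkpoints through the chat template: an enable_thinking flag switches the same weights between an explicit-reasoning mode and a direct-answer mode. A sketch, reusing the assumed repo id from above:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
messages = [{"role": "user", "content": "Which is larger, 9.11 or 9.9?"}]

# Thinking mode: the model emits a <think>...</think> trace before answering.
thinking_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)

# Non-thinking mode: same checkpoint, direct answer, lower latency.
direct_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
```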
Qwen3-Max
model · Alibaba's largest model at over 1 trillion parameters. Closed-source MoE architecture, served via Alibaba Cloud Model Studio.
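With no public weights, access goes through an OpenAI-compatible endpoint on Model Studio. A hedged sketch: the base URL is DashScope's documented compatible mode, while the qwen3-max model id is an assumption to verify against Model Studio's model list:

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DASHSCOPE_API_KEY",
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",  # intl variant also exists
)

response = client.chat.completions.create(
    model="qwen3-max",  # assumed model id
    messages=[{"role": "user", "content": "One sentence on what makes MoE serving cheap."}],
)
print(response.choices[0].message.content)
```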
Qwen3-Next
model · 80B-A3B (80B total, 3B active) base model with Instruct and Thinking fine-tuned variants.
Variants
| Name | Parameters | Notes |
|---|---|---|
| Qwen3-Next-80B-A3B-Base | 80B total, 3B active | Pretrained base |
| Qwen3-Next-80B-A3B-Instruct | 80B total, 3B active | Instruction-tuned |
| Qwen3-Next-80B-A3B-Thinking | 80B total, 3B active | Reasoning-tuned |
Qwen3-Omni
model · Unified multimodal model spanning text, image, audio, and video. Thinker-Talker MoE architecture: the Thinker handles multimodal understanding and text generation, the Talker generates streaming speech. SOTA on 32 of 36 audio and audio-visual benchmarks.
arXiv: 2509.17765
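A heavily hedged sketch of text-plus-audio inference; the Qwen3OmniMoeForConditionalGeneration class name, the repo id, and the return_audio flag are assumptions carried over from the published Qwen-Omni model cards, not confirmed by this entry:

```python
from transformers import AutoProcessor, Qwen3OmniMoeForConditionalGeneration  # assumed class name

model_id = "Qwen/Qwen3-Omni-30B-A3B-Instruct"  # assumed repo id
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen3OmniMoeForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# The Thinker consumes every modality; the Talker streams speech for replies.
messages = [{
    "role": "user",
    "content": [
        {"type": "audio", "audio": "meeting_clip.wav"},
        {"type": "text", "text": "Transcribe the clip, then summarize its tone."},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

# return_audio=False (assumed flag) keeps only the Thinker's text output.
output_ids = model.generate(**inputs, max_new_tokens=128, return_audio=False)
print(processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)[0])
```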
Qwen3-VL Technical Report
paper · Dense (2B to 32B) and MoE (30B-A3B, 235B-A22B) vision-language models with 256K native context.
arXiv: 2511.21631
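A sketch using transformers' generic image-text-to-text auto classes; the Qwen/Qwen3-VL-8B-Instruct repo id is an assumption based on the family's naming:

```python
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "Qwen/Qwen3-VL-8B-Instruct"  # assumed repo id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/chart.png"},
        {"type": "text", "text": "What trend does this chart show?"},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=128)
# Strip the prompt tokens before decoding.
print(processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)[0])
```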
Qwen3-TTS
model · Multilingual text-to-speech with voice cloning from a 3-second reference sample. Trained on more than 5 million hours of speech across 10 languages.
arXiv: 2601.15621
Qwen3-Max-Thinking
model · API-only reasoning model. Parameter count unconfirmed; estimated at 1T total with 22B active.
Qwen3-ASR & ForcedAligner
model · Speech recognition covering 52 languages, with an accompanying forced-alignment model. SOTA among open-source ASR systems.
arXiv: 2601.21337