Third generation of the Qwen model family, pretrained on a 36-trillion-token dataset. Includes dense, MoE, multimodal, speech, and reasoning models.

Qwen3

model

Dense: 0.6B, 1.7B, 4B, 8B, 14B, 32B. MoE: 30B-A3B and the flagship 235B-A22B. A minimal loading sketch follows the variants table below.

Variants

Name Parameters Notes
Qwen3-0.6B 0.6B
Qwen3-1.7B 1.7B
Qwen3-4B 4B
Qwen3-8B 8B
Qwen3-14B 14B
Qwen3-32B 32B
Qwen3-30B-A3B 30B MoE (3B active)
Qwen3-235B-A22B 235B MoE flagship (22B active)
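
The dense checkpoints are published as open weights on Hugging Face. A minimal generation sketch, assuming the repo id Qwen/Qwen3-8B (swap in any size from the table above) and a standard transformers install:

```python
# Minimal sketch: load a Qwen3 dense checkpoint and generate.
# The repo id "Qwen/Qwen3-8B" is an assumption; any size from
# the variants table above should follow the same pattern.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Explain mixture-of-experts in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```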

Qwen3 Technical Report

paper

Hybrid reasoning architecture with switchable thinking and non-thinking modes in a single model. Covers 6 dense models (0.6B, 1.7B, 4B, 8B, 14B, 32B) and 2 MoE models (30B-A3B, 235B-A22B).

arXiv: 2505.09388
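
The hybrid design exposes reasoning as a per-request switch rather than a separate model. A sketch of the toggle through the tokenizer's chat template; the enable_thinking keyword is taken from Qwen3's published usage examples:

```python
# Sketch: toggle Qwen3's thinking mode via the chat template.
# enable_thinking is a chat-template argument from Qwen3's
# published usage examples, not a generate() flag.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
messages = [{"role": "user", "content": "What is 17 * 23?"}]

# Thinking mode: the model emits a <think>...</think> trace first.
prompt_think = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
# Non-thinking mode: direct answer, lower latency.
prompt_direct = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
```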

Qwen3-Max

model

Alibaba's largest model at over 1 trillion parameters. Closed-source MoE architecture served via Alibaba Cloud Model Studio.

Architecture MoE
Parameters 1T+
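
With no released weights, access goes through the Model Studio API. A minimal sketch against DashScope's OpenAI-compatible endpoint; the model name qwen3-max is an assumption to verify against the current catalog:

```python
# Sketch: call Qwen3-Max through Alibaba Cloud Model Studio's
# OpenAI-compatible endpoint. The model name "qwen3-max" is an
# assumption; confirm against the current Model Studio catalog.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DASHSCOPE_API_KEY"],
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
resp = client.chat.completions.create(
    model="qwen3-max",
    messages=[{"role": "user", "content": "Summarize the Qwen3 family in two sentences."}],
)
print(resp.choices[0].message.content)
```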

Qwen3-Next

model

80B-A3B base model with Instruct and Thinking fine-tuned variants; a parameter-accounting sketch follows the variants table below.

Architecture MoE
Parameters 80B
Active params 3B

Variants

Name Parameters Notes
Qwen3-Next-80B-A3B-Base 80B Pretrained base
Qwen3-Next-80B-A3B-Instruct 80B Instruction-tuned
Qwen3-Next-80B-A3B-Thinking 80B Reasoning (thinking) variant
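
The 80B-total / 3B-active split follows from sparse routing: per token, only the router-selected experts execute. A back-of-the-envelope sketch; the expert counts and sizes below are illustrative placeholders, not Qwen3-Next's published configuration:

```python
# Illustrative MoE parameter accounting. All numbers below are
# hypothetical placeholders chosen to land near 80B total / 3B
# active; they are NOT Qwen3-Next's published configuration.
def moe_params(shared_b: float, n_experts: int, expert_b: float, top_k: int):
    total = shared_b + n_experts * expert_b        # every expert stored
    active = shared_b + top_k * expert_b           # only routed experts run per token
    return total, active

total_b, active_b = moe_params(shared_b=1.5, n_experts=512, expert_b=0.153, top_k=10)
print(f"total = {total_b:.0f}B, active = {active_b:.1f}B per token")
# total = 80B, active = 3.0B per token
```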

Qwen3-Omni

model

Unified multimodal model (text, image, audio, video) with a Thinker-Talker MoE architecture. SOTA on 32 of 36 audio benchmarks.

Architecture MoE

arXiv: 2509.17765

Qwen3-VL Technical Report

paper

Dense (2B-32B) and MoE (30B-A3B, 235B-A22B) vision-language models with 256K native context.

arXiv: 2511.21631
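
A minimal image-question sketch; the repo id Qwen/Qwen3-VL-2B-Instruct and the AutoModelForImageTextToText auto-class are assumptions to check against the model card:

```python
# Sketch: query a Qwen3-VL checkpoint about an image. The repo id
# "Qwen/Qwen3-VL-2B-Instruct" and AutoModelForImageTextToText are
# assumptions; check the model card for the exact class and id.
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "Qwen/Qwen3-VL-2B-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/chart.png"},
        {"type": "text", "text": "What does this chart show?"},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```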

Qwen3-TTS

model

Multilingual TTS with 3-second voice cloning. Trained on over 5 million hours of speech across 10 languages.

arXiv: 2601.15621

Qwen3-Max-Thinking

model

Reasoning-focused variant of Qwen3-Max, available via API only.

Architecture MoE

Parameter count unconfirmed: estimated 1T total, 22B active

Qwen3-ASR & ForcedAligner

model

Speech recognition for 52 languages, with an accompanying forced-alignment model. SOTA among open-source ASR systems.

arXiv: 2601.21337
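
Forced alignment maps a known transcript to frame-level timestamps. As a generic illustration of the technique (not the Qwen3 ForcedAligner's own API), a CTC-based sketch using torchaudio's forced_align on dummy emissions:

```python
# Generic CTC forced-alignment illustration (NOT the Qwen3
# ForcedAligner API): align a token sequence to frame-level
# log-probabilities and read off per-token frame spans.
import torch
import torchaudio.functional as F

T, C = 50, 30                                         # frames, vocab size (0 = CTC blank)
log_probs = torch.randn(1, T, C).log_softmax(dim=-1)  # dummy emissions
targets = torch.tensor([[7, 3, 12, 5]], dtype=torch.int32)  # transcript token ids

frame_labels, scores = F.forced_align(log_probs, targets, blank=0)
spans = F.merge_tokens(frame_labels[0], scores[0])    # collapse repeats and blanks
for s in spans:
    print(f"token {s.token}: frames {s.start}-{s.end}, score {s.score:.2f}")
```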

Tags: open-weight, moe, nlp, multimodal, reasoning