First MoE Llama. Scout: 109B total / 17B active (16 experts), 10M-token context. Maverick: 400B total / 17B active (128 experts), 1M-token context. Both are natively multimodal (text + image), trained with an early-fusion architecture.
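The total/active split is the defining MoE property: a learned router sends each token to a small subset of experts, so only a fraction of the weights run on any forward pass. Below is a minimal sketch of top-k expert routing in PyTorch; the dimensions, router design, and value of k are illustrative assumptions, not Meta's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Minimal top-k mixture-of-experts layer (illustrative only).

    Only k experts run per token, which is how a model can have far more
    total parameters than active parameters per forward pass.
    """

    def __init__(self, dim: int, num_experts: int = 16, k: int = 1):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, num_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.SiLU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (tokens, dim)
        scores = self.router(x)                             # (tokens, num_experts)
        weights, chosen = scores.topk(self.k, dim=-1)       # route each token to k experts
        weights = F.softmax(weights, dim=-1)                # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e                 # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = MoELayer(dim=64, num_experts=16, k=1)
tokens = torch.randn(8, 64)
print(layer(tokens).shape)  # torch.Size([8, 64])
```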

Llama 4 was also the first Llama generation with built-in vision capabilities. Behemoth (~2T total / 288B active) was announced but its release was delayed. Artificial Analysis Intelligence Index: 18 (Maverick), 14 (Scout). Released under the Llama 4 Community License.
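"Early fusion" means image patch embeddings and text token embeddings enter one shared transformer as a single sequence, rather than grafting a separate vision tower onto an already-trained language model. A minimal sketch of that input path; all names and shapes here are illustrative assumptions:

```python
import torch
import torch.nn as nn

class EarlyFusionInput(nn.Module):
    """Illustrative early-fusion front end: one token stream for both modalities."""

    def __init__(self, vocab_size: int, dim: int, patch_dim: int):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, dim)
        self.patch_proj = nn.Linear(patch_dim, dim)  # maps image patches into token space

    def forward(self, text_ids: torch.Tensor, patches: torch.Tensor) -> torch.Tensor:
        text_tok = self.text_embed(text_ids)         # (n_text, dim)
        img_tok = self.patch_proj(patches)           # (n_patches, dim)
        # One fused sequence; downstream transformer blocks see no modality boundary.
        return torch.cat([img_tok, text_tok], dim=0)

fuse = EarlyFusionInput(vocab_size=32000, dim=64, patch_dim=768)
seq = fuse(torch.randint(0, 32000, (10,)), torch.randn(4, 768))
print(seq.shape)  # torch.Size([14, 64])
```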

Model Details

Architecture: MoE
Parameters: 400B total (Maverick)
Active params: 17B
Context window: 1,000,000 tokens (Maverick)

Variants

Name              Parameters  Notes
Llama 4 Scout     109B        16 experts, 10M context
Llama 4 Maverick  400B        128 experts, 1M context
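For reference, a loading sketch using the Hugging Face transformers text-generation pipeline. The checkpoint ID below is an assumption based on Meta's published naming and should be verified against the meta-llama org; image input would need the model's processor rather than this text-only path.

```python
from transformers import pipeline

# Assumed checkpoint ID; verify the exact name on the meta-llama Hugging Face org.
pipe = pipeline("text-generation", model="meta-llama/Llama-4-Scout-17B-16E-Instruct")

out = pipe("Explain mixture-of-experts routing in one sentence.", max_new_tokens=64)
print(out[0]["generated_text"])
```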
Tags: frontier, open-weight, multimodal, MoE
