Mistral Large 3
Mistral's largest model: a 675B-total / 41B-active-parameter granular mixture-of-experts (MoE) model, plus a 2.5B-parameter vision encoder for native multimodality. 256K context window. Trained on 3,000 H200 GPUs.
MMLU-Pro: 73.11%, MATH-500: 93.60%. Competitive with frontier models across reasoning, coding, and multilingual tasks. Released under the Apache 2.0 license.
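The "675B total / 41B active" split comes from mixture-of-experts routing: each token is sent to only a few experts per layer, so most parameters sit idle on any given forward pass. A minimal sketch of top-k expert routing, using a hypothetical expert count and top-k (not Mistral's actual configuration):

```python
import numpy as np

# Illustrative top-k MoE routing. N_EXPERTS and TOP_K are hypothetical,
# chosen only to show why "active" params are a small fraction of the total.
N_EXPERTS = 64   # experts per MoE layer (hypothetical)
TOP_K = 4        # experts activated per token (hypothetical)

def route(token_hidden: np.ndarray, gate_weights: np.ndarray):
    """Score every expert for one token, keep the top-k, and return
    the chosen expert indices with softmax-normalized mixing weights."""
    logits = gate_weights @ token_hidden          # one gating score per expert
    top = np.argsort(logits)[-TOP_K:][::-1]       # indices of the k best experts
    w = np.exp(logits[top] - logits[top].max())   # numerically stable softmax
    return top, w / w.sum()

rng = np.random.default_rng(0)
hidden = rng.standard_normal(128)                 # toy token hidden state
gates = rng.standard_normal((N_EXPERTS, 128))     # toy gating matrix

experts, weights = route(hidden, gates)
# Only TOP_K of N_EXPERTS experts run for this token; their outputs would be
# combined using `weights`, which sum to 1.
```

Scaling this idea up, only the routed experts' parameters (plus shared layers) count toward the active-parameter figure, which is why inference cost tracks 41B rather than 675B.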
Model Details
Architecture: MoE
Total parameters: 675B
Active parameters: 41B
Context window: 256K tokens