Nemotron 3 Ultra

The largest model in the Nemotron 3 family (Nano / Super / Ultra), shipped on HuggingFace June 4, 2026 after a Computex Taipei pre-announcement by Jensen Huang on June 1. 550B total parameters with 55B active per token (~90% sparsity), trained on 20T text tokens in NVFP4 on Blackwell.
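The sparsity figure above follows directly from the two parameter counts. A minimal sketch of that arithmetic, using only the numbers quoted in the text (the function and variable names are illustrative, not from any NVIDIA tooling):

```python
TOTAL_PARAMS = 550e9   # total parameters, as stated in the text
ACTIVE_PARAMS = 55e9   # parameters active per token, as stated in the text


def sparsity(total: float, active: float) -> float:
    """Fraction of parameters that are NOT used for a given token."""
    return 1.0 - active / total


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"active fraction per token: {active_fraction:.0%}")
print(f"sparsity: {sparsity(TOTAL_PARAMS, ACTIVE_PARAMS):.0%}")
```

Under this definition, 55B active out of 550B total is a 10% active fraction, which matches the roughly 90% sparsity quoted above.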

Architecture: hybrid Mamba-2 / Attention Mixture-of-Experts with LatentMoE (hardware-aware expert design with a 2,048-dim latent compression), 108 total layers, 512 experts per layer activated top-22, 64 query / 2 KV heads (GQA), and Multi-Token Prediction (MTP) with 2 shared-weight heads for native speculative decoding. Context window 1M tokens after a long-context extension phase. Post-trained with SFT + multi-environment RLVR + Multi-teacher On-Policy Distillation (MOPD), with explicit reasoning-budget control.
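To make the routing and attention numbers concrete, here is a rough sketch of generic top-k expert gating with the expert and head counts from the paragraph above. It is an illustration only: the softmax-over-selected-experts gating is a common MoE pattern assumed here, not a confirmed description of Ultra's router, and it ignores the LatentMoE latent compression, the Mamba-2 layers, and MTP entirely. The function name `route_top_k` is hypothetical.

```python
import math
import random

NUM_EXPERTS = 512   # experts per MoE layer, from the text
TOP_K = 22          # experts activated per token, from the text
NUM_Q_HEADS = 64    # query heads, from the text
NUM_KV_HEADS = 2    # key/value heads (GQA), from the text


def route_top_k(logits, k):
    """Select the k highest-scoring experts and softmax-normalize their weights.

    Generic top-k gating (an assumption), returning (expert_index, weight) pairs.
    """
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    max_logit = max(logits[i] for i in top)          # subtract max for stability
    exps = [math.exp(logits[i] - max_logit) for i in top]
    norm = sum(exps)
    return [(i, e / norm) for i, e in zip(top, exps)]


# Random router scores stand in for a real router's output for one token.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_top_k(logits, TOP_K)

expert_fraction = TOP_K / NUM_EXPERTS          # share of experts touched per token
gqa_group_size = NUM_Q_HEADS // NUM_KV_HEADS   # query heads sharing one KV head

print(f"experts used per token: {len(selected)} of {NUM_EXPERTS} ({expert_fraction:.1%})")
print(f"query heads per KV head: {gqa_group_size}")
```

With these counts each token touches about 4.3% of the experts in a layer, and 32 query heads share each KV head, so the KV cache of the attention layers holds 2 heads instead of 64.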

Throughput: 5.9× / 4.8× / 1.6× higher inference throughput than GLM-5.1-754B-A40B, Kimi-K2.6-1T-A32B, and Qwen-3.5-397B-A17B, respectively, in the 8K-input / 64K-output setting, at on-par accuracy across agentic and reasoning benchmarks. Artificial Analysis (AA) frames Ultra as the leading US open-weights model on its composite Intelligence Index at launch.

Headline benchmarks (BF16, post-trained): MMLU-Pro 86.8, GPQA (no tools) 87.0, LiveCodeBench v6 89.0, SWE-Bench Verified 71.9, Terminal-Bench 2.1 56.4, RULER @ 1M 94.7, and an AA Intelligence Index v4.1 score of 38, served at 300+ tokens/s.

Released as four checkpoints under the Linux Foundation OpenMDW-1.1 license: Base-BF16 (pretrained-only), BF16 (post-trained), NVFP4 (quantized for faster inference), and GenRM (the generative reward model used during RL). Distribution targets: HuggingFace, ModelScope, OpenRouter, and build.nvidia.com.
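For a rough sense of how the BF16 and NVFP4 checkpoints differ in size, the weight footprint can be estimated from the parameter count and the bits stored per parameter. This is a back-of-the-envelope sketch under loud assumptions: it treats every one of the 550B parameters as stored in the named format (real quantized checkpoints usually keep some layers in higher precision), and it assumes NVFP4 stores 4-bit values plus one 8-bit scale per 16-element block, ignoring per-tensor scales. Actual file sizes on HuggingFace may differ.

```python
TOTAL_PARAMS = 550e9  # total parameters, from the text

# Effective bits stored per parameter (assumptions, see lead-in).
BITS_PER_PARAM = {
    "BF16": 16.0,
    # 4-bit value + one 8-bit block scale shared by 16 values = 4.5 bits
    "NVFP4": 4.0 + 8.0 / 16.0,
}


def weight_gb(total_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (10^9 bytes)."""
    return total_params * bits_per_param / 8 / 1e9


for name, bits in BITS_PER_PARAM.items():
    print(f"{name}: ~{weight_gb(TOTAL_PARAMS, bits):,.0f} GB of weights")
```

Under these assumptions the BF16 weights come to roughly 1.1 TB and the NVFP4 weights to roughly 0.31 TB, before counting KV cache, activations, or any layers left unquantized.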

Companion datasets shipped on HuggingFace 2026-06-04/05: Nemotron-Pretraining-Code-v3 (173B tokens of fresh code with Sept-2025 cutoff), Nemotron-Pretraining-Legal-v1, Nemotron-Pretraining-Specialized-v1.2 (factual recall + moral scenarios), Nemotron-Posttraining-v3, Nemotron-SFT-SWE-v3, Nemotron-RL-Ultra-Training-Blends, Nemotron-RL-Science-v1, Nemotron-RL-Multichallenge-v1, Nemotron-RL-CFBench-v1, Nemotron-RL-SysBench-v1, Nemotron-RL-InverseIFEval-v1, Nemotron-RL-Instruction-Following-Structured-Outputs-v2, plus Nemotron-Personas-Vietnam and Nemotron-Personas-El-Salvador.

Nemotron 3 Ultra blog
Artificial Analysis
Nemotron 3 Ultra Technical Report (PDF)
HuggingFace collection
Nemotron 3 family white paper (arXiv)
Artificial Analysis launch coverage
NVIDIA newsroom (Dec 2025 family debut)
GitHub (NeMo / Nemotron)

Model Details

Architecture	MoE
Parameters	550B
Active params	55B
Context window	1,000,000 tokens
AA Intelligence	38
License	OpenMDW-1.1

Benchmark Scores

Benchmark	Score	Mode
MMLU-Pro	86.8	—
GPQA (no tools)	87.0	—
LiveCodeBench v6	89.0	—
SWE-Bench Verified	71.9	—
Terminal-Bench 2.1	56.4	—
RULER @ 1M	94.7	—

Variants

Name	Parameters	Notes
Nemotron 3 Ultra 550B-A55B BF16	550B	Post-trained flagship; BF16 weights
Nemotron 3 Ultra 550B-A55B NVFP4	550B	NVFP4-quantized for higher inference throughput on Blackwell
Nemotron 3 Ultra 550B-A55B Base BF16	550B	Pretrained-only base checkpoint
Nemotron 3 Ultra 550B-A55B GenRM	550B	Generative reward model used during RL post-training

Paper

arXiv HTML

Tags: frontier, open-weight, moe, reasoning, agentic, hybrid-architecture
