Nemotron 3 Ultra
The largest member of the Nemotron 3 family (Nano / Super / Ultra), announced by Jensen Huang at Computex Taipei (June 1, 2026) and scheduled to ship on HuggingFace June 4, 2026. Per Artificial Analysis's launch coverage, Ultra is ~550B total parameters with ~55B active per token (~90% sparsity); NVIDIA's December 2025 family debut press release had earlier stated "about 500 billion parameters and up to 50 billion active per token," so the published figures rose by roughly 10% between the family debut and launch.
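The sparsity figure follows directly from the two parameter counts. A minimal sketch of that arithmetic, using the approximate numbers quoted above (the function name moe_sparsity is illustrative, not taken from any NVIDIA or Artificial Analysis tooling):

```python
def moe_sparsity(total_params_b: float, active_params_b: float) -> tuple[float, float]:
    """Return (active_fraction, sparsity) for a Mixture-of-Experts model.

    Both arguments are parameter counts in billions. Sparsity here means the
    share of parameters that are NOT used for a given token.
    """
    if total_params_b <= 0 or not 0 <= active_params_b <= total_params_b:
        raise ValueError("expected 0 <= active <= total and total > 0")
    active_fraction = active_params_b / total_params_b
    return active_fraction, 1.0 - active_fraction


# Approximate figures quoted in this note (billions of parameters).
launch_active, launch_sparsity = moe_sparsity(550, 55)  # launch coverage
debut_active, debut_sparsity = moe_sparsity(500, 50)    # December 2025 press release

# Relative growth of the total parameter count between the two statements.
total_growth = 550 / 500 - 1.0

print(f"launch: {launch_active:.0%} active, {launch_sparsity:.0%} sparse")
print(f"debut:  {debut_active:.0%} active, {debut_sparsity:.0%} sparse")
print(f"total parameter count grew by about {total_growth:.0%}")
```

Both statements imply the same active fraction, so the revision changed the model's stated size but not its stated sparsity.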
Architecture (per the Nemotron 3 white paper): hybrid Mamba–Transformer Mixture-of-Experts with LatentMoE (a hardware-aware expert design that improves accuracy without sacrificing throughput), multi-token prediction (MTP) layers for faster long-form generation, and NVFP4 training on Blackwell. Post-trained via multi-environment reinforcement learning with granular reasoning-budget control. Context window up to 1M tokens.
Headline benchmark: a score of 48 on the Artificial Analysis (AA) Intelligence Index v4.0, served at 300+ tokens/second. AA frames it as the leading US open-weights model on its composite at launch.
Status as of June 2, 2026: announced, white paper out, model weights not yet uploaded to HuggingFace (the nvidia/NVIDIA-Nemotron-3-Ultra-* repo slugs return 401); no per-model AA page is live yet either. NVIDIA's announced distribution targets are HuggingFace, ModelScope, OpenRouter, and build.nvidia.com.
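The availability check described above can be repeated with a small script. This is a sketch under stated assumptions: it takes https://huggingface.co/api/models/<repo_id> as the Hub's model-info endpoint, and it assumes, in line with the 401 observed in this note, that an unauthenticated request for a repo that is missing, private, or gated is answered with 401 or 403. The function names are hypothetical, and the repo id has to be supplied by the caller because the full slugs are not given here.

```python
import urllib.error
import urllib.request


def describe_status(code: int) -> str:
    """Map an HTTP status code to a rough availability label."""
    if code == 200:
        return "public"
    if code in (401, 403):
        # Assumption: also covers repos that do not exist yet.
        return "not publicly accessible"
    if code == 404:
        return "not found"
    return f"unexpected status {code}"


def repo_status(repo_id: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of an unauthenticated request for a model repo.

    The endpoint path is an assumption about the Hub's public API.
    """
    url = f"https://huggingface.co/api/models/{repo_id}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx; the status code is still the answer we want.
        return err.code
```

Calling describe_status(repo_status(repo_id)) with a concrete repo id yields one of the labels above. Network errors other than HTTP error responses (DNS failure, timeout) are deliberately left to propagate.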