Nemotron 3 Super
NVIDIA's current flagship model. 120B total parameters / 12B active per token, using a LatentMoE + Mamba-2 + Attention hybrid architecture with Multi-Token Prediction. Trained in NVFP4 precision. 1M-token context window. 5× the throughput of the previous generation.
AA Intelligence Index: 36 (#2 in class; 293 t/s, fastest among top models). AIME25: 90.21. HMMT Feb25: 93.67. GPQA: 79.23 (82.70 with tools). MMLU-Pro: 83.73. SWE-Bench: 60.47 (OpenHands). RULER@1M: 91.75. Open weights, with the complete training data (10T+ tokens) and training recipes released.
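The headline MoE numbers (12B active out of 120B total) follow from top-k expert routing: each token is dispatched to only a few experts, so most of the weights stay idle on any given step. A minimal sketch of such a router, with hypothetical gate scores and expert counts (the model's actual routing scheme is not reproduced here):

```python
def route_top_k(scores, k):
    """Return indices of the k highest-scoring experts for one token."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# Hypothetical gate scores for 8 experts on a single token.
gate_scores = [0.02, 0.31, 0.05, 0.22, 0.01, 0.18, 0.11, 0.10]
active = route_top_k(gate_scores, k=2)
print(active)  # -> [1, 3]: only these experts' weights run for this token

# Active-to-total parameter ratio reported for the model: 12B / 120B
print(12 / 120)  # -> 0.1, i.e. ~10% of weights active per token
```

This is why a 120B-parameter model can run with the per-token compute cost of a much smaller dense model.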
Model Details
Architecture MoE (hybrid)
Parameters 120B
Active params 12B
Context window 1,000,000
Paper
arXiv: 2512.20856