Family of reasoning models derived from Meta Llama via Neural Architecture Search (NAS): Ultra (253B, from Llama 3.1 405B with skip attention, variable FFN, FFN fusion), Super (49B, from Llama 3.3 70B), and Nano (8B). First open models with dynamic reasoning toggle (on/off at inference).

Ultra reasoning ON: MATH-500 97.0, GPQA 76.0, AIME25 72.5, LiveCodeBench 66.3. Outperforms DeepSeek-R1 on GPQA at less than half the parameters. Super fits on single H100-80GB. v1.5 adds RPO, RLVR, and iterative DPO for enhanced agentic capabilities.

Model Details

Architecture DENSE
Parameters 253B
Context window 128,000

Variants

Name Parameters Notes
Llama-3.1-Nemotron-Ultra-253B 253B
Llama-3.3-Nemotron-Super-49B 49B
Llama-3.1-Nemotron-Nano-8B 8B

Paper

arXiv: 2505.00949

open-weightreasoningfrontier

Related