A tri-mode LLM family from NVIDIA Labs: at each step the model can decode autoregressively (AR), by diffusion, or by self-speculation, with a learned controller picking the regime. NVIDIA reports 2.6–6.4× throughput over AR baselines while matching or beating Qwen3 accuracy on standard benchmarks.
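
To make the tri-mode idea concrete, here is a toy Python sketch. It is not NVIDIA's implementation: every name in it (`Mode`, `toy_next_token`, `draft_block`, `generate`) is hypothetical, the "model" is a deterministic arithmetic rule, the diffusion pass is replaced by a cheap block guess, and the learned controller is replaced by a plain callable. It only illustrates the general shape of the three regimes: one verified token per step, one unverified block per step, and a draft-then-verify step that keeps the tokens the autoregressive rule agrees with.

```python
from enum import Enum
from typing import Callable, List, Tuple

VOCAB = 7  # toy vocabulary size (assumption for illustration)


class Mode(Enum):
    AR = "autoregressive"
    DIFFUSION = "diffusion"
    SELF_SPEC = "self-speculation"


def toy_next_token(prefix: List[int]) -> int:
    """Stand-in for the model's autoregressive prediction (hypothetical rule)."""
    last = prefix[-1] if prefix else 0
    bump = 2 if len(prefix) % 3 == 0 else 1
    return (last + bump) % VOCAB


def draft_block(prefix: List[int], block: int = 4) -> List[int]:
    """Cheap parallel guess for the next `block` tokens (stand-in for a diffusion pass)."""
    last = prefix[-1] if prefix else 0
    return [(last + i + 1) % VOCAB for i in range(block)]


def ar_step(prefix: List[int]) -> List[int]:
    # One verified token per step.
    return [toy_next_token(prefix)]


def diffusion_step(prefix: List[int]) -> List[int]:
    # A whole block per step, accepted without verification.
    return draft_block(prefix)


def self_spec_step(prefix: List[int]) -> List[int]:
    # Draft a block, keep tokens while they match the AR rule,
    # and replace the first mismatch with the verified token.
    accepted: List[int] = []
    for tok in draft_block(prefix):
        expected = toy_next_token(prefix + accepted)
        if tok != expected:
            accepted.append(expected)
            break
        accepted.append(tok)
    return accepted


STEP_FNS = {
    Mode.AR: ar_step,
    Mode.DIFFUSION: diffusion_step,
    Mode.SELF_SPEC: self_spec_step,
}

Controller = Callable[[int, List[int]], Mode]


def generate(prompt: List[int], n_tokens: int, controller: Controller) -> Tuple[List[int], int]:
    """Generate n_tokens, letting the controller choose a mode at every step."""
    out: List[int] = []
    steps = 0
    while len(out) < n_tokens:
        mode = controller(steps, prompt + out)
        out.extend(STEP_FNS[mode](prompt + out))
        steps += 1
    return out[:n_tokens], steps


ar_tokens, ar_steps = generate([1, 2], 8, lambda step, prefix: Mode.AR)
spec_tokens, spec_steps = generate([1, 2], 8, lambda step, prefix: Mode.SELF_SPEC)
```

In this toy setup the self-speculative run reproduces the autoregressive tokens exactly while using fewer steps, which is the property that makes draft-and-verify decoding attractive. The diffusion branch is faster still but unverified, so a controller has a real speed/accuracy trade-off to arbitrate.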

Three sizes (3B / 8B / 14B): Base and Instruct variants at every size, plus VL variants at 3B and 8B; eight model uploads total on May 23. Released under the Nemotron Open Model License.

Model Details

Parameters: up to 14B

Variants

Name                                            Parameters   Notes
Nemotron-Labs-Diffusion-3B (Base/Instruct/VL)   3B
Nemotron-Labs-Diffusion-8B (Base/Instruct/VL)   8B
Nemotron-Labs-Diffusion-14B (Base/Instruct)     14B
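
As a quick consistency check, the rows above can be enumerated to confirm the upload count. The (size, variant) pairs come straight from the table; the dictionary and variable names are illustrative only, and no repository names are assumed.

```python
# Variants per size, as listed in the table above.
VARIANTS = {
    "3B": ["Base", "Instruct", "VL"],
    "8B": ["Base", "Instruct", "VL"],
    "14B": ["Base", "Instruct"],
}

# One entry per (size, variant) pair.
uploads = [(size, variant) for size, variants in VARIANTS.items() for variant in variants]
total_uploads = len(uploads)
```

The count comes to 3 + 3 + 2 = 8, which matches the eight model uploads reported for the release.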
Tags: foundational, diffusion, open-weight, efficiency

Related