Hybrid Mamba-Transformer reasoning models: a 12B base and a 9B variant compressed from it via Minitron. Trained on 20T tokens in FP8. On par with Qwen3-8B on reasoning benchmarks while delivering up to 6x higher inference throughput. 128K context window; the 9B variant fits on a single A10G GPU (22 GiB) in BF16.
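A rough memory budget makes the single-A10G claim concrete. The sketch below is illustrative only: the attention-layer count and K/V width are assumptions, not figures from this card, and rest on the fact that the hybrid design keeps only a few attention layers with a KV cache while Mamba-2 layers carry a fixed-size state.

```python
# Back-of-the-envelope check of the 22 GiB A10G claim for the 9B variant in BF16.
# The attention-layer count and K/V width below are illustrative assumptions,
# not figures from the model card.
GiB = 1024 ** 3

weights_gib = 9e9 * 2 / GiB          # 9B params x 2 bytes (BF16) ~= 16.8 GiB

ctx = 128_000                        # full 128K context
attn_layers = 6                      # assumed: most layers are Mamba-2, few are attention
kv_width = 1024                      # assumed K/V width after grouped-query attention
kv_gib = attn_layers * ctx * kv_width * 2 * 2 / GiB   # keys + values, BF16 ~= 2.9 GiB

print(f"weights ~ {weights_gib:.1f} GiB, attention KV cache ~ {kv_gib:.1f} GiB")
# Mamba-2 layers keep a fixed-size recurrent state per sequence, so the total
# stays within a 22 GiB budget even at the full context length.
```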

Nemotron Nano V2 VL adds a vision encoder for document understanding, long video comprehension, and reasoning, and delivers 35% higher throughput than its predecessor on multi-page document tasks.

Model Details

Architecture: Dense (hybrid Mamba-Transformer)
Parameters: 12B
Context window: 128K tokens

Variants

Name                       Parameters  Notes
Nemotron Nano 12B V2       12B         Base model
Nemotron Nano 9B V2        9B          Compressed via Minitron
Nemotron Nano 12B V2 VL    12B         Vision-language variant
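
The text variants above can be run with standard open-weight tooling; a minimal inference sketch using Hugging Face transformers follows. The repo id, the need for trust_remote_code, and the chat-template behavior are assumptions rather than details confirmed by this card.

```python
# Minimal inference sketch for the 9B variant via Hugging Face transformers.
# The repo id and trust_remote_code flag are assumptions, not confirmed above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/NVIDIA-Nemotron-Nano-9B-v2"   # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # BF16 weights, matching the memory claim above
    device_map="auto",
    trust_remote_code=True,       # hybrid Mamba-Transformer blocks may need custom code
)

messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```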

Paper

arXiv: 2508.14444

open-weight, efficiency, multimodal, reasoning
