Nemotron-4 340B
A 340B-parameter dense Transformer with grouped-query attention (GQA), rotary position embeddings (RoPE), and squared-ReLU activations. Trained on 9T tokens (8T pre-training + 1T continued training). Benchmarks: MMLU 81.1, HumanEval 57.3, Arena Hard 54.2.
Notable for its alignment recipe: over 98% of the alignment data is synthetically generated. In FP8 precision, the model fits on a single DGX H100 node (8 GPUs).
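The squared-ReLU activation mentioned above is simply the ReLU output squared. A minimal sketch (the function name `squared_relu` is illustrative, not from the source):

```python
import numpy as np

def squared_relu(x: np.ndarray) -> np.ndarray:
    # Squared ReLU: max(0, x)^2 -- zero for negative inputs,
    # quadratic growth for positive inputs.
    return np.maximum(0.0, x) ** 2

print(squared_relu(np.array([-2.0, 0.0, 3.0])))  # -> [0. 0. 9.]
```

Compared with plain ReLU, the squared variant is smooth at zero from the right and sharpens the contrast between small and large positive activations.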
Model Details
Architecture: Dense Transformer
Parameters: 340B
Context window: 4,096 tokens
Paper: arXiv:2406.11704
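A back-of-envelope check on the single-node FP8 claim (weights only; this sketch ignores KV cache, activations, and framework overhead, which also consume memory):

```python
# FP8 stores one byte per parameter.
params = 340e9
weight_gb = params * 1 / 1e9          # ~340 GB of weights
dgx_h100_hbm_gb = 8 * 80              # DGX H100: 8 GPUs x 80 GB HBM = 640 GB

print(f"weights: {weight_gb:.0f} GB, node HBM: {dgx_h100_hbm_gb} GB")
print("fits:", weight_gb < dgx_h100_hbm_gb)  # -> fits: True
```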