Nemotron-4 340B
A 340B-parameter dense Transformer with grouped-query attention (GQA), rotary position embeddings (RoPE), and squared-ReLU activations. Trained on 9T tokens (8T pre-training + 1T continued training). Benchmarks: MMLU 81.1, HumanEval 57.3, Arena Hard 54.2.
Notable for its alignment recipe: over 98% of the alignment data is synthetically generated. In FP8 precision, the model fits on a single DGX H100 node (8 GPUs).
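The squared-ReLU activation mentioned above is simply the ReLU output squared. A minimal sketch (the function name `squared_relu` is illustrative, not from the source):

```python
import numpy as np

def squared_relu(x: np.ndarray) -> np.ndarray:
    # Squared ReLU: max(0, x)^2 -- zero for negative inputs,
    # quadratic growth for positive inputs.
    return np.maximum(0.0, x) ** 2

print(squared_relu(np.array([-2.0, 0.0, 3.0])))  # -> [0. 0. 9.]
```

Compared with plain ReLU, the squared variant is smooth at zero from the right and sharpens the contrast between small and large positive activations.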
Model Details
Architecture: Dense Transformer
Parameters: 340B
Context window: 4,096 tokens
Paper: arXiv:2406.11704
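A back-of-envelope check on the single-node FP8 claim (weights only; this sketch ignores KV cache, activations, and framework overhead, which also consume memory):

```python
# FP8 stores one byte per parameter.
params = 340e9
weight_gb = params * 1 / 1e9          # ~340 GB of weights
dgx_h100_hbm_gb = 8 * 80              # DGX H100: 8 GPUs x 80 GB HBM = 640 GB

print(f"weights: {weight_gb:.0f} GB, node HBM: {dgx_h100_hbm_gb} GB")
print("fits:", weight_gb < dgx_h100_hbm_gb)  # -> fits: True
```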