A fine-tune of Meta Llama 3.1 70B Instruct via RLHF (the REINFORCE algorithm) on the HelpSteer2-Preference dataset. At release, it ranked #1 on Arena Hard (85.0), AlpacaEval 2 LC (57.6), MT-Bench (8.98), and Chatbot Arena Elo (1267).

Accompanied by Llama-3.1-Nemotron-70B-Reward, the reward model used during training, which ranked #1 on RewardBench at release. Supports a 128K-token context window.
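Since the model is fine-tuned from Llama 3.1 70B Instruct, it uses the standard Llama 3.1 chat template. The sketch below illustrates that prompt format by hand; in practice you would call `tokenizer.apply_chat_template` from Hugging Face transformers instead, and the exact special tokens are an assumption inherited from the base model's documented template.

```python
# Sketch of the Llama 3.1 chat prompt format that Nemotron-70B-Instruct
# inherits from its base model. Illustrative only: real usage should rely on
# tokenizer.apply_chat_template rather than hand-built strings.

def format_llama31_prompt(messages):
    """Render an OpenAI-style message list into a Llama 3.1 prompt string."""
    parts = ["<|begin_of_text|>"]
    for msg in messages:
        # Each turn is wrapped in header tokens and terminated with <|eot_id|>.
        parts.append(
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content']}<|eot_id|>"
        )
    # Open the assistant header so the model generates the reply next.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = format_llama31_prompt(
    [{"role": "user", "content": "How many r in strawberry?"}]
)
print(prompt)
```

The "How many r in strawberry?" question is the example NVIDIA highlighted at the model's release.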

Model Details

Architecture: Dense
Parameters: 70B
Context window: 128,000 tokens

Paper

arXiv: 2410.01257

Tags: open-weight, reasoning
