Reward model series that topped RewardBench with only 80K curated preference pairs. V2 (Jul 2025, ICLR 2026) scales to 8 models (0.6B-8B) trained on SynPref-40M (26M curated pairs), achieving SOTA across 7 reward model benchmarks. CC BY 4.0.
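Reward models trained on preference pairs like these are typically fit with a Bradley-Terry objective: the model assigns a scalar reward to each response, and the loss pushes the chosen response's reward above the rejected one's. The sketch below illustrates that standard objective in plain Python; it is an assumption for illustration, not the exact loss used by the Skywork papers.

```python
import math

def bradley_terry_loss(chosen_reward: float, rejected_reward: float) -> float:
    """Negative log-likelihood that the chosen response outranks the rejected one.

    Preference-pair reward model training commonly minimizes this
    Bradley-Terry objective; the precise Skywork training recipe is
    described in the arXiv papers, so treat this as an illustrative sketch.
    """
    # p(chosen > rejected) = sigmoid(r_chosen - r_rejected)
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A larger reward margin for the chosen response yields a smaller loss.
well_separated = bradley_terry_loss(2.0, -1.0)  # margin = 3.0
nearly_tied = bradley_terry_loss(0.1, 0.0)      # margin = 0.1
```

With a zero margin the loss is ln 2 (a coin flip); it shrinks toward zero as the model separates chosen from rejected responses.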

Outputs (2)

Skywork-Reward V1

model

#1 on RewardBench with 80K preference pairs. Gemma-27B and Llama-8B variants.

arXiv: 2410.18451

Skywork-Reward V2

model

8 models (0.6B-8B), SynPref-40M dataset, SOTA on 7 benchmarks. ICLR 2026.

arXiv: 2507.01352

Venue: ICLR 2026

alignment · open-weight · research