Reward model series that topped RewardBench with only 80K curated preference pairs. V2 (Jul 2025, ICLR 2026) scales to 8 models (0.6B-8B) trained on SynPref-40M (26M curated pairs), achieving SOTA across 7 reward model benchmarks. CC BY 4.0.
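Reward models trained on preference pairs like these are typically fit with a Bradley-Terry objective: the model assigns a scalar reward to each response, and the loss pushes the chosen response's reward above the rejected one's. The sketch below illustrates that standard objective in plain Python; it is an assumption for illustration, not the exact loss used by the Skywork papers.

```python
import math

def bradley_terry_loss(chosen_reward: float, rejected_reward: float) -> float:
    """Negative log-likelihood that the chosen response outranks the rejected one.

    Preference-pair reward model training commonly minimizes this
    Bradley-Terry objective; the precise Skywork training recipe is
    described in the arXiv papers, so treat this as an illustrative sketch.
    """
    # p(chosen > rejected) = sigmoid(r_chosen - r_rejected)
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A larger reward margin for the chosen response yields a smaller loss.
well_separated = bradley_terry_loss(2.0, -1.0)  # margin = 3.0
nearly_tied = bradley_terry_loss(0.1, 0.0)      # margin = 0.1
```

With a zero margin the loss is ln 2 (a coin flip); it shrinks toward zero as the model separates chosen from rejected responses.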

Outputs (2)

Skywork-Reward V1

model

#1 on RewardBench with 80K preference pairs. Gemma-27B and Llama-8B variants.

arXiv: 2410.18451

Skywork-Reward V2

model

8 models (0.6B-8B), SynPref-40M dataset, SOTA on 7 benchmarks. ICLR 2026.

arXiv: 2507.01352

Venue: ICLR 2026

alignment · open-weight · research