Data scaling laws for mathematical reasoning. Shows that scaling synthetic math SFT data (the Skywork-MathQA dataset, 2.5M instances) enables a 7B model to reach 51.2% on the MATH benchmark, surpassing an early version of GPT-4. A systematic study of how math performance scales with SFT data quantity and quality.
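The core idea, fitting a power-law relationship between SFT data size and benchmark error, can be sketched as below. The data points and the 5M extrapolation size are hypothetical illustrations, not the paper's measurements; only the 2.5M/51.2% endpoint comes from the summary above.

```python
import numpy as np

# Hypothetical (data size, accuracy) points for illustration only;
# the final point echoes the reported 2.5M instances / 51.2% MATH result.
data_sizes = np.array([0.1e6, 0.5e6, 1.0e6, 2.5e6])  # SFT instances
accuracies = np.array([0.30, 0.40, 0.45, 0.512])     # MATH accuracy

# Fit a power law: error ~ A * N^(-alpha), i.e. log-error linear in log-N.
errors = 1.0 - accuracies
slope, intercept = np.polyfit(np.log(data_sizes), np.log(errors), 1)
alpha, A = -slope, np.exp(intercept)

print(f"fitted exponent alpha ~ {alpha:.3f}")

# Extrapolate predicted error at a larger, hypothetical dataset size.
pred_err = A * (5e6) ** (-alpha)
print(f"predicted MATH error at 5M instances ~ {pred_err:.3f}")
```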

Paper

arXiv: 2407.08348

reasoning, scaling, research