DeepSeek-Math
model, paper
Introduced Group Relative Policy Optimization (GRPO) for mathematical reasoning. The foundational model and paper for DeepSeek's math capabilities.
Outputs 2
DeepSeekMath: Pushing the Limits of Mathematical Reasoning
paper
Introduced Group Relative Policy Optimization (GRPO) for mathematical reasoning.
arXiv: 2402.03300