DeepSeek-Math introduced Group Relative Policy Optimization (GRPO) for mathematical reasoning. It is the foundational model and paper for DeepSeek's math capabilities.

Outputs 2

DeepSeek-Math

model

Mathematical reasoning model introducing GRPO.

Architecture: Dense

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

paper

Introduced Group Relative Policy Optimization (GRPO) for mathematical reasoning.

arXiv: 2402.03300
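
For readers unfamiliar with GRPO, the "group relative" part can be illustrated with a minimal sketch. It assumes the outcome-reward setting, where several answers are sampled for one prompt and each answer's reward is normalized against the mean and standard deviation of its own group, in place of a learned value baseline. The function name, the `eps` term, and the use of the population standard deviation are illustrative assumptions, not details taken from this entry.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    rewards: scalar rewards for answers sampled from the SAME prompt.
    eps: small constant (an assumption here) that avoids division by
         zero when every answer in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; the exact estimator is an assumption
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four sampled answers: two correct (1.0), two wrong (0.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct answers receive a positive advantage and incorrect ones a negative advantage, and a group whose answers all score the same contributes advantages of zero.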

reasoning, training