DeepSeek-Math

Introduced Group Relative Policy Optimization (GRPO), a reinforcement learning algorithm, for mathematical reasoning. This is the foundational model and paper behind DeepSeek's math capabilities.
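The "group relative" part of GRPO can be shown with a short sketch. The policy samples a group of outputs for one question and scores each one. Each output's advantage is then its reward normalized by the group's mean and standard deviation, so no separate learned value (critic) model is needed as a baseline. The Python below is a minimal illustration of the outcome-reward case only. The function name `group_relative_advantages`, the `eps` guard, and the use of population standard deviation are assumptions made for illustration. It is not DeepSeek's implementation.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Hypothetical helper: advantage of each sampled output relative to its group.

    Sketch of outcome-reward normalization: A_i = (r_i - mean(r)) / std(r).
    Population std and the eps guard are illustrative assumptions.
    """
    mean = statistics.fmean(rewards)     # group mean acts as the baseline
    std = statistics.pstdev(rewards)     # spread of rewards within the group
    # eps avoids division by zero when every output gets the same reward
    return [(r - mean) / (std + eps) for r in rewards]


# Example: four sampled answers to one problem, two judged correct (reward 1).
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Correct answers receive positive advantages and incorrect ones receive negative advantages. If the whole group scores the same, every advantage is zero and that group contributes no learning signal.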

Outputs: 2

model

Mathematical reasoning model released alongside the introduction of GRPO.

Architecture: Dense

paper

Introduced Group Relative Policy Optimization (GRPO) for mathematical reasoning.

Citations: 69

Tags: reasoning, training