Reinforcement learning (RL) for subgoal decomposition in formal mathematical reasoning. Includes the DeepSeek-ProverBench evaluation suite.
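
The Lean 4 snippet below is a minimal illustration of subgoal decomposition. It is a sketch, not output from this model or a problem from DeepSeek-ProverBench, and the theorem names `sketch` and `sketch_filled` are hypothetical. It assumes a recent Lean 4 toolchain with only the core library (no Mathlib), where the `omega` tactic is built in. A high-level proof sketch splits the goal into `have` subgoals left as `sorry`. Each subgoal is then closed independently.

```lean
-- Step 1: a proof sketch. The main goal is split into `have` subgoals,
-- each left as `sorry` (Lean accepts this, with a warning).
theorem sketch (x : Nat) (h : 2 * x + 1 = 7) : x = 3 := by
  have h1 : 2 * x = 6 := by sorry
  have h2 : x = 3 := by sorry
  exact h2

-- Step 2: each subgoal is proved on its own; here `omega`
-- (linear arithmetic over Nat/Int) is enough for both.
theorem sketch_filled (x : Nat) (h : 2 * x + 1 = 7) : x = 3 := by
  have h1 : 2 * x = 6 := by omega
  have h2 : x = 3 := by omega
  exact h2
```

Because each `sorry` is a self-contained goal with its own hypotheses, the subgoals can be handed to a prover one at a time. The final proof is checked by Lean only once every `sorry` has been replaced.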

Model Details

Architecture: MoE (mixture of experts)
Parameters: 671B

Paper

arXiv: 2504.21801

reasoning, open-weight

Related