Introduces CoTP, a framework that selects high-value Chain-of-Thought (CoT) data by abstracting atomic reasoning patterns. Using only 10B tokens of CoTP data, the researchers improved an 85B Mixture-of-Experts (MoE) model by 9.58% on AIME benchmarks, showing that a small, carefully selected dataset can substantially strengthen reasoning during mid-training.

Paper

arXiv: 2509.21124

reasoning, scaling, research
