MixGRPO

paper

2025-07-28 Tencent

Research on improving the efficiency of flow-based GRPO (Group Relative Policy Optimization) for reasoning models.

Paper (arXiv) | GitHub

reasoning, training, research