AI Lab Tracker
MixGRPO
paper
2025-07-28
Tencent
Research on improving the efficiency of flow-based GRPO (Group Relative Policy Optimization) training using mixed ODE-SDE sampling.
Paper (arXiv)
GitHub
reasoning
training
research