GEPO
Introduces Group Expectation Policy Optimization (GEPO), an asynchronous RL algorithm robust to latency in geographically distributed computing networks. GEPO uses group expectation weighting to reduce the variance of importance weights, enabling stable training of large models across heterogeneous nodes.
Paper: arXiv:2508.17850
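To illustrate why group expectation weighting can reduce importance-weight variance, here is a minimal sketch in plain Python. It is not taken from the paper's code. It assumes sequence-level log-probabilities for one group of responses sampled from a stale policy q and scored by the current policy p, and it assumes the group-level denominator is estimated as sum_j q_j^2 / sum_j q_j. The function names, the estimator, and the numbers are illustrative assumptions; see the paper for the exact definition.

```python
import math
from statistics import pvariance


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def standard_weights(logp, logq):
    """Per-sample importance weights p(y_i|x) / q(y_i|x)."""
    return [math.exp(lp - lq) for lp, lq in zip(logp, logq)]


def group_expectation_weights(logp, logq):
    """Weights p(y_i|x) / E_hat, using one shared denominator per group.

    ASSUMPTION (not confirmed by the summary above): E_hat is a
    within-group estimate of E_q[q(y|x)], computed here as
    sum_j q_j^2 / sum_j q_j and evaluated in log space.
    """
    log_denom = logsumexp([2.0 * lq for lq in logq]) - logsumexp(logq)
    return [math.exp(lp - log_denom) for lp in logp]


# Hypothetical sequence log-probabilities for one group of 4 responses.
logq = [-1.0, -2.0, -3.0, -8.0]  # stale sampling policy (delayed node)
logp = [-1.2, -1.8, -3.5, -4.0]  # current policy being trained

w_std = standard_weights(logp, logq)
w_grp = group_expectation_weights(logp, logq)

print("standard weights:", [round(w, 3) for w in w_std])
print("group weights:   ", [round(w, 3) for w in w_grp])
print("variance standard vs group:", pvariance(w_std), pvariance(w_grp))
```

In this toy group, the last response is very unlikely under the stale policy, so its standard ratio p/q becomes very large and dominates the variance. Sharing one group-level denominator keeps every weight on a comparable scale. When all sampled responses have the same probability under q, the two weightings coincide.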