A 72B MoE model with a novel Mixture of Grouped Experts (MoGE) architecture, activating 16B parameters per token. Open-sourced as part of the openPangu initiative.
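
A minimal sketch of grouped top-k routing in the spirit of MoGE (not the official implementation): experts are partitioned into equal groups and each token selects the same number of experts from every group, so load stays balanced when groups are mapped to different devices. Function name, shapes, and the example numbers are illustrative assumptions, not the model's actual configuration.

```python
import torch
import torch.nn.functional as F

def moge_route(router_logits: torch.Tensor, num_groups: int, k_per_group: int):
    """router_logits: [tokens, num_experts]; returns dense routing weights
    that are nonzero only for the selected experts."""
    tokens, num_experts = router_logits.shape
    experts_per_group = num_experts // num_groups
    # View logits as [tokens, groups, experts_per_group]
    grouped = router_logits.view(tokens, num_groups, experts_per_group)
    scores = F.softmax(grouped, dim=-1)
    # Top-k within each group: every token activates exactly k experts per group,
    # so each group (e.g. each device) receives the same per-token load.
    topk_scores, topk_idx = scores.topk(k_per_group, dim=-1)
    weights = torch.zeros_like(scores).scatter(-1, topk_idx, topk_scores)
    # Flatten back to [tokens, num_experts] and renormalize the kept weights
    weights = weights.view(tokens, num_experts)
    return weights / weights.sum(dim=-1, keepdim=True)

# Illustrative example: 64 experts in 8 groups, 2 active per group
w = moge_route(torch.randn(4, 64), num_groups=8, k_per_group=2)
```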

Paper

arXiv: 2505.21411

moe, open-weight

Related