A large-scale, cost-effective pre-trained language model series (up to 198B parameters) that incorporates a Mixture-of-Experts (MoE) architecture and multilingual capabilities. Supported by BAAI, it achieved state-of-the-art results on both Chinese and English tasks while maintaining computational efficiency.
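For readers unfamiliar with the MoE label, the snippet below is a minimal sketch of top-k expert routing: a gate scores each token, the top-k experts process it, and their outputs are mixed by the renormalized gate weights. The gating scheme, expert count, and top-2 routing here are illustrative assumptions, not the model's actual configuration.

```python
import numpy as np

def moe_forward(x, gate_w, expert_ws, top_k=2):
    """Route each token to its top-k experts and mix their outputs.

    x:         (tokens, d_model) input activations
    gate_w:    (d_model, n_experts) gating weights
    expert_ws: list of (d_model, d_model) per-expert weight matrices
    """
    # Softmax over expert logits gives routing probabilities per token.
    logits = x @ gate_w
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)

    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        top = np.argsort(probs[t])[-top_k:]            # indices of the k highest-scoring experts
        weights = probs[t, top] / probs[t, top].sum()  # renormalize over the chosen experts
        for e, w in zip(top, weights):
            out[t] += w * (x[t] @ expert_ws[e])        # weighted sum of expert outputs
    return out

# Toy usage: 4 tokens, 8-dim hidden size, 4 experts.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
gate_w = rng.normal(size=(8, 4))
expert_ws = [rng.normal(size=(8, 8)) for _ in range(4)]
print(moe_forward(x, gate_w, expert_ws).shape)  # (4, 8)
```

Because each token only activates k experts, the per-token compute stays close to a dense model of expert size while total parameter count scales with the number of experts.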

Model Details

Architecture: MoE
Parameters: 198B

Paper

arXiv: 2106.10715

