Trillion-parameter (1.085T) sparse language model extending PanGu-alpha with Random Routed Experts (RRE). Trained on 329B tokens spanning 40+ languages on 512 Ascend 910 accelerators.
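The core idea of Random Routed Experts is to dispatch each token to an expert by a fixed random mapping rather than a learned gating network. Below is a minimal PyTorch sketch of that routing scheme under that assumption; the class name, layer sizes, and the token-id hash table are illustrative, not the paper's implementation.

```python
import torch
import torch.nn as nn


class RandomRoutedExperts(nn.Module):
    """Sketch of random routing: each token id is bound to one expert by a
    fixed random table, so no gating network has to be trained or balanced.
    (Illustrative only; dimensions and the table size are made up.)"""

    def __init__(self, d_model: int, n_experts: int, vocab_size: int = 50_000, seed: int = 0):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )
        g = torch.Generator().manual_seed(seed)
        # Fixed token-id -> expert-id mapping, frozen for the whole run.
        self.register_buffer("route", torch.randint(n_experts, (vocab_size,), generator=g))

    def forward(self, token_ids: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq); hidden: (batch, seq, d_model)
        out = torch.empty_like(hidden)
        expert_ids = self.route[token_ids]
        for e, expert in enumerate(self.experts):
            mask = expert_ids == e
            if mask.any():
                out[mask] = expert(hidden[mask])
        return out
```

Because the mapping is static, each expert's parameters can live on a fixed device and be extracted as a standalone sub-model, which is the property that makes this style of sparse model easy to shard and prune.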


PanGu-Sigma

model
Architecture: MoE
Parameters: 1.1T

PanGu-Sigma: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing

paper

arXiv: 2303.10845

nlp, moe
