146B total / 22B active MoE (16 experts, 2 active per token), upcycled from Skywork-13B dense checkpoints. Released as the open-source "medium" variant of TianGong 3.0; a 400B "large" variant was trained but remains proprietary. The paper is a deep dive into MoE training techniques, including dense-to-MoE initialization strategies. Inference cost is reduced ~3x vs. a comparable dense model.
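A minimal sketch of the dense-to-MoE upcycling idea: each of the 16 experts starts as a copy of the dense checkpoint's FFN, and a newly initialized router picks the top 2 experts per token. The class name, GELU activation, and simple softmax-over-top-2 gating are illustrative assumptions, not the paper's exact recipe (load-balancing auxiliary losses and gating-logit normalization are omitted here).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    """Sparse MoE FFN: a learned router selects top_k of n_experts per token."""
    def __init__(self, d_model, d_ff, n_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    @classmethod
    def upcycle_from_dense(cls, dense_ffn, d_model, d_ff, n_experts=16, top_k=2):
        """Initialize every expert from the dense checkpoint's FFN weights.
        Assumes dense_ffn has the same Linear-GELU-Linear structure as an expert."""
        moe = cls(d_model, d_ff, n_experts, top_k)
        for expert in moe.experts:
            expert.load_state_dict(dense_ffn.state_dict())
        return moe

    def forward(self, x):                        # x: (tokens, d_model)
        logits = self.router(x)                  # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):              # route each token to its k-th expert
            for e in range(len(self.experts)):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k : k + 1] * self.experts[e](x[mask])
        return out
```

Because only 2 of 16 expert FFNs run per token, the per-token compute tracks the 22B active parameters rather than the 146B total, which is where the ~3x inference-cost reduction comes from.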

Model Details

Architecture MoE
Parameters 146B
Active params 22B

Paper

arXiv: 2406.06563

moe · open-weight · research

Related