DeepSeek-MoE
model, paper
Pioneering 16B-parameter Mixture-of-Experts model with only 2.8B parameters activated per token, setting the stage for DeepSeek's later efficiency focus. Accompanied by a foundational paper on expert specialization; a minimal routing sketch follows after this entry.
Outputs (2)
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
paper
Foundational paper on expert specialization in MoE language models, introducing fine-grained expert segmentation and shared expert isolation.
arXiv: 2401.06066
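To make the efficiency idea concrete, below is a minimal, hypothetical PyTorch sketch of such a layer: a few shared experts run on every token, while a gate routes each token to only a small top-k subset of many fine-grained routed experts, so only a fraction of the layer's parameters participate per token. All names, dimensions, and expert counts here are illustrative assumptions, not the released 16B configuration.

```python
# Minimal sketch (not the released configuration): a DeepSeekMoE-style layer with
# a few always-active shared experts plus top-k routing over many small routed
# experts. All sizes (d_model, expert counts, top_k) are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Expert(nn.Module):
    """A small feed-forward expert."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden)
        self.down = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.up(x)))


class MoELayerSketch(nn.Module):
    def __init__(self, d_model=512, d_hidden=256,
                 n_shared=2, n_routed=64, top_k=6):
        super().__init__()
        # Shared experts see every token ("shared expert isolation").
        self.shared = nn.ModuleList(Expert(d_model, d_hidden) for _ in range(n_shared))
        # Many small routed experts ("fine-grained expert segmentation").
        self.routed = nn.ModuleList(Expert(d_model, d_hidden) for _ in range(n_routed))
        self.gate = nn.Linear(d_model, n_routed, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        shared_out = sum(e(x) for e in self.shared)
        scores = F.softmax(self.gate(x), dim=-1)         # (num_tokens, n_routed)
        weights, idx = scores.topk(self.top_k, dim=-1)   # k experts chosen per token
        routed_out = torch.zeros_like(x)
        # Naive per-expert loop for clarity; real kernels batch tokens per expert.
        for e_id, expert in enumerate(self.routed):
            hit = (idx == e_id)                          # (num_tokens, top_k) bool
            token_mask = hit.any(dim=-1)
            if token_mask.any():
                w = (weights * hit).sum(dim=-1)[token_mask].unsqueeze(-1)
                routed_out[token_mask] += w * expert(x[token_mask])
        return x + shared_out + routed_out               # residual connection


if __name__ == "__main__":
    layer = MoELayerSketch()
    tokens = torch.randn(4, 512)
    print(layer(tokens).shape)  # torch.Size([4, 512])
```

Because each token only passes through the shared experts and its top-k routed experts, a model's total parameter count (16B here) can far exceed the parameters activated per token (2.8B).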