Sparse mixture-of-experts (MoE) model with 200B total parameters, of which 20B are active per token (8 of 256 experts are selected for each token), and a 256k-token context window. Part of ByteDance's flagship Doubao series.
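
To illustrate what "8 of 256 experts per token" means in practice, here is a minimal sketch of top-k expert routing. The expert count and k come from the description above; everything else (the route_token helper, the random router scores, softmax weighting over the selected experts) is an assumption made for illustration and is not confirmed as this model's actual router.

```python
import math
import random

NUM_EXPERTS = 256  # from the model description
TOP_K = 8          # experts active per token, from the model description


def route_token(logits, k=TOP_K):
    """Hypothetical router: keep the k highest-scoring experts and
    softmax-normalize their scores into mixing weights."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Stand-in router scores for one token; a real router would compute these
# from the token's hidden state.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
assignment = route_token(logits)  # list of (expert_index, weight) pairs
```

Only the experts listed in assignment would run for this token, which is why the per-token compute tracks the active parameter count rather than the total.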

Model Details

Architecture MoE
Parameters 200B
Active params 20B
Context window 256,000 tokens
Tags moe, scaling
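
The figures above imply two different sparsity ratios, which the short sketch below works out. The four constants come from the listing; the split into always-active (shared) parameters and an expert pool is a back-of-the-envelope estimate that assumes all experts are the same size and that every non-expert parameter is active for every token. Neither assumption is stated in the source.

```python
TOTAL_PARAMS = 200e9   # from the listing
ACTIVE_PARAMS = 20e9   # from the listing
NUM_EXPERTS = 256      # from the listing
ACTIVE_EXPERTS = 8     # from the listing

# Share of parameters and of experts used per token.
active_param_fraction = ACTIVE_PARAMS / TOTAL_PARAMS    # 0.1
active_expert_fraction = ACTIVE_EXPERTS / NUM_EXPERTS   # 0.03125

# Assumed model: total = shared + expert_pool
#                active = shared + expert_pool * active_expert_fraction
# Subtracting the two equations isolates expert_pool.
expert_pool = (TOTAL_PARAMS - ACTIVE_PARAMS) / (1 - active_expert_fraction)
shared = TOTAL_PARAMS - expert_pool

print(f"active parameter fraction: {active_param_fraction:.1%}")
print(f"active expert fraction:    {active_expert_fraction:.3%}")
print(f"estimated expert pool:     {expert_pool / 1e9:.1f}B")
print(f"estimated shared params:   {shared / 1e9:.1f}B")
```

Under these assumptions, roughly 186B parameters sit in the expert pool and about 14B are shared, which would explain why 10% of the parameters are active even though only about 3% of the experts are.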