DeepSeek-V2
model · paper — Massive 236B MoE model (21B activated per token) that introduced Multi-head Latent Attention (MLA). Accompanied by a technical report.
Outputs (2)
DeepSeek-V2 Technical Report
paper — Technical report detailing the Multi-head Latent Attention and DeepSeekMoE architecture innovations.
arXiv: 2405.04434
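
The core idea of Multi-head Latent Attention is to cache a single low-rank latent vector per token instead of full per-head keys and values, and to up-project keys and values from that latent at attention time. The PyTorch sketch below is only an illustration of that compression: all dimensions and layer names are made up here, and details of the real architecture (decoupled RoPE keys, query compression, causal masking) are omitted; see the technical report for the actual formulation.

import torch
import torch.nn as nn

class SimplifiedMLA(nn.Module):
    """Illustrative sketch of MLA-style key/value compression (not the real model)."""

    def __init__(self, d_model=512, n_heads=8, d_head=64, d_latent=128):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_head
        self.w_q = nn.Linear(d_model, n_heads * d_head, bias=False)
        # Down-project hidden states to a small shared latent; only this is cached.
        self.w_down_kv = nn.Linear(d_model, d_latent, bias=False)
        # Up-project the cached latent back to per-head keys and values.
        self.w_up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)
        self.w_up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)
        self.w_out = nn.Linear(n_heads * d_head, d_model, bias=False)

    def forward(self, x, kv_cache=None):
        b, t, _ = x.shape
        q = self.w_q(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        latent = self.w_down_kv(x)                      # (b, t, d_latent)
        if kv_cache is not None:
            # Previously cached latents are concatenated along the sequence axis.
            latent = torch.cat([kv_cache, latent], dim=1)
        s = latent.shape[1]
        k = self.w_up_k(latent).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        v = self.w_up_v(latent).view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.w_out(out), latent                  # latent serves as the new cache

Caching only a d_latent-wide vector per token, rather than n_heads × d_head keys plus values, is what shrinks the KV cache; the report pairs this compression with decoupled rotary position embeddings, which this sketch leaves out.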