DeepSeek-V3
model, paper
Frontier 671B MoE model with Multi-Token Prediction and FP8 mixed-precision training. V3-0324 update released 2025-03-24. Accompanied by a technical report and a paper on scaling challenges.
Outputs 3
DeepSeek-V3
model
Architecture MoE
Parameters 671B
Active params 37B
Variants
| Name | Parameters | Notes |
|---|---|---|
| DeepSeek-V3 | — | — |
| DeepSeek-V3-0324 | — | Released 2025-03-24 |
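The parameter figures above imply that only a small share of the model is used for any one token. The sketch below works out that share from the two numbers in this listing (671B total, 37B active). The function and constant names are illustrative and do not come from any DeepSeek code.

```python
# Parameter counts as stated in this listing.
TOTAL_PARAMS = 671e9   # all parameters in the MoE model
ACTIVE_PARAMS = 37e9   # parameters activated per token


def active_fraction(total: float, active: float) -> float:
    """Return the share of parameters activated per token."""
    if total <= 0 or not 0 <= active <= total:
        raise ValueError("expected 0 <= active <= total and total > 0")
    return active / total


fraction = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
print(f"Active share per token: {fraction:.1%}")  # prints "Active share per token: 5.5%"
```

By these figures, roughly one parameter in eighteen takes part in processing a given token, which is the usual motivation for a sparse MoE design: total capacity grows without a proportional increase in per-token compute.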
DeepSeek-V3 Technical Report
paper
Technical report for the 671B MoE model with Multi-Token Prediction and FP8 mixed-precision training.
arXiv: 2412.19437
Insights into DeepSeek-V3: Scaling Challenges
paper
Details the scaling challenges encountered during DeepSeek-V3 development, including hardware architecture insights.
arXiv: 2505.09343