Kimi K2
model paper
Landmark 1-trillion-parameter MoE model family (32B active) released with open weights. 384 experts with 8 activated per token. Pre-trained on 15.5T tokens on H800 GPUs using the MuonClip optimizer, with context extended from 4K to 128K via YaRN. Focused on agentic intelligence and tool use. Evolved through the Thinking and Instruct-0905 variants.
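The routing scheme described above (384 experts, 8 activated per token) can be sketched as plain top-k gating. This is a minimal illustration with a reduced hidden size and randomly initialized router; it is not K2's actual routing code, which the tech report pairs with additional machinery such as load balancing.

```python
import numpy as np

# Illustrative top-k MoE routing: 384 experts, 8 active per token (as on the
# card). D_MODEL is shrunk for the demo; router weights are random.
N_EXPERTS, TOP_K, D_MODEL = 384, 8, 64

rng = np.random.default_rng(0)
router_w = rng.standard_normal((D_MODEL, N_EXPERTS)) / np.sqrt(D_MODEL)

def route(x):
    """Return indices and normalized gate weights of the top-k experts."""
    logits = x @ router_w                          # (N_EXPERTS,)
    topk = np.argpartition(logits, -TOP_K)[-TOP_K:]
    gates = np.exp(logits[topk] - logits[topk].max())
    gates /= gates.sum()                           # softmax over selected experts
    return topk, gates

token = rng.standard_normal(D_MODEL)
experts, gates = route(token)
```

Each token's output is then the gate-weighted sum of the 8 selected experts' FFN outputs, so only 32B of the 1T parameters are exercised per token.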
Kimi K2 Instruct
model
Architecture MoE
Parameters 1T
Active params 32B
Kimi K2 Tech Report: Open Agentic Intelligence
paper
arXiv: 2507.20534
Kimi-K2-Instruct-0905
model
Updated K2 with an expanded 256K context window and improved coding performance.
Architecture MoE
Parameters 1T
Active params 32B
Context window 256,000
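The long contexts listed here trace back to the YaRN extension mentioned for the base model (4K to 128K, a scale of 32). A hedged sketch of YaRN-style "NTK-by-parts" RoPE rescaling follows; the `alpha`/`beta` ramp bounds and head dimension are illustrative defaults, not K2's published configuration.

```python
import math

# Sketch of YaRN-style RoPE frequency rescaling for context extension
# (e.g. 4K -> 128K, scale = 32). alpha/beta/dim are illustrative.
def yarn_inv_freqs(dim=128, base=10000.0, orig_len=4096, scale=32.0,
                   alpha=1.0, beta=32.0):
    freqs = []
    for i in range(0, dim, 2):
        theta = base ** (-i / dim)            # original RoPE frequency
        wavelength = 2 * math.pi / theta
        r = orig_len / wavelength             # rotations within orig context
        # gamma = 1: high-frequency dims left untouched;
        # gamma = 0: low-frequency dims fully interpolated by 1/scale.
        gamma = min(1.0, max(0.0, (r - alpha) / (beta - alpha)))
        freqs.append((1 - gamma) * theta / scale + gamma * theta)
    return freqs
```

YaRN additionally rescales attention logits by a temperature that grows with the scale factor; that part is omitted here for brevity.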
Kimi K2 Thinking
model
Reasoning-heavy "thinking agent" capable of hundreds of sequential tool calls.