Hybrid linear attention architecture combining Kimi Delta Attention (KDA) with Multi-head Latent Attention (MLA). A 48B-total / 3B-active-parameter MoE model achieving a 75% KV-cache reduction and up to 6x decoding throughput at 1M-token context.
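The cache figure follows from the paper's 3:1 interleave of KDA and MLA layers: only one layer in four keeps a KV cache, hence roughly 75% less KV memory, while the KDA layers carry a constant-size recurrent state. Below is a minimal, hypothetical PyTorch sketch (not Moonshot's implementation) of the two ingredients, a gated delta-rule recurrence standing in for KDA and the 3:1 layer interleave; all function names and the per-channel gate `g` are illustrative assumptions, and KDA's actual kernel is chunkwise-parallel rather than a token loop.

```python
# Hypothetical sketch, not Moonshot's code: a gated delta-rule recurrence
# (standing in for KDA) plus the 3:1 KDA-to-MLA layer interleave.
import torch

def gated_delta_attention(q, k, v, beta, g):
    """Recurrent gated delta rule over a constant-size state S (d_k x d_v).

    q, k: (T, d_k); v: (T, d_v); beta: (T,) write strength in [0, 1];
    g: (T, d_k) per-channel decay in [0, 1] (fine-grained, KDA-style).
    The state S replaces a per-token KV cache, so memory is O(d_k * d_v)
    regardless of sequence length.
    """
    d_k, d_v = k.shape[-1], v.shape[-1]
    S = q.new_zeros(d_k, d_v)                     # fast-weight state
    outs = []
    for t in range(q.shape[0]):
        S = g[t].unsqueeze(-1) * S                # channel-wise forgetting
        err = v[t] - k[t] @ S                     # delta rule: prediction error
        S = S + beta[t] * torch.outer(k[t], err)  # corrective rank-1 write
        outs.append(q[t] @ S)                     # read out with the query
    return torch.stack(outs)                      # (T, d_v)

def hybrid_layer_types(n_layers, kda_per_mla=3):
    """3:1 interleave: every 4th layer is full attention (MLA), the rest KDA.
    Only the MLA layers keep a KV cache, i.e. ~75% cache reduction."""
    return ["mla" if (i + 1) % (kda_per_mla + 1) == 0 else "kda"
            for i in range(n_layers)]

if __name__ == "__main__":
    T, d_k, d_v = 8, 16, 16
    q, k, v = (torch.randn(T, d) for d in (d_k, d_k, d_v))
    beta = torch.sigmoid(torch.randn(T))          # learned in a real model
    g = torch.sigmoid(torch.randn(T, d_k))
    print(gated_delta_attention(q, k, v, beta, g).shape)  # torch.Size([8, 16])
    print(hybrid_layer_types(8))  # ['kda', 'kda', 'kda', 'mla', ...]
```

In the real model the full-attention layers are MLA blocks with a compressed latent KV cache; the sketch only shows why the hybrid's memory stays flat on the KDA layers.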

Outputs (2)

Kimi Linear Model (model)
Architecture: MoE
Total parameters: 48B
Active parameters: 3B

Kimi Linear: Hybrid Linear Attention Architecture (paper)
arXiv: 2510.26692

Tags: moe, efficiency, attention, architecture