Hybrid linear attention architecture combining Kimi Delta Attention (KDA) with Multi-head Latent Attention (MLA). A 48B-total / 3B-active-parameter MoE model achieving a 75% KV-cache reduction and up to 6x decoding throughput at 1M-token context.
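The cache figure follows from the paper's 3:1 interleave of KDA and MLA layers: only one layer in four keeps a KV cache, hence roughly 75% less KV memory, while the KDA layers carry a constant-size recurrent state. Below is a minimal, hypothetical PyTorch sketch (not Moonshot's implementation) of the two ingredients, a gated delta-rule recurrence standing in for KDA and the 3:1 layer interleave; all function names and the per-channel gate `g` are illustrative assumptions, and KDA's actual kernel is chunkwise-parallel rather than a token loop.

```python
# Hypothetical sketch, not Moonshot's code: a gated delta-rule recurrence
# (standing in for KDA) plus the 3:1 KDA-to-MLA layer interleave.
import torch

def gated_delta_attention(q, k, v, beta, g):
    """Recurrent gated delta rule over a constant-size state S (d_k x d_v).

    q, k: (T, d_k); v: (T, d_v); beta: (T,) write strength in [0, 1];
    g: (T, d_k) per-channel decay in [0, 1] (fine-grained, KDA-style).
    The state S replaces a per-token KV cache, so memory is O(d_k * d_v)
    regardless of sequence length.
    """
    d_k, d_v = k.shape[-1], v.shape[-1]
    S = q.new_zeros(d_k, d_v)                     # fast-weight state
    outs = []
    for t in range(q.shape[0]):
        S = g[t].unsqueeze(-1) * S                # channel-wise forgetting
        err = v[t] - k[t] @ S                     # delta rule: prediction error
        S = S + beta[t] * torch.outer(k[t], err)  # corrective rank-1 write
        outs.append(q[t] @ S)                     # read out with the query
    return torch.stack(outs)                      # (T, d_v)

def hybrid_layer_types(n_layers, kda_per_mla=3):
    """3:1 interleave: every 4th layer is full attention (MLA), the rest KDA.
    Only the MLA layers keep a KV cache, i.e. ~75% cache reduction."""
    return ["mla" if (i + 1) % (kda_per_mla + 1) == 0 else "kda"
            for i in range(n_layers)]

if __name__ == "__main__":
    T, d_k, d_v = 8, 16, 16
    q, k, v = (torch.randn(T, d) for d in (d_k, d_k, d_v))
    beta = torch.sigmoid(torch.randn(T))          # learned in a real model
    g = torch.sigmoid(torch.randn(T, d_k))
    print(gated_delta_attention(q, k, v, beta, g).shape)  # torch.Size([8, 16])
    print(hybrid_layer_types(8))  # ['kda', 'kda', 'kda', 'mla', ...]
```

In the real model the full-attention layers are MLA blocks with a compressed latent KV cache; the sketch only shows why the hybrid's memory stays flat on the KDA layers.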

Outputs (2)

Kimi Linear Model (model)
Architecture: MoE
Total parameters: 48B
Active parameters: 3B

Kimi Linear: Hybrid Linear Attention Architecture (paper)
arXiv: 2510.26692

Tags: moe, efficiency, attention, architecture