MiniMax-M1
World's first open-weight, large-scale hybrid-attention reasoning model (456B total parameters, 45.9B activated per token). Lightning Attention supports a native 1M-token context window and sharply reduces generation compute at long lengths: at 100K generated tokens, the paper reports roughly 25% of the FLOPs of DeepSeek R1. The CISPO algorithm let full RL training complete on only 512 H800 GPUs in three weeks, at a rental cost of ~$530K.
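CISPO (Clipped IS-weight Policy Optimization) departs from PPO/GRPO-style clipping of the ratio-times-advantage term, which zeroes out gradients for clipped tokens; instead it clips the importance-sampling weight itself and stops its gradient, so every generated token still contributes to the update. Below is a minimal PyTorch sketch under stated assumptions: per-token log-prob tensors of shape [batch, seq], an illustrative eps_high, and the lower bound left effectively unclipped. It is a sketch of the idea, not the authors' implementation.

```python
import torch

def cispo_loss(logp_new, logp_old, advantages, eps_high=0.2):
    """Sketch of the CISPO objective described in arXiv:2506.13585.
    Tensors are per-token, shape [batch, seq]; eps_high is illustrative."""
    # Token-level importance-sampling weight pi_theta / pi_old.
    ratio = torch.exp(logp_new - logp_old)
    # Clip the IS weight itself (the paper effectively bounds only
    # the upper side, so no lower clamp is applied here) ...
    clipped = torch.clamp(ratio, max=1.0 + eps_high)
    # ... and stop its gradient: gradients flow only through logp_new,
    # so even clipped tokens still push a (bounded) update.
    weight = clipped.detach()
    # REINFORCE-style surrogate, averaged over all generated tokens.
    return -(weight * advantages * logp_new).mean()
```

Because the clipped weight is detached, a token whose ratio exceeds the bound is down-weighted rather than dropped, which preserves low-probability tokens that matter in long reasoning chains.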
Architecture: Mixture-of-Experts (MoE) with hybrid attention (sketched below)
Total parameters: 456B
Active parameters: 45.9B
Context window: 1,000,000 tokens
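The hybrid attention above interleaves linear-complexity Lightning Attention blocks with periodic full softmax-attention blocks (a 7:1 ratio is described for the MiniMax-01 architecture family that M1 builds on). The sketch below shows only that interleaving pattern: the LinearAttention module is a generic, non-causal O(n) linearization standing in for the real IO-aware Lightning Attention kernel, and all layer counts and sizes are illustrative assumptions.

```python
import torch
from torch import nn

class LinearAttention(nn.Module):
    """Generic O(n) linear attention; a stand-in, NOT the actual
    Lightning Attention kernel (which is causal and IO-aware)."""
    def __init__(self, d):
        super().__init__()
        self.qkv = nn.Linear(d, 3 * d)

    def forward(self, x):                        # x: [batch, seq, d]
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.softmax(dim=-1)                    # feature-wise
        k = k.softmax(dim=-2)                    # sequence-wise
        kv = torch.einsum("bnd,bne->bde", k, v)  # O(n) global summary
        return torch.einsum("bnd,bde->bne", q, kv)

class HybridStack(nn.Module):
    """Interleave linear-attention blocks with periodic softmax
    attention: every 8th block is full attention (a 7:1 pattern)."""
    def __init__(self, d, n_layers=16, softmax_every=8, n_heads=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.MultiheadAttention(d, n_heads, batch_first=True)
            if (i + 1) % softmax_every == 0 else LinearAttention(d)
            for i in range(n_layers)
        )

    def forward(self, x):
        for layer in self.layers:
            if isinstance(layer, nn.MultiheadAttention):
                x = x + layer(x, x, x, need_weights=False)[0]
            else:
                x = x + layer(x)
        return x

x = torch.randn(1, 128, 64)
print(HybridStack(64)(x).shape)  # torch.Size([1, 128, 64])
```

The periodic softmax blocks restore exact global token-to-token interaction that linearized attention approximates away, while the linear blocks keep per-token cost roughly constant as context grows toward 1M tokens.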
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
Paper: arXiv:2506.13585