MiniMax's natively multimodal frontier MoE model (text, image, and video in; text out): ~428B total / ~23B active parameters, 128 experts (top-4 routing plus 1 shared expert), 60 layers of which the first 3 are dense, a 1M-token context, and dual thinking / non-thinking modes. Its headline contribution is MiniMax Sparse Attention (MSA), a learned blockwise sparse attention over GQA that cuts per-token attention compute ~28× at 1M context, giving M3 a reported ~9× prefill and ~15× decode speedup over M2 at that context length.
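To make "blockwise sparse attention" concrete, here is a minimal pure-Python sketch for a single query vector. It is an illustration under stated assumptions, not MSA itself: the block scoring rule (query dotted with each block's mean key), the function name `block_sparse_attention`, and the `block_size` / `top_blocks` parameters are all hypothetical, since the summary above does not describe how MSA's learned selector works.

```python
import math

def block_sparse_attention(q, keys, values, block_size, top_blocks):
    """Attend from one query to only the highest-scoring blocks of keys.

    Hypothetical sketch: blocks are ranked by dot(q, mean key of block).
    The real MSA selector is learned and is not specified in the text above.
    """
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    dim = len(keys[0])

    # Split key positions into contiguous blocks.
    blocks = [list(range(s, min(s + block_size, len(keys))))
              for s in range(0, len(keys), block_size)]

    # Cheap per-block score: one dot product per block instead of per key.
    def mean_key(idx):
        return [sum(keys[i][d] for i in idx) / len(idx) for d in range(dim)]
    scores = [dot(q, mean_key(b)) for b in blocks]

    # Keep only the top-scoring blocks and collect their key positions.
    chosen = sorted(range(len(blocks)), key=lambda b: scores[b],
                    reverse=True)[:top_blocks]
    positions = sorted(i for b in chosen for i in blocks[b])

    # Ordinary scaled softmax attention, restricted to the kept positions.
    scale = 1.0 / math.sqrt(len(q))
    logits = [dot(q, keys[i]) * scale for i in positions]
    peak = max(logits)
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    out = [sum(w * values[i][d] for w, i in zip(weights, positions)) / total
           for d in range(len(values[0]))]
    return out, positions

# Toy example: 8 keys in 2 blocks; the query matches the second block.
keys = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
values = [[1.0]] * 4 + [[2.0]] * 4
out, positions = block_sparse_attention([0.0, 1.0], keys, values,
                                        block_size=4, top_blocks=1)
print(positions, out)  # [4, 5, 6, 7] [2.0]
```

In this toy case the query attends to 4 of 8 key positions. The saving grows with context length only if the number of kept blocks stays roughly fixed while the number of blocks grows, which is the general idea behind the compute reduction quoted above.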

Released under the MiniMax Community License (commercial use requires attribution; written authorization is required above $20M/yr revenue). Self-reported agentic/coding benchmarks: SWE-Bench Pro 59.0, Terminal-Bench 2.1 66.0, SWE-fficiency 34.8, KernelBench Hard 28.8, MCP-Atlas 74.2. Artificial Analysis lists it at 59.1 on the Agentic Index and scores it 44 on the Intelligence Index (v4.1).

Model Details

Architecture MoE
Parameters ~428B
Active params ~23B
Experts 128 (top-4 + 1 shared)
Context window 1,048,576
AA Intelligence 44
License MiniMax Community License
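
The expert configuration listed above (128 experts, top-4 routing, plus the shared expert mentioned in the summary) can be illustrated with a small routing sketch. This is a toy under assumptions rather than the model's actual router: the softmax-over-selected-logits gating, the scalar "experts", and the names `route_token` and `moe_output` are hypothetical. The last line is simple arithmetic on the quoted parameter counts.

```python
import math
import random

NUM_EXPERTS = 128  # from the model details above
TOP_K = 4          # routed experts per token, from the model details above

def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, gate) pairs for the top-k experts of one token.

    Assumption: gates are a softmax over the selected logits only; the
    source does not say how the model normalises its gates.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda e: router_logits[e], reverse=True)[:top_k]
    peak = max(router_logits[e] for e in top)
    exps = [math.exp(router_logits[e] - peak) for e in top]
    total = sum(exps)
    return [(e, x / total) for e, x in zip(top, exps)]

def moe_output(x, router_logits, experts, shared_expert):
    """Add the always-on shared expert to the gated routed experts."""
    y = shared_expert(x)
    for e, gate in route_token(router_logits):
        y += gate * experts[e](x)
    return y

# Toy scalar experts: expert e multiplies its input by e.
experts = [lambda x, e=e: e * x for e in range(NUM_EXPERTS)]
shared_expert = lambda x: x

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
routed = route_token(logits)
print("experts used for this token:", [e for e, _ in routed])
print("token output:", moe_output(1.0, logits, experts, shared_expert))

# Share of parameters active per token, using the quoted ~23B / ~428B.
print(f"active fraction ~ {23 / 428:.1%}")
```

Only 4 of the 128 routed experts run for any given token, which is why the active parameter count is a small fraction of the total even though the full model must still be held in memory.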

Benchmark Scores

Benchmark Score Mode
SWE-Bench Pro 59.0
Terminal-Bench 2.1 66.0
SWE-fficiency 34.8
KernelBench Hard 28.8
MCP-Atlas 74.2
Tags: frontier, open-weight, moe, multimodal, agentic, coding, reasoning
