MiniMax-M3
MiniMax's natively multimodal frontier MoE (text + image + video → text): ~428B total / ~23B active parameters, 128 experts (top-4 routed + 1 shared; 60 layers, the first 3 dense), 1M-token context, and dual thinking / non-thinking modes. Its headline contribution is MiniMax Sparse Attention (MSA), a learned blockwise sparse attention over GQA that cuts per-token attention compute ~28× at 1M context. MiniMax reports that this gives M3 a ~9× prefill and ~15× decode speedup over M2 at that context length.
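To make "blockwise sparse attention over GQA" concrete, here is a minimal pure-Python sketch of one decode step for a single GQA group (several query heads sharing one KV head). It is an illustration under stated assumptions, not MiniMax's MSA: MSA's block scorer is described as learned, while this sketch substitutes a simple stand-in (query · mean key of each block), always keeps the most recent block, and shares one block selection across the group. The names `block_sparse_gqa_decode` and `attended_fraction` and the parameters `block_size` and `top_k` are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_vec(vs):
    return [sum(col) / len(vs) for col in zip(*vs)]

def block_sparse_gqa_decode(queries, keys, values, block_size, top_k):
    """Hypothetical block-sparse decode step for one GQA group.

    queries: one vector per query head in the group.
    keys, values: per-token vectors of the group's single shared KV head.
    Returns (per-head outputs, sorted indices of the selected blocks).
    """
    n, d, dv = len(keys), len(keys[0]), len(values[0])
    scale = 1.0 / math.sqrt(d)
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]
    # Stand-in scorer (assumption): each query dotted with the block's mean key,
    # summed over the group so every head sharing the KV head reads the same blocks.
    summaries = [mean_vec([keys[i] for i in b]) for b in blocks]
    scores = [sum(dot(q, s) for q in queries) for s in summaries]
    last = len(blocks) - 1
    # Always keep the most recent block, then add the top-scoring earlier blocks.
    ranked = sorted(range(last), key=lambda i: scores[i], reverse=True)
    selected = sorted([last] + ranked[: max(top_k - 1, 0)])
    idx = [i for b in selected for i in blocks[b]]
    outputs = []
    for q in queries:
        # Ordinary softmax attention, restricted to tokens in the selected blocks.
        weights = softmax([dot(q, keys[i]) * scale for i in idx])
        outputs.append([sum(w * values[i][j] for w, i in zip(weights, idx))
                        for j in range(dv)])
    return outputs, selected

def attended_fraction(n_ctx, block_size, top_k):
    """Share of keys each query reads: a crude proxy for attention-compute savings."""
    return min(top_k * block_size, n_ctx) / n_ctx
```

With illustrative (not MiniMax-published) settings of `block_size=128` and `top_k=256` on a 2^20-token context, `attended_fraction(1 << 20, 128, 256)` is `0.03125`, meaning each query reads 1/32 of the keys. When `top_k` covers every block, the function reduces to dense attention, which is a convenient correctness check. Sharing the block choice across a GQA group means the selected KV blocks are fetched once for all heads in that group.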
MiniMax Community License (commercial use requires attribution, plus written authorization above $20M/yr in revenue). Self-reported agentic/coding benchmarks: SWE-Bench Pro 59.0, Terminal-Bench 2.1 66.0, SWE-fficiency 34.8, KernelBench Hard 28.8, MCP-Atlas 74.2. Artificial Analysis scores it 59.1 on its Agentic Index and 44 on its Intelligence Index (v4.1).
Model Details
Benchmark Scores
| Benchmark | Score | Mode |
|---|---|---|
| SWE-Bench Pro | 59.0 | — |
| Terminal-Bench 2.1 | 66.0 | — |
| SWE-fficiency | 34.8 | — |
| KernelBench Hard | 28.8 | — |
| MCP-Atlas | 74.2 | — |