MiniMax-M3

MiniMax's natively multimodal frontier MoE (text + image + video → text): ~428B total / ~23B active parameters, 60 layers (the first 3 dense), 128 experts (top-4 + 1 shared), 1M-token context, and dual thinking / non-thinking modes. Its headline contribution is MiniMax Sparse Attention (MSA), a learned blockwise sparse attention over GQA that cuts per-token attention compute ~28× at 1M context, giving M3 a reported ~9× prefill / ~15× decode speedup over M2 at that context length.
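To make the idea of blockwise sparse attention concrete, here is a minimal single-query sketch in plain Python. It is not MSA itself: the source only says MSA is a learned blockwise selector over GQA, so the block scorer used here (query dotted with the mean key of each block), the function name `block_sparse_attention`, and the `block_size` / `top_k` parameters are all illustrative assumptions. Causal masking and grouped-query head sharing are omitted.

```python
import math
import random


def block_sparse_attention(q, keys, values, block_size, top_k):
    """Attend from one query to only the top_k highest-scoring key blocks.

    Hypothetical sketch: the block scorer is a mean-pooled-key heuristic,
    standing in for whatever learned selector a real model would use.
    """
    n, d = len(keys), len(q)
    # Split key positions into contiguous blocks.
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def block_score(block):
        # Assumed selector: q . mean(keys in block).
        mean_key = [sum(keys[i][j] for i in block) / len(block) for j in range(d)]
        return dot(q, mean_key)

    # Keep the top_k blocks, then restore positional order.
    ranked = sorted(range(len(blocks)), key=lambda b: block_score(blocks[b]), reverse=True)
    chosen = sorted(ranked[:top_k])
    idx = [i for b in chosen for i in blocks[b]]

    # Ordinary scaled softmax attention, restricted to the selected positions.
    scale = 1.0 / math.sqrt(d)
    logits = [dot(q, keys[i]) * scale for i in idx]
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    out = [
        sum(w * values[i][j] for w, i in zip(weights, idx)) / total
        for j in range(len(values[0]))
    ]
    return out, idx


random.seed(0)
n, d = 64, 8
keys = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
values = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
q = [random.gauss(0, 1) for _ in range(d)]

out, idx = block_sparse_attention(q, keys, values, block_size=8, top_k=2)
# Only 2 blocks of 8 keys are scored in full: 16 of 64 positions.
print(len(idx), "of", n, "keys attended")
```

In this toy setting the full query-key scoring touches a quarter of the keys. The selector adds its own cost (one score per block), so the net saving depends on block size and `top_k`; the ~28× figure above is the source's reported number for M3, not something this sketch reproduces.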

Released under the MiniMax Community License (commercial use requires attribution; written authorization is required above $20M/yr revenue). Self-reported agentic/coding benchmarks: SWE-Bench Pro 59.0, Terminal-Bench 2.1 66.0, SWE-fficiency 34.8, KernelBench Hard 28.8, MCP-Atlas 74.2. Artificial Analysis lists it on the Agentic Index at 59.1 and now scores it 44 on the Intelligence Index (v4.1).

Announcement (MiniMax blog) | HuggingFace | Technical Report (arXiv) | Artificial Analysis

Model Details

Architecture	MoE
Parameters	~428B
Active params	~23B
Experts	128 (top-4 + 1 shared)
Context window	1,048,576 tokens
AA Intelligence	44
License	MiniMax Community License
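As a quick sanity check on the sparsity these figures imply, the arithmetic below derives the active-parameter fraction and the number of MoE layers. It uses only numbers stated in this entry (the approximate 428B / 23B counts, and 60 layers with the first 3 dense); the variable names are illustrative.

```python
# Figures as listed in this entry; parameter counts are approximate.
total_params_b = 428    # total parameters, billions
active_params_b = 23    # parameters active per token, billions
layers, dense_layers = 60, 3

active_fraction = active_params_b / total_params_b
moe_layers = layers - dense_layers

print(f"active fraction: {active_fraction:.1%}")  # roughly 5.4% of weights per token
print(f"MoE layers: {moe_layers}")                # 57 of the 60 layers
```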

Benchmark Scores

Benchmark	Score	Mode
SWE-Bench Pro	59.0	—
Terminal-Bench 2.1	66.0	—
SWE-fficiency	34.8	—
KernelBench Hard	28.8	—
MCP-Atlas	74.2	—

frontier, open-weight, moe, multimodal, agentic, coding, reasoning
