Granite 4.1
IBM's latest foundation model family: 3B, 8B, and 30B dense decoder-only Transformers (GQA, RoPE, SwiGLU, RMSNorm) trained on ~15T tokens with multi-stage pretraining and long-context extension to 512K tokens. Post-trained with SFT on ~4.1M curated samples and RL via on-policy GRPO with DAPO loss. Granite 4.1 8B-Instruct matches or outperforms the previous Granite 4.0 32B MoE despite being a simpler dense model, demonstrating that training quality can substitute for scale.
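Two of the architectural components named above, RMSNorm and the SwiGLU feed-forward, can be sketched in a few lines of NumPy. This is an illustrative sketch only: the dimensions, epsilon, and weight initialization here are arbitrary and are not Granite's actual configuration.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: rescale by the root-mean-square of the features.
    # Unlike LayerNorm, there is no mean-centering and no bias term.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

def swiglu(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward block: a SiLU-gated linear unit followed by
    # a down-projection, as used in LLaMA-style dense Transformers.
    def silu(z):
        return z / (1.0 + np.exp(-z))
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

# Toy dimensions (hypothetical, far smaller than the real model).
d_model, d_ff = 8, 16
rng = np.random.default_rng(0)
x = rng.standard_normal((2, d_model))

h = rms_norm(x, np.ones(d_model))
y = swiglu(h,
           rng.standard_normal((d_model, d_ff)),
           rng.standard_normal((d_model, d_ff)),
           rng.standard_normal((d_ff, d_model)))
print(y.shape)  # → (2, 8)
```

In a full decoder block these would sit around the GQA attention sublayer, with RMSNorm applied before each sublayer and a residual connection after it.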
Also includes Granite Speech 4.1 (ASR + translation), Granite Vision 4.1 (table/chart extraction), Granite Guardian (harm detection), and embedding models. 8B benchmarks: MMLU 73.84, BBH 80.51, GSM8K 92.49, HumanEval 85.37, MBPP 87.30, ArenaHard 68.98. All models under Apache 2.0.
Model Details
Benchmark Scores
| Benchmark | Score | Mode |
|---|---|---|
| MMLU | 73.84 | 5-shot (8B) |
| BBH | 80.51 | 3-shot CoT (8B) |
| GSM8K | 92.49 | 8-shot (8B) |
| HumanEval | 85.37 | pass@1 (8B) |
| MBPP | 87.30 | pass@1 (8B) |
| ArenaHard | 68.98 | — (8B) |
Variants
| Name | Parameters | Notes |
|---|---|---|
| Granite 4.1 3B | 3B | — |
| Granite 4.1 8B | 8B | — |
| Granite 4.1 30B | 30B | — |
| Granite Vision 4.1 4B | 4B | Vision model for table/chart extraction |
| Granite Speech 4.1 2B | 2B | ASR with translation |