OLMo 2
model"2 OLMo 2 Furious." Dense Transformers at 7B (4T tokens), 13B (5T), and 32B (6T tokens, 1.5 epochs). 4K context. Introduces Dolmino Mix late-stage curriculum training (specialized data during annealing) and model souping (merging 3 annealing runs). Training FLOPs: 1.3x10^24 for 32B.
First fully open model to outperform GPT-3.5 Turbo and GPT-4o mini (post-trained with SFT + DPO + PPO + RLVR). MMLU: 78.7 (32B base). Published at COLM 2025. Apache 2.0 license.
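The model souping step amounts to a uniform average of the parameters from the separate annealing runs. A minimal sketch of that idea is below; it assumes PyTorch-style checkpoints with identical state-dict keys, and the file names are hypothetical, not the authors' released tooling.

```python
# Minimal model-souping sketch: uniform averaging of checkpoints from
# separate annealing runs (assumption: PyTorch state dicts with matching keys).
import torch

def soup(checkpoint_paths):
    """Average parameter tensors elementwise across several checkpoints."""
    state_dicts = [torch.load(p, map_location="cpu") for p in checkpoint_paths]
    averaged = {}
    for key in state_dicts[0]:
        # Stack the same tensor from every run and take the mean over runs.
        averaged[key] = torch.stack(
            [sd[key].float() for sd in state_dicts]
        ).mean(dim=0)
    return averaged

# e.g. three annealing runs, as described above (hypothetical filenames)
souped = soup(["anneal_run1.pt", "anneal_run2.pt", "anneal_run3.pt"])
```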
Model Details
Architecture DENSE
Parameters 32B
Context window 4,096
Variants
| Name | Parameters | Notes |
|---|---|---|
| OLMo 2 7B | 7B | — |
| OLMo 2 13B | 13B | — |
| OLMo 2 32B | 32B | — |
Paper
arXiv: 2501.00656
Venue: COLM 2025