"2 OLMo 2 Furious." Dense Transformers at 7B (4T tokens), 13B (5T), and 32B (6T tokens, 1.5 epochs). 4K context. Introduces Dolmino Mix late-stage curriculum training (specialized data during annealing) and model souping (merging 3 annealing runs). Training FLOPs: 1.3x10^24 for 32B.

First fully open model to outperform GPT-3.5 Turbo and GPT-4o mini; post-trained with SFT + DPO + PPO + RLVR. MMLU: 78.7 (32B base). Published at COLM 2025. Apache 2.0 license.

Model Details

Architecture: Dense Transformer
Parameters: 32B
Context window: 4,096 tokens

Variants

Name | Parameters
OLMo 2 7B | 7B
OLMo 2 13B | 13B
OLMo 2 32B | 32B

Paper

arXiv: 2501.00656

Venue: COLM 2025

open-source · open-weight · frontier

Related