OLMo
First competitive LLM to release everything: weights, training data (Dolma), training code, training logs, and 500+ intermediate checkpoints. Released as 1B and 7B dense Transformers (the 7B: 32 layers, 4096 hidden size, SwiGLU, RoPE). Trained on 2-2.46T tokens of Dolma using 256 AMD MI250X GPUs (LUMI) plus 27 NVIDIA A100 nodes. Apache 2.0 license. ACL 2024.
Established the paradigm of fully reproducible open-source LLM research that defined all subsequent OLMo releases.
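Because the weights and the intermediate checkpoints are published openly, the model can be loaded through the standard Hugging Face transformers API. The sketch below is illustrative only: the checkpoint id "allenai/OLMo-7B-hf" and the revision naming are assumptions about the released artifacts, not details from the paper, and it requires a transformers version with native OLMo support.

```python
# Minimal sketch: load OLMo-7B and generate text with Hugging Face transformers.
# Assumes the HF-converted checkpoint id "allenai/OLMo-7B-hf"; adjust for your setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-7B-hf"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Intermediate training checkpoints are exposed as hub revisions (assumed naming), e.g.:
# model = AutoModelForCausalLM.from_pretrained(model_id, revision="step1000-tokens4B")

prompt = "Language modeling is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```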
Model Details
Architecture: Dense
Parameters: 7B
Context window: 2,048 tokens
Paper
arXiv: 2402.00838
Venue: ACL 2024