OpenELM
Open Efficient Language Models. 270M to 3B parameter dense Transformers with layer-wise scaling (varying width per layer for parameter efficiency). Fully open: training code, data, weights, and evaluation.
Apple's first open-weight language models. Trained on publicly available data. ICML 2024 workshop. Apache 2.0.
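Layer-wise scaling allocates parameters non-uniformly across the Transformer stack rather than using one fixed width. A minimal sketch of the idea, assuming a simple linear interpolation of an FFN width multiplier across depth (the multiplier range and rounding here are illustrative assumptions, not OpenELM's released configuration):

```python
def layerwise_ffn_widths(num_layers, d_model, mult_min=0.5, mult_max=4.0):
    """Illustrative layer-wise scaling: linearly interpolate an FFN width
    multiplier from mult_min (first layer) to mult_max (last layer), so
    early layers are narrower and later layers wider for the same total
    parameter budget. mult_min/mult_max are hypothetical values, not
    OpenELM's actual hyperparameters."""
    widths = []
    for i in range(num_layers):
        t = i / (num_layers - 1) if num_layers > 1 else 0.0
        mult = mult_min + t * (mult_max - mult_min)
        widths.append(int(round(mult * d_model)))
    return widths

# Toy example: a 4-layer stack with d_model = 64
print(layerwise_ffn_widths(4, 64))  # → [32, 107, 181, 256]
```

The same interpolation scheme can be applied to the number of attention heads per layer; the point is that depth-dependent widths let a small model spend its parameter budget where it helps most.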
Model Details
Architecture DENSE
Parameters 270M–3B
Variants
| Name | Parameters | Notes |
|---|---|---|
| OpenELM 270M | 270M | — |
| OpenELM 450M | 450M | — |
| OpenELM 1.1B | 1.1B | — |
| OpenELM 3B | 3B | — |
Paper
arXiv: 2404.14619