LLaMA
"LLaMA: Open and Efficient Foundation Language Models," by Touvron et al. Dense Transformers from 7B to 65B parameters, trained exclusively on publicly available data. LLaMA-65B is competitive with Chinchilla-70B and PaLM-540B; LLaMA-13B outperforms GPT-3 (175B) on most benchmarks.
LLaMA showed that smaller models trained on more data (following Chinchilla scaling laws) could match much larger models, catalyzing an explosion of open-source fine-tuning (Alpaca, Vicuna, etc.) and establishing Meta as the leader of the open-weight movement.
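The "more data per parameter" point can be made concrete with the training-token counts reported in the paper (1.0T tokens for 7B/13B, 1.4T for 33B/65B), compared against the Chinchilla rule of thumb of roughly 20 training tokens per parameter for compute-optimal training. A quick sketch (the ~20:1 ratio is an approximation, not an exact constant):

```python
# Tokens-per-parameter ratios for the LLaMA variants, versus the
# Chinchilla rule of thumb of ~20 training tokens per parameter.
# Token counts are those reported in the LLaMA paper.
models = {
    "7B":  (7e9,  1.0e12),
    "13B": (13e9, 1.0e12),
    "33B": (33e9, 1.4e12),
    "65B": (65e9, 1.4e12),
}

ratios = {name: tokens / params for name, (params, tokens) in models.items()}
for name, ratio in ratios.items():
    print(f"{name}: ~{ratio:.0f} tokens/param")
```

Every variant sits well above the ~20:1 compute-optimal point (the 7B model at roughly 140 tokens per parameter), which is the paper's deliberate trade: spend extra training compute to get a smaller, cheaper-to-serve model.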
Model Details
Architecture DENSE
Parameters 65B
Variants
| Name | Parameters | Notes |
|---|---|---|
| LLaMA 7B | 7B | — |
| LLaMA 13B | 13B | — |
| LLaMA 33B | 33B | — |
| LLaMA 65B | 65B | — |
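The variant sizes above can be roughly reproduced from the architecture hyperparameters reported in the paper (model dimension and layer count per variant; 32k vocabulary; SwiGLU feed-forward width of about 2/3 · 4d, rounded up to a multiple of 256). This is a back-of-envelope estimate that ignores the small normalization parameters:

```python
# Approximate parameter counts for the LLaMA variants from their
# reported hyperparameters. Norm weights are omitted (negligible),
# and rotary position embeddings add no parameters.

VOCAB = 32_000

def ffn_dim(dim: int, multiple: int = 256) -> int:
    """SwiGLU hidden width: ~2/3 * 4d, rounded up to a multiple of 256."""
    hidden = int(2 * 4 * dim / 3)
    return multiple * ((hidden + multiple - 1) // multiple)

def param_estimate(dim: int, n_layers: int) -> int:
    attn = 4 * dim * dim          # Wq, Wk, Wv, Wo projections
    ffn = 3 * dim * ffn_dim(dim)  # SwiGLU: gate, up, down projections
    embed = 2 * VOCAB * dim       # input embedding + output head (untied)
    return n_layers * (attn + ffn) + embed

for name, dim, n_layers in [("7B", 4096, 32), ("13B", 5120, 40),
                            ("33B", 6656, 60), ("65B", 8192, 80)]:
    print(f"LLaMA {name}: ~{param_estimate(dim, n_layers) / 1e9:.1f}B params")
```

The estimates land within a few percent of the marketing names (e.g. the "7B" model works out to about 6.7B parameters), matching the exact counts given in the paper.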
Paper
arXiv: 2302.13971