"LLaMA: Open and Efficient Foundation Language Models." A family of dense Transformers from 7B to 65B parameters, trained exclusively on publicly available data. LLaMA-65B is competitive with Chinchilla-70B and PaLM-540B, and LLaMA-13B outperforms GPT-3 (175B) on most benchmarks.

Introduced by Touvron et al., LLaMA demonstrated that smaller models trained on more data (following Chinchilla scaling laws) could match much larger models, catalyzing a wave of open-source fine-tuning work (Alpaca, Vicuna, etc.) and establishing Meta as a leader of the open-weight movement.
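The scaling point can be made concrete. A minimal sketch, assuming the widely cited Chinchilla heuristic of roughly 20 training tokens per parameter, and using the training-token counts reported in the LLaMA paper (1.0T tokens for the 7B/13B models, 1.4T for 33B/65B):

```python
# Sketch: compare LLaMA's training-token budgets to the Chinchilla
# compute-optimal heuristic of ~20 training tokens per parameter.
# The 20x ratio is an approximation of the Chinchilla result, not an
# exact rule; token counts are taken from the LLaMA paper.

CHINCHILLA_TOKENS_PER_PARAM = 20  # approximate compute-optimal ratio

# (parameters, training tokens) for each LLaMA variant
llama_variants = {
    "LLaMA 7B":  (7e9,  1.0e12),
    "LLaMA 13B": (13e9, 1.0e12),
    "LLaMA 33B": (33e9, 1.4e12),
    "LLaMA 65B": (65e9, 1.4e12),
}

def over_optimal_ratio(params: float, tokens: float) -> float:
    """How many times the Chinchilla-optimal token count the model saw."""
    return tokens / (CHINCHILLA_TOKENS_PER_PARAM * params)

for name, (params, tokens) in llama_variants.items():
    print(f"{name}: ~{over_optimal_ratio(params, tokens):.1f}x "
          f"the Chinchilla-optimal token budget")
```

The smaller variants sit well past the compute-optimal point (LLaMA-7B at roughly 7x), reflecting the paper's deliberate trade: spend extra training compute on a smaller model to get cheaper inference at a given quality level.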

Model Details

Architecture DENSE
Parameters 65B

Variants

Name Parameters Notes
LLaMA 7B 7B 1.0T training tokens
LLaMA 13B 13B 1.0T training tokens
LLaMA 33B 33B 1.4T training tokens
LLaMA 65B 65B 1.4T training tokens

Paper

arXiv: 2302.13971

open-weight, foundational

Related