Llama 3.1
A 405B-parameter dense Transformer, the largest open-weight model at its release. 128K-token vocabulary, 128K-token context window, trained on 15.6T tokens using 16K H100 GPUs. 8B and 70B variants were also released.
Llama 3.1 405B was competitive with GPT-4 on many benchmarks, demonstrating that open-weight models had reached frontier quality. AA Intelligence Index: 17. Released under the Llama 3.1 Community License. By the Llama Team, AI @ Meta.
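The headline figures above (405B parameters, 15.6T training tokens) imply an enormous training budget. A rough sketch, using the standard C ≈ 6·N·D approximation for dense-Transformer training compute (an assumption, not a figure from the model card):

```python
# Order-of-magnitude training-compute estimate for Llama 3.1 405B.
# Uses the common C ~ 6 * N * D rule of thumb for dense Transformers:
#   N = parameter count, D = training tokens seen.
N = 405e9    # 405B parameters (from the model card)
D = 15.6e12  # 15.6T training tokens (from the model card)

flops = 6 * N * D
print(f"~{flops:.2e} FLOPs")  # roughly 3.8e25 FLOPs
```

This is consistent with public estimates placing the 405B training run in the high-10^25 FLOP range; the 6·N·D rule ignores attention-specific and rematerialization costs, so treat it as a lower-bound sketch.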
Model Details
Architecture DENSE
Parameters 405B
Context window 128K tokens
Variants
| Name | Parameters | Notes |
|---|---|---|
| Llama 3.1 8B | 8B | — |
| Llama 3.1 70B | 70B | — |
| Llama 3.1 405B | 405B | — |
Paper
The Llama 3 Herd of Models — arXiv: 2407.21783