Falcon (7B / 40B / 180B)
Original Falcon series. Dense causal Transformers with multi-query attention, RoPE, and FlashAttention. Trained primarily on RefinedWeb (a ~5T-token filtered open-web corpus). Falcon-180B (3,500B tokens, 4,096 A100s) was the largest open-weight model at launch.
Falcon-7B: 1,500B tokens, 384 A100s. Falcon-40B: 1,000B tokens. Falcon-7B and Falcon-40B are Apache 2.0; Falcon-180B is released under the TII Falcon-180B license.
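The defining architectural choice above is multi-query attention (MQA): all query heads share a single key/value head, shrinking the KV cache by a factor of the head count during inference. A minimal NumPy sketch of the idea (illustrative shapes and weight names, not Falcon's actual implementation):

```python
import numpy as np

def multi_query_attention(x, w_q, w_k, w_v, n_heads):
    """Causal multi-query attention: n_heads query heads share ONE
    key/value head, unlike multi-head attention where each head has
    its own K and V projections.

    Shapes (hypothetical, for illustration):
      x:   (seq, d_model)
      w_q: (d_model, n_heads * d_head)   # per-head query projections
      w_k: (d_model, d_head)             # single shared key head
      w_v: (d_model, d_head)             # single shared value head
    """
    seq, _ = x.shape
    d_head = w_k.shape[1]
    q = (x @ w_q).reshape(seq, n_heads, d_head)  # (seq, heads, d_head)
    k = x @ w_k                                  # (seq, d_head) -- shared
    v = x @ w_v                                  # (seq, d_head) -- shared
    # scores[h, s, t]: query position s attending to key position t
    scores = np.einsum("shd,td->hst", q, k) / np.sqrt(d_head)
    # causal mask: position s may only attend to t <= s
    mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    out = np.einsum("hst,td->shd", weights, v)   # (seq, heads, d_head)
    return out.reshape(seq, n_heads * d_head)
```

With `n_heads` query heads, the KV cache per layer holds one `(seq, d_head)` key and value tensor instead of `n_heads` of each, which is the main inference-memory win MQA buys the Falcon models.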
Model Details
Architecture DENSE
Parameters 7B / 40B / 180B
Context window 2,048
Variants
| Name | Parameters | Notes |
|---|---|---|
| Falcon-7B | 7B | 1,500B tokens; Apache 2.0 |
| Falcon-40B | 40B | 1,000B tokens; Apache 2.0 |
| Falcon-180B | 180B | 3,500B tokens; TII Falcon-180B license |
Paper
arXiv: 2311.16867