Falcon 3
Family of 1B, 3B, 7B, and 10B dense Transformers plus a Mamba-7B SSM variant. The 7B was trained from scratch on 14T tokens on 1,024 H100 GPUs; the 10B was created by depth-upscaling the 7B and training on 2T additional tokens (see the sketch below); the 1B and 3B were derived from larger models via pruning and distillation. MMLU: 73.1 (10B). #1 in its size class on the Hugging Face Open LLM Leaderboard at launch.
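Depth upscaling grows an already-trained network by duplicating decoder blocks and then continuing pretraining, so the upscaled model starts from trained weights rather than random initialization, consistent with the 10B needing only 2T additional tokens versus the 7B's 14T from scratch. A minimal sketch, assuming a LLaMA-style layer stack exposed as `model.model.layers` and a generic duplicate-the-top-blocks strategy, not TII's published recipe:

```python
# Illustrative depth-upscaling sketch. The duplication pattern and the
# number of extra layers are assumptions, not TII's exact procedure.
import copy

from transformers import AutoModelForCausalLM


def depth_upscale(model, extra_layers: int):
    """Grow the decoder by deep-copying its last `extra_layers` blocks."""
    layers = model.model.layers  # nn.ModuleList of decoder blocks (LLaMA-style)
    for block in list(layers[-extra_layers:]):  # snapshot before appending
        layers.append(copy.deepcopy(block))  # duplicated weights, trained further
    model.config.num_hidden_layers = len(layers)
    return model


base = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base")
upscaled = depth_upscale(base, extra_layers=12)
# ...then continue pretraining `upscaled` on additional tokens.
```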
Model Details
Architecture DENSE
Parameters 10B (largest variant)
Variants
| Name | Parameters | Notes |
|---|---|---|
| Falcon3-1B | 1B | Pruned + distilled from larger models |
| Falcon3-3B | 3B | Pruned + distilled from larger models |
| Falcon3-7B | 7B | Trained from scratch on 14T tokens |
| Falcon3-10B | 10B | Depth-upscaled from 7B + 2T additional tokens |
| Falcon3-Mamba-7B | 7B | Pure Mamba SSM |
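For reference, a minimal generation example with `transformers`; the instruct-variant repo id follows standard Hugging Face naming conventions and is an assumption here:

```python
# Minimal generation example. The repo id "tiiuae/Falcon3-10B-Instruct"
# is assumed from Hugging Face naming conventions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon3-10B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Summarize the Falcon 3 family."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```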