Falcon 2
An 11B dense Transformer trained on 5.5T tokens in four stages with progressive context extension (2K→8K). Supports 11 languages. A VLM variant adds a CLIP ViT-L/14 vision encoder. Benchmarks: MMLU 58.4, HellaSwag 82.9.
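For orientation, a minimal sketch of loading the model for text generation with Hugging Face transformers. The repo id `tiiuae/falcon-11B` and the dtype/device settings are assumptions for illustration, not details from this card:

```python
# Minimal text-generation sketch for Falcon 2.
# Assumes the checkpoint is published under the Hugging Face repo id
# "tiiuae/falcon-11B" (an assumption); adjust id, dtype, and device map
# to your environment.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-11B"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 11B params: bf16 roughly halves memory vs. fp32
    device_map="auto",           # spread layers across available GPUs
)

# The 8,192-token context window bounds prompt + generated tokens combined.
inputs = tokenizer("Falcon 2 is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```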
Model Details
Architecture: Dense
Parameters: 11B
Context window: 8,192
Paper: arXiv:2407.14885