First competitive pure-Mamba (attention-free) 7B LLM. 64 layers, 4096 hidden dim, trained on 5.8T tokens using 256 H100s. Progressive context (2K→8K training), constant-memory inference at 130K+ tokens. Demonstrates SSMs can match Transformer quality at the 7B scale.
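The constant-memory claim follows from the recurrent form of a state-space model: each token updates a fixed-size hidden state, so memory does not grow with sequence length as a Transformer's KV cache does. A minimal sketch with toy dimensions (not the model's actual kernels or parameter shapes, which are selective and learned):

```python
import numpy as np

# Toy diagonal linear SSM recurrence illustrating O(1)-per-token memory.
# Dimensions and matrices are illustrative, not Falcon's.
d_state, d_model = 16, 8
rng = np.random.default_rng(0)
A = rng.uniform(0.5, 0.99, size=d_state)   # diagonal state transition (stable)
B = rng.normal(size=(d_state, d_model))    # input projection
C = rng.normal(size=(d_model, d_state))    # output projection

def step(x, h):
    h = A * h + B @ x                      # update fixed-size hidden state
    y = C @ h                              # emit output for this token
    return y, h

# Stream arbitrarily many tokens; only h (d_state floats) persists,
# so memory stays constant rather than scaling with sequence length.
h = np.zeros(d_state)
for t in range(10_000):
    x = rng.normal(size=d_model)
    _, h = step(x, h)

print(h.shape)
```

In contrast, attention-based inference stores keys and values for every past token, so its memory grows linearly with context; this is the practical difference behind the 130K+ token figure above.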

Model Details

Architecture: dense
Parameters: 7.27B
Context window: 8,192 tokens

Paper

arXiv: 2410.05355

Tags: open-weight, architecture
