Zamba2 (Hybrid SSM/Transformer Suite)
Zyphra's flagship architectural family. Zamba (May 2024) introduced a compact 7B hybrid combining a Mamba SSM backbone with a single shared attention module, capturing most of the benefit of attention at minimal parameter cost. Zamba2 (November 2024) extends this to a 1.2B / 2.7B / 7.4B suite trained on up to 3T tokens, with several improvements: a Mamba1 → Mamba2 backbone, two alternating shared attention blocks (instead of one) with per-depth LoRA projectors for depth specialization, and rotary position embeddings (RoPE).
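The shared-attention-plus-LoRA idea can be illustrated with a minimal parameter-count sketch. This is not Zamba2's actual implementation; the width `D`, rank `R`, and the count of six attention positions are hypothetical values chosen for illustration:

```python
import numpy as np

D, R = 64, 8  # hypothetical model width and LoRA rank

rng = np.random.default_rng(0)

# One attention projection shared across every attention position.
W_shared = rng.normal(size=(D, D))

# Each depth adds only a cheap low-rank (LoRA) correction A @ B,
# costing 2*D*R parameters instead of a full D*D matrix per depth.
def make_lora():
    return rng.normal(size=(D, R)) * 0.01, rng.normal(size=(R, D)) * 0.01

loras = [make_lora() for _ in range(6)]  # e.g. 6 attention positions

def project(x, depth):
    A, B = loras[depth]
    return x @ (W_shared + A @ B)  # shared weight + depth-specific LoRA

shared_params = D * D + len(loras) * 2 * D * R      # 4096 + 6144 = 10240
unshared_params = len(loras) * D * D                # 24576
print(shared_params, unshared_params)
```

The depth-specific LoRA terms let each reuse of the shared block behave slightly differently, while keeping total attention parameters far below six independent blocks.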
Because attention is shared, Zamba2 stores KV caches only at the attention positions (roughly 1 in 6 blocks) — a KV footprint about 6x smaller than a pure Transformer of similar quality, with correspondingly lower inference latency and memory cost. Open weights (Apache 2.0). Note: not currently scored on Artificial Analysis; benchmark claims are self-reported.
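The KV-cache saving is simple arithmetic: cache size scales linearly with the number of attention layers, so caching at 1 in 6 positions cuts it by ~6x. A sketch with hypothetical shapes (not Zamba2's real config):

```python
def kv_cache_bytes(layers, heads, head_dim, seq_len, bytes_per_elem=2):
    # K and V tensors per attention layer: 2 * heads * head_dim * seq_len
    # elements, at bytes_per_elem each (2 for fp16/bf16).
    return 2 * layers * heads * head_dim * seq_len * bytes_per_elem

# Illustrative shapes only: 48 blocks total vs. 8 attention positions.
full = kv_cache_bytes(layers=48, heads=32, head_dim=128, seq_len=4096)
shared = kv_cache_bytes(layers=8, heads=32, head_dim=128, seq_len=4096)
print(full / shared)  # 6.0 — attention at ~1 in 6 positions → ~6x smaller cache
```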
Model Details
Variants
| Name | Parameters | Notes |
|---|---|---|
| Zamba-7B | 7B | Original 2024 release; single shared attention block; Mamba1 backbone |
| Zamba2-1.2B | 1.2B | — |
| Zamba2-2.7B | 2.7B | — |
| Zamba2-7B | 7.4B | Mamba2 backbone, two alternating shared attention blocks with LoRA, RoPE |