ZAYA1-8B
Zyphra's reasoning successor to Zamba2: an 8B-total / 700M-active MoE trained end-to-end on a full-stack AMD platform — pretrain, midtrain, and SFT all on AMD Instinct MI300 GPUs with AMD networking and software. To Zyphra's knowledge this is the largest publicly released foundation model trained entirely on AMD silicon, and the companion paper documents the systems-level co-design.
Reported benchmarks (with their Markovian RSA test-time compute method): 91.9% AIME'25, 89.6% HMMT'25. Zyphra claims it matches or exceeds DeepSeek-R1-0528 on math and coding despite having under 1B active parameters. Not currently scored on Artificial Analysis — numbers above are self-reported from the technical report.
- ZAYA1-8B Technical Report (arXiv)
- Training Foundation Models on a Full-Stack AMD Platform (arXiv)
- VentureBeat coverage
- IBM × AMD × Zyphra partnership
Model Details
| Attribute | Value |
|---|---|
| Architecture | MoE |
| Total parameters | 8B |
| Active parameters | 700M |
| Training hardware | AMD Instinct MI300 |
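
If the checkpoint is distributed through Hugging Face, a minimal generation sketch might look like the following. This is a sketch under assumptions: the repository id `Zyphra/ZAYA1-8B` is a placeholder, and the `trust_remote_code=True` requirement is assumed because custom MoE architectures often need it; check Zyphra's release page for the actual details.

```python
# Minimal sketch: loading ZAYA1-8B for inference with Hugging Face Transformers.
# The repo id below is an assumption, not a confirmed checkpoint name.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "Zyphra/ZAYA1-8B"  # hypothetical repository id

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype="auto",   # keep the checkpoint's native precision
    device_map="auto",    # spread layers across available GPUs
    trust_remote_code=True,
)

prompt = "Prove that the sum of two odd integers is even."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that the self-reported AIME/HMMT scores below were obtained with the report's Markovian RSA test-time compute method, not with plain single-pass generation like the sketch above.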
Benchmark Scores
| Benchmark | Score | Mode |
|---|---|---|
| AIME 2025 | 91.9% | with Markovian RSA |
| HMMT 2025 | 89.6% | with Markovian RSA |