Zonos2 (ZONOS2) | Lab Index

Zyphra's open-weight multilingual text-to-speech model and the successor to Zonos. It uses an MoE backbone (16 experts, top-1 routing; 28 layers, GQA) that generates DAC audio tokens from UTF-8 byte input, with ECAPA-TDNN speaker embeddings for zero-shot voice cloning. Trained on 6M+ hours of multilingual speech; released under Apache-2.0, with a Mini-SGLang-based inference server for low-latency synthesis.
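As a rough illustration of two terms used above, the sketch below shows what "UTF-8 byte input" and "top-1 routing" mean in general. It is not Zonos2's code: the function names are hypothetical, and computing the gate weight as a softmax probability is an assumption made for the example, not something the description confirms.

```python
import math

NUM_EXPERTS = 16  # expert count taken from the description above


def utf8_byte_ids(text: str) -> list[int]:
    """Map text to its UTF-8 byte values (0-255), one ID per byte."""
    return list(text.encode("utf-8"))


def top1_route(router_logits: list[float]) -> tuple[int, float]:
    """Pick the single highest-scoring expert for one token.

    Returns (expert index, gate weight). The gate weight here is the
    softmax probability of the chosen expert (an assumption for this sketch).
    """
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]  # numerically stable softmax
    best = max(range(len(router_logits)), key=router_logits.__getitem__)
    return best, exps[best] / sum(exps)


# Byte-level input needs no language-specific vocabulary:
# non-ASCII characters simply expand to several byte IDs.
ids = utf8_byte_ids("héllo")

# Made-up router scores for one token: expert 3 scores highest.
logits = [0.0] * NUM_EXPERTS
logits[3] = 2.0
expert, gate = top1_route(logits)
```

With top-1 routing only the selected expert runs for each token, so per-token compute stays close to that of a single expert even though the layer holds 16.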

Language coverage comes in three tiers: Tier 1 covers English, Mandarin, and Japanese; Tier 2 adds Korean, Russian, Italian, Portuguese, French, Spanish, Vietnamese, German, Hebrew, and Dutch; Tier 3 adds a further ~23 languages. Zonos2 extends Zyphra's foundation-model line beyond language and EEG into speech generation.

Zyphra (project page) | HuggingFace

Model Details

Architecture: MoE
Experts: 16 (top-1)
License: Apache 2.0

speech, audio, open-weight, moe, multilingual
