Grad-TTS

Diffusion probabilistic model for text-to-speech synthesis. A score-based decoder produces mel-spectrograms by gradually transforming noise predicted by the encoder, which is aligned with the text input via Monotonic Alignment Search. The framework allows a flexible trade-off between sound quality and inference speed. Competitive with state-of-the-art TTS systems in Mean Opinion Score. One of the first applications of diffusion models to speech synthesis.
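The decoding idea can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a linear noise schedule with endpoints `BETA_MIN` and `BETA_MAX`, a plain Euler solver for the reverse-time ODE, and a hypothetical `toy_score` function that stands in for the trained score network by using the exact score for a single known target. All names here are illustrative. The `n_steps` argument is the knob behind the quality/speed trade-off: fewer steps mean faster but coarser decoding.

```python
import math
import random

# Assumed endpoints of a linear noise schedule (illustrative values).
BETA_MIN, BETA_MAX = 0.05, 20.0


def beta(t):
    """Noise schedule beta_t at time t in [0, 1]."""
    return BETA_MIN + (BETA_MAX - BETA_MIN) * t


def beta_integral(t):
    """Integral of beta_s over [0, t] for the linear schedule."""
    return BETA_MIN * t + 0.5 * (BETA_MAX - BETA_MIN) * t * t


def toy_score(x, mu, t, target):
    """Hypothetical stand-in for the learned score network.

    Returns the exact score of the noised distribution when the data is
    a single known point `target`. A real decoder would call a neural
    network here and would not have access to `target`.
    """
    rho = math.exp(-0.5 * beta_integral(t))
    lam = 1.0 - math.exp(-beta_integral(t))
    mean = [rho * x0 + (1.0 - rho) * m for x0, m in zip(target, mu)]
    return [-(xi - mi) / lam for xi, mi in zip(x, mean)]


def decode(mu, target, n_steps, seed=0):
    """Euler integration of the reverse ODE from t=1 down to t=0.

    `mu` plays the role of the encoder output (one value per mel bin);
    decoding starts from noise centred on `mu`.
    """
    rng = random.Random(seed)
    x = [m + rng.gauss(0.0, 1.0) for m in mu]
    h = 1.0 / n_steps
    for i in range(n_steps):
        t = 1.0 - (i + 0.5) * h  # midpoint of the current time slice
        s = toy_score(x, mu, t, target)
        x = [
            xi - 0.5 * (m - xi - si) * beta(t) * h
            for xi, m, si in zip(x, mu, s)
        ]
    return x


if __name__ == "__main__":
    mu = [0.0] * 8        # pretend encoder output
    target = [3.0] * 8    # pretend ground-truth mel values
    for steps in (10, 50):
        out = decode(mu, target, n_steps=steps)
        err = sum(abs(a - b) for a, b in zip(out, target)) / len(out)
        print(steps, "steps, mean abs error:", round(err, 4))
```

Running the script prints the mean absolute error for two step counts, which gives a rough feel for how the number of solver steps affects the result in this toy setting.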

Paper (arXiv) | Published: ICML 2021 | GitHub

Outputs 2

model

Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech

paper

Citations 43

audio, generation, open-source
