Grad-TTS
Diffusion probabilistic model for text-to-speech synthesis. A score-based decoder produces mel-spectrograms by gradually transforming noise whose mean is predicted by the encoder; the encoder output is aligned with the input text via Monotonic Alignment Search. Because the reverse diffusion is solved numerically, the number of solver steps gives a flexible trade-off between sound quality and inference speed. Competitive with state-of-the-art TTS approaches in Mean Opinion Score (subjective human evaluation). One of the first applications of diffusion models to speech synthesis.
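A minimal sketch of the decoder's reverse process, under stated assumptions. It takes the forward diffusion to be dX_t = 0.5 (mu - X_t) beta_t dt + sqrt(beta_t) dW_t with identity covariance and an assumed linear schedule beta_t (endpoints 0.05 and 20.0 chosen for illustration). It then integrates the matching probability-flow ODE with a fixed-step Euler solver, starting from noise centred on mu, which stands in for the aligned encoder output. The trained score network is replaced by a hypothetical toy_score that is exact for Gaussian toy data, so the result can be checked. The names reverse_diffusion and toy_score, and all parameter values, are illustrative, not Grad-TTS code.

```python
import numpy as np

# Assumed linear noise schedule beta_t = BETA_0 + (BETA_1 - BETA_0) * t.
# The endpoint values are illustrative assumptions, not taken from this page.
BETA_0, BETA_1 = 0.05, 20.0


def beta(t):
    """Noise level beta_t at diffusion time t in [0, 1]."""
    return BETA_0 + (BETA_1 - BETA_0) * t


def integrated_beta(t):
    """Integral of beta_s over [0, t]."""
    return BETA_0 * t + 0.5 * (BETA_1 - BETA_0) * t ** 2


def toy_score(x, mu, t, m0, s0):
    """Exact score of X_t for toy data X_0 ~ N(m0, s0^2) (hypothetical).

    Under dX_t = 0.5 * (mu - X_t) * beta_t dt + sqrt(beta_t) dW_t,
    X_t is Gaussian with the mean and variance below. A trained
    network would replace this function in a real decoder.
    """
    a = np.exp(-0.5 * integrated_beta(t))      # weight kept on X_0
    mean_t = a * m0 + (1.0 - a) * mu           # drifts toward mu as t -> 1
    var_t = a ** 2 * s0 ** 2 + (1.0 - a ** 2)  # tends to 1 as t -> 1
    return -(x - mean_t) / var_t


def reverse_diffusion(mu, n_steps, score_fn, rng):
    """Euler solver for the probability-flow ODE from t = 1 down to t = 0.

    dX_t/dt = 0.5 * (mu - X_t - score_fn(X_t, t)) * beta_t,
    starting from X_1 ~ N(mu, I). Fewer steps run faster but are less accurate.
    """
    h = 1.0 / n_steps
    x = mu + rng.standard_normal(mu.shape)     # sample the prior around mu
    for i in range(n_steps):
        t = 1.0 - (i + 0.5) * h                # drift evaluated mid-step
        dx_dt = 0.5 * (mu - x - score_fn(x, t)) * beta(t)
        x = x - dx_dt * h                      # step backwards in time
    return x


# Toy usage: mu plays the encoder output, (m0, s0) the "true" data.
rng = np.random.default_rng(0)
mu = np.zeros(20_000)
m0, s0 = 2.0, 0.5


def score(x, t):
    return toy_score(x, mu, t, m0, s0)


fast = reverse_diffusion(mu, n_steps=2, score_fn=score, rng=rng)
slow = reverse_diffusion(mu, n_steps=1000, score_fn=score, rng=rng)
print(f"2 steps:    mean={fast.mean():.3f} std={fast.std():.3f}")
print(f"1000 steps: mean={slow.mean():.3f} std={slow.std():.3f}")
```

In this toy setup the 2-step run ends visibly off the target mean 2.0 and standard deviation 0.5, while the 1000-step run lands close to both, which mirrors the quality/speed trade-off described above. With a learned score network the size of the gap depends on the model, so treat these numbers as illustrative only.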
Outputs:
Grad-TTS (model)
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech (paper), arXiv: 2105.06337