# Overflow TTS

Neural HMMs are a type of neural transducer recently proposed for
sequence-to-sequence modelling in text-to-speech. They combine the best features
of classic statistical speech synthesis and modern neural TTS, requiring less
data and fewer training updates, and are less prone to gibberish output caused
by neural attention failures. In this paper, we combine neural HMM TTS with
normalising flows for describing the highly non-Gaussian distribution of speech
acoustics. The result is a powerful, fully probabilistic model of durations and
acoustics that can be trained using exact maximum likelihood. Compared to
dominant flow-based acoustic models, our approach integrates autoregression for
improved modelling of long-range dependences such as utterance-level prosody.
Experiments show that a system based on our proposal gives more accurate
pronunciations and better subjective speech quality than comparable methods,
whilst retaining the original advantages of neural HMMs. Audio examples and code
are available at https://shivammehta25.github.io/OverFlow/.
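To make "exact maximum likelihood" concrete, here is a minimal sketch of the quantity such a model maximises: the log-likelihood of an observation sequence under a left-to-right, no-skip HMM, computed with the forward algorithm. It is a toy illustration, not the OverFlow implementation. The assumptions are scalar observations, fixed per-state Gaussian parameters and stay probabilities (in the real model a neural network predicts these autoregressively), and a single invertible affine map standing in for the normalising flow. All function and parameter names are hypothetical.

```python
import math


def log_add(a, b):
    """Numerically stable log(exp(a) + exp(b)), allowing -inf inputs."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def gaussian_log_pdf(x, mean, std):
    """Log-density of a scalar Gaussian N(mean, std^2) at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2


def flow_log_pdf(x, mean, std, scale=1.0, shift=0.0):
    """Log-density of x = scale * z + shift with z ~ N(mean, std^2).

    Change of variables: log p(x) = log p(z) - log|scale|.
    The affine map is a toy stand-in for a learned invertible flow.
    """
    z = (x - shift) / scale
    return gaussian_log_pdf(z, mean, std) - math.log(abs(scale))


def forward_log_likelihood(obs, means, stds, stay, scale=1.0, shift=0.0):
    """Exact log-likelihood under a left-to-right, no-skip HMM.

    The model starts in state 0, either stays in the current state or
    moves to the next one at each frame, and must leave the final state
    after the last frame. Each stay[j] is assumed to lie strictly in (0, 1).
    """
    n = len(means)
    alpha = [-math.inf] * n
    alpha[0] = flow_log_pdf(obs[0], means[0], stds[0], scale, shift)
    for x in obs[1:]:
        new_alpha = []
        for j in range(n):
            # Either we were already in state j and stayed there...
            total = alpha[j] + math.log(stay[j])
            # ...or we were in state j - 1 and moved forward.
            if j > 0:
                total = log_add(total, alpha[j - 1] + math.log(1.0 - stay[j - 1]))
            new_alpha.append(total + flow_log_pdf(x, means[j], stds[j], scale, shift))
        alpha = new_alpha
    # Finish by transitioning out of the last state.
    return alpha[-1] + math.log(1.0 - stay[-1])


if __name__ == "__main__":
    frames = [0.1, 0.4, 1.9, 2.2]
    ll = forward_log_likelihood(frames, means=[0.0, 1.0], stds=[1.0, 1.0],
                                stay=[0.6, 0.6], scale=2.0, shift=0.0)
    print(f"log-likelihood: {ll:.4f}")
```

Because the forward recursion sums over every possible alignment of states to frames, the result is an exact likelihood rather than a bound, and the flow only contributes a log-determinant term to each emission density. Training by maximum likelihood would amount to maximising this value with respect to the parameters.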


## Important resources & papers
- HMM: https://de.wikipedia.org/wiki/Hidden_Markov_Model
- OverflowTTS paper: https://arxiv.org/abs/2211.06892
- Neural HMM: https://arxiv.org/abs/2108.13320
- Audio Samples: https://shivammehta25.github.io/OverFlow/


## OverflowConfig
```{eval-rst}
.. autoclass:: TTS.tts.configs.overflow_config.OverflowConfig
    :members:
```

## Overflow Model
```{eval-rst}
.. autoclass:: TTS.tts.models.overflow.Overflow
    :members:
```