ProDiff and FastDiff Model Card

Key Features

  • Extremely fast diffusion-based text-to-speech synthesis pipeline, intended for potential industrial deployment.
  • Tutorial and code base for speech diffusion models.
  • Support for more diffusion mechanisms (e.g., guided diffusion) is planned.

Model Details

  • Model type: Diffusion-based text-to-speech generation model

  • Language(s): English

  • Model Description: A conditional diffusion probabilistic model capable of generating high-fidelity speech efficiently.

  • Resources for more information: FastDiff GitHub Repository, FastDiff Paper, ProDiff GitHub Repository, ProDiff Paper.

  • Cite as:

    @inproceedings{huang2022prodiff,
       title={ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech},
       author={Huang, Rongjie and Zhao, Zhou and Liu, Huadai and Liu, Jinglin and Cui, Chenye and Ren, Yi},
       booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
       year={2022}
    }
    @inproceedings{huang2022fastdiff,
       title={FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis},
       author={Huang, Rongjie and Lam, Max WY and Wang, Jun and Su, Dan and Yu, Dong and Ren, Yi and Zhao, Zhou},
       booktitle={Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, {IJCAI-22}},
       year={2022}
    }

This model card was written based on the DALL-E Mini model card.
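
As a rough illustration of what "conditional diffusion probabilistic model" means in the Model Description, here is a minimal NumPy sketch of generic DDPM-style reverse sampling. It is not the ProDiff or FastDiff implementation and does not use their code base. The names `make_schedule`, `toy_denoiser`, and `reverse_sample`, the linear noise schedule, and the step count are all assumptions made for illustration. The "denoiser" is a hypothetical stand-in that treats the conditioning signal as the clean target, where a real system would use a trained network conditioned on text or acoustic features.

```python
import numpy as np


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.5):
    """Linear noise schedule (an assumption; real models may use others)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def toy_denoiser(x_t, t, cond, alpha_bars):
    """Hypothetical stand-in for a trained network eps_theta(x_t, t, cond).

    It pretends the clean signal is exactly `cond` and returns the noise
    implied by that assumption. A real model would learn this mapping.
    """
    a_bar = alpha_bars[t]
    return (x_t - np.sqrt(a_bar) * cond) / np.sqrt(1.0 - a_bar)


def reverse_sample(cond, num_steps=4, seed=0):
    """DDPM-style ancestral sampling conditioned on `cond`."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    x = rng.standard_normal(cond.shape)  # start from pure Gaussian noise
    for t in reversed(range(num_steps)):
        eps = toy_denoiser(x, t, cond, alpha_bars)
        # Posterior mean of x_{t-1} given x_t and the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Inject fresh noise on every step except the final one.
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(cond.shape)
        else:
            x = mean
    return x


# Toy 1-D "conditioning" signal standing in for real acoustic features.
cond = np.sin(np.linspace(0.0, 2.0 * np.pi, 16))
sample = reverse_sample(cond, num_steps=4)
```

Because the toy denoiser knows the target exactly, the final noise-free step lands on `cond` regardless of the seed. With a learned denoiser the output would only approximate the target, and the number of reverse steps would trade synthesis speed against quality.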
