Field | Response
:-----|:--------
Intended Application & Domain: | Generating waveforms from mel spectrograms.
Model Type: | Convolutional Neural Network (CNN)
Intended Users: | This model is intended for developers who need to synthesize waveforms from AI-generated mel spectrograms.
Output: | Audio Waveform
Describe how the model works: | The model generates an audio waveform corresponding to the input mel spectrogram.
Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Not Applicable
Technical Limitations: | This model may not perform well on synthetically generated mel spectrograms that deviate significantly from the profile of the mel spectrograms on which it was trained.
Verified to have met prescribed NVIDIA quality standards: | Yes
Performance Metrics: | Perceptual Evaluation of Speech Quality (PESQ), Virtual Speech Quality Objective Listener (ViSQOL), Multi-resolution STFT (MRSTFT), Mel cepstral distortion (MCD), Periodicity RMSE, Voiced/Unvoiced F1 Score (V/UV F1)
Potential Known Risks: | This model may generate low-quality or distorted sound waves.
Licensing: | https://github.com/NVIDIA/BigVGAN/blob/main/LICENSE
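
To make one of the listed performance metrics concrete, here is a minimal sketch of mel cepstral distortion (MCD) in plain Python. It is not the evaluation code used for this model. The function name `mel_cepstral_distortion` is hypothetical, and the sketch assumes that the reference and synthesized mel-cepstral frames are already extracted and time-aligned, and that coefficient 0 (energy) is excluded, which is a common convention rather than something this card specifies.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Return the mean MCD in dB over time-aligned mel-cepstral frames.

    Hypothetical helper for illustration only. Each frame is a sequence of
    mel-cepstral coefficients; index 0 is skipped (assumed convention).
    """
    if len(ref_frames) != len(syn_frames) or not ref_frames:
        raise ValueError("inputs must be non-empty with equal frame counts")

    scale = 10.0 / math.log(10.0)  # converts the log-spectral distance to dB
    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        # Squared Euclidean distance over coefficients 1..D of this frame.
        squared = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += scale * math.sqrt(2.0 * squared)
    return total / len(ref_frames)


# Toy frames (made-up values, not measurements from this model).
reference = [[0.0, 1.0, 0.5], [0.0, 0.8, 0.4]]
synthesized = [[0.0, 1.0, 0.5], [0.0, 0.8, 0.4]]
print(mel_cepstral_distortion(reference, synthesized))
```

Lower values indicate that the synthesized audio is spectrally closer to the reference. Identical inputs give a distortion of zero.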