What dataset was used for pre-training?

by carlosfranzreb - opened Jun 24, 2022

Jun 24, 2022

Hello,

Thank you for providing the pre-trained models. We are currently participating in a challenge in which only certain subsets of LibriSpeech may be used. We would therefore like to know which subset of LibriSpeech was used to pre-train this model. Was it train-clean-100? Also, is this stated anywhere, so that we can notify the creators of the challenge if necessary?

Kind regards,
Carlos

patrickvonplaten

Jun 25, 2022

Hey @carlosfranzreb,

This model was pretrained on the whole LibriSpeech training corpus (train-other-500 + train-clean-360 + train-clean-100) and fine-tuned only on 100h of the training data.
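
For anyone checking a setup like this against a challenge's data rules, here is a minimal sketch. It assumes the official LibriSpeech split names with their nominal hour counts, and it assumes the 100h fine-tuning data is train-clean-100, which is not stated explicitly above. The `ALLOWED` set and the `disallowed` helper are hypothetical and only serve as an illustration.

```python
# Nominal sizes (in hours) of the LibriSpeech training subsets.
PRETRAINING_SUBSETS = {
    "train-clean-100": 100,
    "train-clean-360": 360,
    "train-other-500": 500,
}

# Assumption: the 100h used for fine-tuning is train-clean-100.
FINETUNING_SUBSETS = {"train-clean-100": 100}

# Hypothetical challenge rule: only train-clean-100 may be used.
ALLOWED = {"train-clean-100"}


def disallowed(used, allowed):
    """Return the sorted names of subsets that were used but are not allowed."""
    return sorted(set(used) - set(allowed))


total_pretraining_hours = sum(PRETRAINING_SUBSETS.values())
print(total_pretraining_hours)  # 960
print(disallowed(PRETRAINING_SUBSETS, ALLOWED))
print(disallowed(FINETUNING_SUBSETS, ALLOWED))
```

Under these assumptions the fine-tuning data would be within the rule, while the pre-training data would not, since it also covers the 360h and 500h subsets.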
