What dataset was used for pre-training?
#1, opened by carlosfranzreb
Hello,
Thank you for providing the pre-trained models. We are currently participating in a challenge where only certain subsets of LibriSpeech may be used. We would therefore like to know which subset of LibriSpeech was used to pre-train this model. Was it train-clean-100? Also, is this stated anywhere, so that we can notify the organizers of the challenge if necessary?
Kind regards,
Carlos
Hey @carlosfranzreb,
This model was pre-trained on the whole LibriSpeech training corpus (train-other-500 + train-clean-360 + train-clean-100, roughly 960 hours in total) and fine-tuned only on the 100 hours of train-clean-100.
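In case it helps with the challenge rules, here is a minimal sketch for checking which of the subsets used fall outside an allowed list. The subset names follow the official LibriSpeech naming, the hours are the nominal values implied by those names, and the `allowed` set and the `disallowed_subsets` helper are hypothetical placeholders; substitute the subsets your challenge actually permits.

```python
# Nominal sizes in hours, as implied by the LibriSpeech subset names.
NOMINAL_HOURS = {
    "train-clean-100": 100,
    "train-clean-360": 360,
    "train-other-500": 500,
}

# Subsets named in the reply above.
PRETRAINING_SUBSETS = {"train-clean-100", "train-clean-360", "train-other-500"}
FINETUNING_SUBSETS = {"train-clean-100"}


def disallowed_subsets(used, allowed):
    """Return the subsets in `used` that are not in `allowed`, sorted by name."""
    return sorted(set(used) - set(allowed))


if __name__ == "__main__":
    # Hypothetical challenge rule: only train-clean-100 may be used.
    allowed = {"train-clean-100"}

    print("Pre-training hours (nominal):",
          sum(NOMINAL_HOURS[s] for s in PRETRAINING_SUBSETS))
    print("Pre-training subsets outside the rule:",
          disallowed_subsets(PRETRAINING_SUBSETS, allowed))
    print("Fine-tuning subsets outside the rule:",
          disallowed_subsets(FINETUNING_SUBSETS, allowed))
```

Under that assumed rule, the fine-tuning data would be within bounds, while the pre-training data would not.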