What dataset was used for pre-training?

by carlosfranzreb - opened


Thank you for providing the pre-trained models. We are currently participating in a challenge in which only certain subsets of LibriSpeech may be used. We would therefore like to know which LibriSpeech subset was used to pre-train this model. Was it train-clean-100? Also, is this stated anywhere, so that we can notify the creators of the challenge if necessary?

Kind regards,

Hey @carlosfranzreb,

This model was pre-trained on the whole LibriSpeech training corpus (train-other-500 + train-clean-360 + train-clean-100, 960 hours in total) and fine-tuned only on the 100-hour training subset (train-clean-100).
