patrickvonplaten committed on
Commit
ed0ae96
1 Parent(s): a6994b7
Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -13,13 +13,13 @@ license: cc-by-nc-4.0
 
  [Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained only in **bg** on **17.6k** unlabeled datat of the [VoxPopuli corpus](https://arxiv.org/abs/2101.00390).
 
- The large model pretrained on 16kHz sampled speech audio. When using the model make sure that your speech input is also sampled at 16Khz.
+ The model is pretrained on 16kHz sampled speech audio. When using the model make sure that your speech input is also sampled at 16Khz.
 
  **Note**: This model does not have a tokenizer as it was pretrained on audio alone. In order to use this model for **speech recognition**, a tokenizer should be created and the model should be fine-tuned on labeled text data in **bg**. Check out [this blog](https://huggingface.co/blog/fine-tune-xlsr-wav2vec2) for a more in-detail explanation of how to fine-tune the model.
 
  **Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation
  Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*
 
- **Authors**: *Changhan Wang, Morgane Riviere, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Pino, Emmanuel Dupoux* from *Facebook AI*
+ **Authors**: *Changhan Wang, Morgane Riviere, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Pino, Emmanuel Dupoux* from *Facebook AI*.
 
- See the official website for more information, [here](https://github.com/facebookresearch/voxpopuli/)
+ See the official website for more information, [here](https://github.com/facebookresearch/voxpopuli/).
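
The README text touched by this commit asks callers to supply audio sampled at 16kHz. As a rough illustration of what that implies for input sampled at another rate, here is a minimal sketch using only the Python standard library. The names `TARGET_RATE` and `resample_linear` are hypothetical and not part of the model card or any library. The sketch uses naive linear interpolation with no anti-aliasing filter, so it is an assumption-laden stand-in for a proper resampler, not a recommended preprocessing step.

```python
TARGET_RATE = 16_000  # sampling rate the README says the model expects


def resample_linear(samples, source_rate, target_rate=TARGET_RATE):
    """Resample a mono signal by linear interpolation (illustration only).

    No low-pass filtering is applied, so downsampling can alias;
    a real pipeline would use a dedicated resampling routine instead.
    """
    if source_rate <= 0 or target_rate <= 0:
        raise ValueError("sampling rates must be positive")
    if source_rate == target_rate:
        return list(samples)  # already at the expected rate

    n_out = int(len(samples) * target_rate / source_rate)
    step = source_rate / target_rate  # input samples per output sample
    out = []
    for i in range(n_out):
        pos = i * step                           # fractional input position
        left = int(pos)                          # sample just before pos
        right = min(left + 1, len(samples) - 1)  # clamp at the last sample
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out
```

For example, one second of audio recorded at 44.1kHz (44100 samples) comes back as 16000 samples, which is the length a 16kHz model input of the same duration would have.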