File size: 1,369 Bytes
cc9c13f
e544fda
cc9c13f
ca53d26
e544fda
88ae97c
e544fda
 
 
 
cc9c13f
 
c1bd604
e544fda
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
---
language: it
tags:
- audio
- automatic-speech-recognition
- voxpopuli-v2
datasets:
- voxpopuli
license: cc-by-nc-4.0
inference: false
---

# Wav2Vec2-base-VoxPopuli-V2

[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) base model pretrained only in **it** on **21.9k** unlabeled datat of the [VoxPopuli corpus](https://arxiv.org/abs/2101.00390).

The model is pretrained on 16kHz sampled speech audio. When using the model make sure that your speech input is also sampled at 16Khz.

**Note**: This model does not have a tokenizer as it was pretrained on audio alone. In order to use this model for **speech recognition**, a tokenizer should be created and the model should be fine-tuned on labeled text data in **it**. Check out [this blog](https://huggingface.co/blog/fine-tune-xlsr-wav2vec2) for a more in-detail explanation of how to fine-tune the model. 

**Paper**: *[VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation
Learning, Semi-Supervised Learning and Interpretation](https://arxiv.org/abs/2101.00390)*

**Authors**: *Changhan Wang, Morgane Riviere, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Pino, Emmanuel Dupoux* from *Facebook AI*.

See the official website for more information, [here](https://github.com/facebookresearch/voxpopuli/).