metadata

language: mt
tags:
  - audio
  - automatic-speech-recognition
  - voxpopuli-v2
datasets:
  - voxpopuli
license: cc-by-nc-4.0
inference: false

Wav2Vec2-large-VoxPopuli-V2

Facebook's Wav2Vec2 large model, pretrained only on Maltese (mt) using 9.1k hours of unlabeled data from the VoxPopuli corpus.

The model is pretrained on speech audio sampled at 16 kHz. When using the model, make sure that your speech input is also sampled at 16 kHz.
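A quick way to guard against mismatched input is to check the sample rate before passing audio to the model. The following is a minimal sketch that assumes the input is an uncompressed WAV file and uses only Python's standard `wave` module; the function name `check_sample_rate` is hypothetical and not part of any library.

```python
import wave

EXPECTED_RATE = 16_000  # Hz, the rate the model was pretrained on


def check_sample_rate(path, expected=EXPECTED_RATE):
    """Return the WAV file's sample rate, raising if it differs from `expected`."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
    if rate != expected:
        raise ValueError(
            f"{path} is sampled at {rate} Hz; resample it to {expected} Hz first"
        )
    return rate
```

Files that fail the check need to be resampled with an audio library of your choice before use; this sketch only detects the mismatch and does not resample.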

Note: This model does not have a tokenizer, as it was pretrained on audio alone. To use this model for speech recognition, a tokenizer should be created and the model should be fine-tuned on labeled speech data (audio with transcripts) in Maltese (mt). Check out this blog for a more detailed explanation of how to fine-tune the model.
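As a rough illustration of the tokenizer step, a character-level vocabulary for CTC fine-tuning can be built from the training transcripts. This is a sketch under stated assumptions, not the procedure from the blog: the function name `build_ctc_vocab`, the special tokens `[UNK]` and `[PAD]`, the `|` word delimiter, and the example transcripts are all illustrative choices.

```python
import json


def build_ctc_vocab(transcripts):
    """Map every character in the transcripts to an integer id."""
    chars = sorted(set("".join(transcripts)))
    vocab = {ch: idx for idx, ch in enumerate(chars)}
    # Use a visible word delimiter instead of a plain space.
    # Assumption: "|" does not already occur in the transcripts.
    if " " in vocab:
        vocab["|"] = vocab.pop(" ")
    # Special tokens for unknown characters and for padding (the CTC blank).
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    return vocab


# Made-up example transcripts; real ones come from your labeled dataset.
transcripts = ["bonġu", "grazzi ħafna"]
vocab = build_ctc_vocab(transcripts)

# Serialize without escaping non-ASCII characters such as "ġ" and "ħ".
vocab_json = json.dumps(vocab, ensure_ascii=False)
```

Once written to a file, a vocabulary like this could be loaded with `Wav2Vec2CTCTokenizer` from the `transformers` library, passing `unk_token="[UNK]"`, `pad_token="[PAD]"`, and `word_delimiter_token="|"`.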

Paper: VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation

Authors: Changhan Wang, Morgane Riviere, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Pino, Emmanuel Dupoux from Facebook AI.

See the official website for more information.