metadata
language: ru
datasets:
  - SberDevices/Golos
metrics:
  - wer
  - cer
tags:
  - audio
  - automatic-speech-recognition
  - speech
  - xlsr-fine-tuning-week
license: apache-2.0
model-index:
  - name: XLSR Wav2Vec2 Russian by Ivan Bondarenko
    results:
      - task:
          name: Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: SberDevices Golos (crowd)
          type: SberDevices/Golos
          args: ru
        metrics:
          - name: Test WER
            type: wer
            value: 7.985
          - name: Test CER
            type: cer
            value: 2.014
      - task:
          name: Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: SberDevices Golos (farfield)
          type: SberDevices/Golos
          args: ru
        metrics:
          - name: Test WER
            type: wer
            value: 19.912
          - name: Test CER
            type: cer
            value: 5.904

Wav2Vec2-Large-Ru-Golos

This Wav2Vec2 model is based on facebook/wav2vec2-large-xlsr-53 and was fine-tuned for Russian on the SberDevices Golos dataset, with audio augmentations such as pitch shifting, speeding up or slowing down the audio, and reverberation.

When using this model, make sure that your speech input is sampled at 16 kHz.
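
The sampling-rate requirement above can be enforced in code before the model is called. The following is a minimal sketch, not an official usage recipe: it assumes the `transformers` and `torch` packages are installed and that the repository ships a processor loadable with `Wav2Vec2Processor`. The helper names `check_sampling_rate` and `transcribe` are hypothetical and introduced only for illustration.

```python
from typing import Sequence

# Repository name taken from the citation URL in this model card.
MODEL_ID = "bond005/wav2vec2-large-ru-golos"
# The model card asks for speech input sampled at 16 kHz.
EXPECTED_SAMPLING_RATE = 16_000


def check_sampling_rate(sampling_rate: int) -> None:
    """Raise ValueError if the audio is not sampled at 16 kHz."""
    if sampling_rate != EXPECTED_SAMPLING_RATE:
        raise ValueError(
            f"Expected {EXPECTED_SAMPLING_RATE} Hz audio, got {sampling_rate} Hz; "
            "resample the signal before recognition."
        )


def transcribe(waveform: Sequence[float], sampling_rate: int) -> str:
    """Hypothetical helper: greedy CTC decoding of one mono waveform."""
    check_sampling_rate(sampling_rate)

    # Heavy dependencies are imported lazily, so the rate check above
    # works even where torch/transformers are not installed.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    # Assumption: the repository provides both processor and model files.
    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
    model.eval()

    # Convert the raw samples into padded model inputs (PyTorch tensors).
    inputs = processor(
        waveform,
        sampling_rate=sampling_rate,
        return_tensors="pt",
        padding=True,
    )
    with torch.no_grad():
        logits = model(**inputs).logits

    # Pick the most likely token per frame and let the processor
    # collapse repeats and blanks into text.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

One way to obtain a suitable waveform, assuming `librosa` is available, is `waveform, rate = librosa.load("speech.wav", sr=16000)`, which resamples to 16 kHz while loading; the file name here is a placeholder. The result can then be passed as `transcribe(waveform, rate)`.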

Citation

To cite this model, you can use the following BibTeX entry:

@misc{bondarenko2022wav2vec2-large-ru-golos,
  title={XLSR Wav2Vec2 Russian by Ivan Bondarenko},
  author={Bondarenko, Ivan},
  publisher={Hugging Face},
  journal={Hugging Face Hub},
  howpublished={\url{https://huggingface.co/bond005/wav2vec2-large-ru-golos}},
  year={2022}
}