---
language: hr
datasets:
  - parlaspeech
tags:
  - audio
  - automatic-speech-recognition
widget:
  - example_title: example 1
    src: >-
      https://huggingface.co/classla/wav2vec2-xls-r-sabor-hr/raw/main/00020570a.flac.wav
  - example_title: example 2
    src: >-
      https://huggingface.co/classla/wav2vec2-xls-r-sabor-hr/raw/main/00020578b.flac.wav
---

wav2vec2-xls-r-sabor-hr

This model is based on facebook/wav2vec2-xls-r-300m and was fine-tuned on 72 hours of recordings and transcripts from the Croatian parliament (Sabor). These transcripts are an early result of the second iteration of the ParlaMint project; they will be extended and published under a permissive license.
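A minimal inference sketch follows. It assumes that `torch`, `transformers` and `soundfile` are installed, that the repository ships processor files loadable with `Wav2Vec2Processor`, and that the input audio is sampled at 16 kHz (the rate wav2vec2 XLS-R models expect). The helper name `transcribe` is hypothetical, and decoding is plain greedy argmax over the CTC output, not necessarily what the authors used.

```python
"""Minimal transcription sketch for classla/wav2vec2-xls-r-sabor-hr.

Assumptions: torch, transformers and soundfile are installed, and the
repository provides processor files (feature extractor + tokenizer).
"""

MODEL_ID = "classla/wav2vec2-xls-r-sabor-hr"
SAMPLING_RATE = 16_000  # wav2vec2 XLS-R models expect 16 kHz input


def transcribe(path: str, model_id: str = MODEL_ID) -> str:
    """Transcribe one audio file with greedy CTC decoding (hypothetical helper)."""
    # Heavy imports are deferred so this module can be imported without them.
    import soundfile as sf
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    speech, rate = sf.read(path, dtype="float32")
    if speech.ndim > 1:
        speech = speech.mean(axis=1)  # down-mix multi-channel audio to mono
    if rate != SAMPLING_RATE:
        raise ValueError(
            f"expected {SAMPLING_RATE} Hz audio, got {rate} Hz; resample first"
        )

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    inputs = processor(speech, sampling_rate=SAMPLING_RATE, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    predicted_ids = torch.argmax(logits, dim=-1)  # most likely token per frame
    return processor.batch_decode(predicted_ids)[0]  # collapses repeats and blanks
```

Usage, assuming a local copy of one of the widget examples: `transcribe("00020570a.flac.wav")`. Audio at other sampling rates has to be resampled to 16 kHz first; the sketch rejects it rather than guessing a resampling method.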

These efforts were coordinated by Nikola Ljubešić. Ivo-Pavao Jazbec performed the manual data alignment, Vuk Batanović and Lenka Bajčetić applied the method of Plüss et al., and Peter Rupnik performed the final modelling.

An initial evaluation on partially noisy data showed that the model achieves a word error rate (WER) of 13.68% and a character error rate (CER) of 4.56%.
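Both metrics are edit-distance ratios: the minimum number of substitutions, deletions and insertions needed to turn the hypothesis into the reference, divided by the number of words (WER) or characters (CER) in the reference. The sketch below illustrates that definition only. The card does not say which tooling or text normalisation produced the figures above, so scores from this sketch are not guaranteed to be comparable; counting spaces as characters for CER, whitespace tokenisation for WER, and the function names are all assumptions.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def edit_distance(ref: Sequence[T], hyp: Sequence[T]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (0 if tokens match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate over whitespace-separated tokens (assumed tokenisation)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; spaces count as characters (an assumption)."""
    if not reference:
        raise ValueError("reference must contain at least one character")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, `wer("dobar dan svima", "dobar dan svim")` is 1/3 (one substituted word out of three), while `cer` on the same pair is 1/15 (one deleted character out of fifteen), which shows why CER is typically much lower than WER for near-miss word errors.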