metadata

language: sr
datasets:
  - juznevesti-sr
tags:
  - audio
  - automatic-speech-recognition
widget:
  - example_title: Croatian example 1
    src: >-
      https://huggingface.co/classla/wav2vec2-xls-r-parlaspeech-hr/raw/main/1800.m4a
  - example_title: Croatian example 2
    src: >-
      https://huggingface.co/classla/wav2vec2-xls-r-parlaspeech-hr/raw/main/00020578b.flac.wav
  - example_title: Croatian example 3
    src: >-
      https://huggingface.co/classla/wav2vec2-xls-r-parlaspeech-hr/raw/main/00020570a.flac.wav

wav2vec2-large-juznevesti

This model for Serbian automatic speech recognition (ASR) is based on facebook/wav2vec2-large-slavic-voxpopuli-v2 and was fine-tuned on 58 hours of audio and transcripts from the Južne vesti programme '15 minuta'.

Metrics

The model was evaluated on the dev and test portions of the JuzneVesti dataset, using word error rate (WER) and character error rate (CER); lower is better.

metric	dev	test
WER	0.295206	0.290094
CER	0.140766	0.137642
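
For reference, WER and CER are the word-level and character-level edit distance between the model output and the reference transcript, divided by the length of the reference. The sketch below is a minimal pure-Python illustration of that definition, not the evaluation script used to produce the figures above; the function names `edit_distance`, `wer`, and `cer` are hypothetical, and the example strings are made up.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over the reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


# Made-up example: one of three words differs, by a single character.
reference = "dobar dan svima"
hypothesis = "dobar dam svima"
print(wer(reference, hypothesis))
print(cer(reference, hypothesis))
```

In this example one of the three reference words is wrong, so the WER is 1/3, while only one of the fifteen reference characters is wrong, so the CER is 1/15. This is why CER is typically much lower than WER on the same data.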

Usage in `transformers`

Tested with transformers==4.18.0, torch==1.11.0, and SoundFile==0.10.3.post1.

from transformers import Wav2Vec2ProcessorWithLM, Wav2Vec2ForCTC
import soundfile as sf
import torch
import os

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# load the processor (feature extractor, tokenizer, and LM decoder) and the model
processor = Wav2Vec2ProcessorWithLM.from_pretrained(
    "classla/wav2vec2-large-slavic-parlaspeech-hr-lm")
model = Wav2Vec2ForCTC.from_pretrained(
    "classla/wav2vec2-large-slavic-parlaspeech-hr-lm").to(device)

# download the example wav file
os.system("wget https://huggingface.co/classla/wav2vec2-large-slavic-parlaspeech-hr-lm/raw/main/00020570a.flac.wav")

# read the wav file
speech, sample_rate = sf.read("00020570a.flac.wav")

# extract the input features and move them to the same device as the model
inputs = processor(speech, sampling_rate=sample_rate, return_tensors="pt").to(device)

# run inference without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# the LM decoder works on NumPy arrays, so move the logits back to the CPU first
transcription = processor.batch_decode(logits.cpu().numpy()).text[0]

# remove the raw wav file
os.remove("00020570a.flac.wav")

transcription # 'velik broj poslovnih subjekata poslao je sa minusom velik dio'

Training hyperparameters

In fine-tuning, the following arguments were used:

arg	value
`per_device_train_batch_size`	16
`gradient_accumulation_steps`	4
`num_train_epochs`	8
`learning_rate`	3e-4
`warmup_steps`	500
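
As a rough illustration of what these values imply, the sketch below collects them in a dictionary and computes the effective batch size per optimizer step. The helper `effective_batch_size` and the `num_devices` parameter are hypothetical, and the single-device default is an assumption: the number of devices used for fine-tuning is not stated here.

```python
# Values copied from the table above.
finetuning_args = {
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 4,
    "num_train_epochs": 8,
    "learning_rate": 3e-4,
    "warmup_steps": 500,
}


def effective_batch_size(args, num_devices=1):
    """Number of examples that contribute to one optimizer step.

    num_devices=1 is an assumption; the device count is not documented.
    """
    return (args["per_device_train_batch_size"]
            * args["gradient_accumulation_steps"]
            * num_devices)


print(effective_batch_size(finetuning_args))  # 64
```

The keys match field names of `transformers.TrainingArguments`, so under that assumption the dictionary could be unpacked into it together with an `output_dir` of your choice.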
