classla/wav2vec2-xls-r-parlaspeech-hr

Automatic Speech Recognition

5roop committed on Dec 22, 2021

Commit cce5842

1 Parent(s): e84eba9

Added a usage example

Files changed (1)

README.md +41 -1

README.md CHANGED

@@ -19,4 +19,44 @@ This model is based on the [facebook/wav2vec2-xls-r-300m model](https://huggingf
 The efforts resulting in this model were coordinated by Nikola Ljubešić; the rough manual data alignment was performed by Ivo-Pavao Jazbec; the method for fine automatic data alignment from [Plüss et al.](https://arxiv.org/abs/2010.02810) was applied by Vuk Batanović and Lenka Bajčetić; and the final modelling was performed by Peter Rupnik.
-Initial evaluation on partially noisy data showed that the model achieves a word error rate (WER) of 13.68% and a character error rate (CER) of 4.56%.

+Initial evaluation on partially noisy data showed that the model achieves a word error rate (WER) of 13.68% and a character error rate (CER) of 4.56%.
+## Usage in `transformers`
+```python
+from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
+from datasets import Audio
+import soundfile as sf
+import torch
+import os
+# load the processor (feature extractor + tokenizer) and the model
+processor = Wav2Vec2Processor.from_pretrained("classla/wav2vec2-xls-r-sabor-hr")
+model = Wav2Vec2ForCTC.from_pretrained("classla/wav2vec2-xls-r-sabor-hr")
+# download the example wav file (-O saves it under its remote name, -L follows redirects)
+os.system("curl -L -O https://huggingface.co/classla/wav2vec2-xls-r-sabor-hr/raw/main/00020570a.flac.wav")
+# decode the wav file into a dict with "array" and "sampling_rate", resampled to 16 kHz
+audio = Audio(sampling_rate=16000).decode_example("00020570a.flac.wav")
+# remove the downloaded wav file
+os.remove("00020570a.flac.wav")
+# convert the raw waveform into model input values
+input_values = processor(
+    audio["array"], return_tensors="pt", padding=True,
+    sampling_rate=16000).input_values
+# retrieve logits (gradients are not needed for inference)
+with torch.no_grad():
+    logits = model(input_values).logits
+# take argmax and decode
+predicted_ids = torch.argmax(logits, dim=-1)
+transcription = processor.batch_decode(predicted_ids)
+# transcription: ['veliki broj poslovnih subjekata posluje sa minusom velik dio']
+```
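The last two steps of the usage example (`torch.argmax` followed by `processor.batch_decode`) amount to greedy CTC decoding: pick the most likely token for every audio frame, merge consecutive repeats, drop the CTC blank, and turn the word delimiter into spaces. The pure-Python sketch below illustrates that collapse step only. It assumes a hypothetical toy vocabulary in which id 0 acts as the blank (Wav2Vec2 CTC tokenizers use their pad token for this) and `|` marks word boundaries; the real vocabulary comes from the model's tokenizer files, and the frame ids are made up.

```python
from typing import Dict, List, Optional

# Hypothetical toy vocabulary (not the model's real one): id 0 is the CTC
# blank, id 1 is the word delimiter "|".
VOCAB: Dict[int, str] = {0: "<pad>", 1: "|", 2: "d", 3: "a", 4: "n"}
BLANK_ID = 0
WORD_DELIMITER = "|"


def ctc_greedy_collapse(frame_ids: List[int], blank_id: int = BLANK_ID) -> List[int]:
    """Merge consecutive repeated ids, then drop blank ids."""
    collapsed: List[int] = []
    previous: Optional[int] = None
    for token_id in frame_ids:
        # A blank between two equal ids keeps both (e.g. [3, 0, 3] -> [3, 3]).
        if token_id != previous and token_id != blank_id:
            collapsed.append(token_id)
        previous = token_id
    return collapsed


def ids_to_text(ids: List[int]) -> str:
    """Map ids to characters and replace the word delimiter with spaces."""
    chars = "".join(VOCAB[i] for i in ids)
    return " ".join(word for word in chars.split(WORD_DELIMITER) if word)


# Made-up per-frame argmax ids for one utterance, shaped like a single row of
# torch.argmax(logits, dim=-1).
frames = [0, 2, 2, 3, 3, 0, 4, 4, 1, 1, 0, 2, 3, 0]
print(ids_to_text(ctc_greedy_collapse(frames)))  # prints "dan da"
```

Merging repeats before dropping blanks is what lets the model emit a genuinely doubled character: it has to separate the two occurrences with a blank frame.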
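The word error rate and character error rate quoted in the evaluation line are both edit-distance ratios. Each is the minimum number of substitutions, deletions, and insertions that turn the hypothesis into the reference, divided by the reference length in words (WER) or in characters (CER). The sketch below computes both with a plain Levenshtein distance. The text normalisation (lowercasing, collapsing whitespace) and counting spaces as characters for CER are assumptions, because the evaluation setup behind the reported figures is not described here. The sentence pair is made up.

```python
from typing import List, Sequence


def edit_distance(reference: Sequence[str], hypothesis: Sequence[str]) -> int:
    """Levenshtein distance where substitution, deletion and insertion cost 1."""
    previous_row = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current_row = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            substitution = previous_row[j - 1] + (ref_item != hyp_item)
            deletion = previous_row[j] + 1
            insertion = current_row[j - 1] + 1
            current_row.append(min(substitution, deletion, insertion))
        previous_row = current_row
    return previous_row[-1]


def normalise(text: str) -> str:
    # Assumed normalisation: lowercase and collapse runs of whitespace.
    return " ".join(text.lower().split())


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate; the reference must be non-empty."""
    ref_words: List[str] = normalise(reference).split()
    hyp_words: List[str] = normalise(hypothesis).split()
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; spaces count as characters (one common convention)."""
    ref_chars = list(normalise(reference))
    hyp_chars = list(normalise(hypothesis))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


# Made-up reference/hypothesis pair: one word is wrong, by one deleted letter.
reference = "dobar dan svima"
hypothesis = "dobar dan svim"
print(f"WER = {wer(reference, hypothesis):.2%}, CER = {cer(reference, hypothesis):.2%}")
# prints "WER = 33.33%, CER = 6.67%"
```

A single wrong letter costs a whole word in WER but only one character in CER. This is why the reported CER (4.56%) is much lower than the WER (13.68%).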