Finnish Wav2vec2-XLarge ASR
This model is GetmanY1/wav2vec2-xlarge-fi-150k fine-tuned on 4600 hours of Finnish speech sampled at 16 kHz:
- 1500 hours of Lahjoita puhetta (Donate Speech) (colloquial Finnish)
- 3100 hours of the Finnish Parliament dataset
When using the model, make sure that your speech input is also sampled at 16 kHz.
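A quick way to catch a sampling-rate mismatch before inference is to read the rate from the file header. The sketch below is only an illustration using Python's standard `wave` module, so it covers uncompressed WAV files only; the helper names `wav_sample_rate` and `needs_resampling` are hypothetical and not part of this model's tooling.

```python
import wave

TARGET_SAMPLE_RATE = 16_000  # rate the model expects, per the note above


def wav_sample_rate(path: str) -> int:
    """Return the sampling rate stored in a WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path: str, target: int = TARGET_SAMPLE_RATE) -> bool:
    """Return True if the file must be resampled before it is passed to the model."""
    return wav_sample_rate(path) != target
```

If `needs_resampling` returns True, resample the audio with the audio library of your choice before calling the processor.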
Model description
The Finnish Wav2Vec2 X-Large has the same architecture and uses the same training objective as the multilingual model described in the paper.
GetmanY1/wav2vec2-xlarge-fi-150k is a large-scale, 1-billion-parameter monolingual model pre-trained on 158k hours of unlabeled Finnish speech, including KAVI radio and television archive materials, Lahjoita puhetta (Donate Speech), Finnish Parliament, and Finnish VoxPopuli.
You can read more about the pre-trained model in this paper. The training scripts are available on GitHub.
Intended uses
You can use this model for Finnish ASR (speech-to-text).
How to use
To transcribe audio files, the model can be used as a standalone acoustic model as follows:
from datasets import Audio, load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
import torch
# load model and processor
processor = Wav2Vec2Processor.from_pretrained("GetmanY1/wav2vec2-xlarge-fi-150k-finetuned")
model = Wav2Vec2ForCTC.from_pretrained("GetmanY1/wav2vec2-xlarge-fi-150k-finetuned")
model.eval()
# load a sample dataset and resample the audio to 16 kHz when it is decoded
ds = load_dataset("mozilla-foundation/common_voice_16_1", "fi", split="test")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
# extract input features from the first clip
audio = ds[0]["audio"]
input_values = processor(audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt", padding="longest").input_values  # batch size 1
# retrieve logits (no gradients are needed for inference)
with torch.no_grad():
    logits = model(input_values).logits
# take argmax and decode
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)
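The "take argmax and decode" step is greedy CTC decoding: consecutive repeated predictions are merged, blank tokens are dropped, and the remaining ids are mapped to characters. The sketch below only illustrates that idea in plain Python. The toy vocabulary, the blank id of 0, and the `|` word delimiter are assumptions chosen for the example, not values read from this model's tokenizer, and `ctc_greedy_decode` is a hypothetical helper rather than a library function.

```python
from itertools import groupby


def ctc_greedy_decode(ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Collapse a sequence of per-frame argmax ids into text."""
    # 1) merge consecutive repeats of the same id
    collapsed = [token_id for token_id, _ in groupby(ids)]
    # 2) drop blank tokens and map the remaining ids to characters
    chars = [id_to_token[i] for i in collapsed if i != blank_id]
    # 3) turn the word delimiter into spaces
    return "".join(chars).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary (assumption, for illustration only)
vocab = {0: "<pad>", 1: "|", 2: "h", 3: "e", 4: "i"}
frame_ids = [2, 2, 0, 3, 3, 3, 4, 0, 1, 1, 2, 0, 3, 4, 4]
print(ctc_greedy_decode(frame_ids, vocab))  # prints "hei hei"
```

Note that a blank between two identical ids keeps them as separate characters, which is how CTC represents double letters.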
Team Members
- Yaroslav Getman, Hugging Face profile, LinkedIn profile
- Tamas Grosz, Hugging Face profile, LinkedIn profile
Feel free to contact us for more details 🤗
Evaluation results
- Dev WER on Lahjoita puhetta (Donate Speech): 14.98 (self-reported)
- Dev CER on Lahjoita puhetta (Donate Speech): 4.13 (self-reported)
- Test WER on Lahjoita puhetta (Donate Speech): 16.37 (self-reported)
- Test CER on Lahjoita puhetta (Donate Speech): 5.03 (self-reported)
- Dev16 WER on Finnish Parliament: 10.91 (self-reported)
- Dev16 CER on Finnish Parliament: 4.85 (self-reported)
- Test16 WER on Finnish Parliament: 7.81 (self-reported)
- Test16 CER on Finnish Parliament: 3.48 (self-reported)
- Test20 WER on Finnish Parliament: 6.43 (self-reported)
- Test20 CER on Finnish Parliament: 2.09 (self-reported)
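For readers unfamiliar with the metrics, WER and CER are edit distances between the reference and the model output, counted over words and characters respectively and divided by the reference length. The following is a minimal pure-Python sketch, assuming the scores above are percentages. The function names are hypothetical, and the text normalization used for the reported results is not described here, so your own numbers may differ.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent (spaces count as characters)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For example, with the reference "hyvää huomenta kaikille" and the hypothesis "hyvää huomenta kaikile", one of three words is wrong and one of 23 characters is missing, so `wer` gives 100/3 and `cer` gives 100/23.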