greenw0lf / wav2vec2-large-xls-r-1b-frisian

	---
	license: apache-2.0
	tags:
	- generated_from_trainer
	datasets:
	- mozilla-foundation/common_voice_12_0
	metrics:
	- wer
	model-index:
	- name: wav2vec2-large-xls-r-1b-frisian
	  results:
	  - task:
	      name: Automatic Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: common_voice_12_0
	      type: common_voice_12_0
	      config: fy-NL
	      split: test
	      args: fy-NL
	    metrics:
	    - name: Wer
	      type: wer
	      value: 0.15990775235054105
	language:
	- fy
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# wav2vec2-large-xls-r-1b-frisian

	This model is a fine-tuned version of [facebook/wav2vec2-xls-r-1b](https://huggingface.co/facebook/wav2vec2-xls-r-1b) on the Frisian (`fy-NL`) configuration of the `mozilla-foundation/common_voice_12_0` dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.2634
	- WER: 0.1599
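The card does not say which tool computed the WER figure. As a rough illustration, word error rate is commonly defined as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The sketch below implements that standard definition in plain Python; the function name `word_error_rate` and the toy strings are illustrative assumptions, not part of the original training code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


# Toy example: one substitution (b -> x) and one deletion (d) over 4 reference words.
print(word_error_rate("a b c d", "a x c"))
```

Under this definition, a WER of 0.1599 corresponds to roughly 16 word errors per 100 reference words.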

	This model was developed together with [golesheed](https://huggingface.co/golesheed) for the course "Speech Recognition II" of the "MSc Voice Technology" program at Rijksuniversiteit Groningen - Campus Fryslân.

	## Intended uses & limitations

	This model is intended for automatic speech recognition (transcription) of Frisian speech.

	Known limitations: little hyperparameter tuning was done, no language model (LM) rescoring is applied, and the model was trained on Common Voice v12 rather than v13.
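For the intended use above, a minimal transcription sketch with the Transformers `pipeline` API might look like the following. It assumes the repository ID `greenw0lf/wav2vec2-large-xls-r-1b-frisian` (taken from this page), that the repository ships the processor files alongside the weights, and that `ffmpeg` is available for decoding audio files. The helper name `transcribe` is hypothetical.

```python
MODEL_ID = "greenw0lf/wav2vec2-large-xls-r-1b-frisian"  # assumed Hub repository ID


def transcribe(audio_path: str) -> str:
    """Transcribe a single audio file with the fine-tuned checkpoint."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import pipeline

    # Downloads the checkpoint on first use.
    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)

    # Given a file path, the pipeline decodes the audio (via ffmpeg) and
    # resamples it to the sampling rate the model's feature extractor expects.
    result = asr(audio_path)
    return result["text"]
```

Calling `transcribe("sample.wav")` on a local recording (the file name is a placeholder) should return the greedy CTC transcription as a string. No LM rescoring is involved, in line with the limitation noted above.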

	## Training and evaluation data

	The model was trained and evaluated on the standard splits provided with the Common Voice dataset.

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 8e-05
	- train_batch_size: 16
	- eval_batch_size: 8
	- seed: 42
	- gradient_accumulation_steps: 2
	- total_train_batch_size: 32
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 500
	- num_epochs: 50
	- mixed_precision_training: Native AMP
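To make two of the settings above concrete, the sketch below reproduces the effective batch size and the shape of a linear schedule with warmup in plain Python. It is not the training code. The total step count of 5950 is an assumption (about 119 optimizer steps per epoch times 50 epochs, estimated from the logged step 5750 at epoch 48.32), and a single device is assumed because 16 × 2 already gives the listed total of 32.

```python
PEAK_LR = 8e-5       # learning_rate
WARMUP_STEPS = 500   # lr_scheduler_warmup_steps
TOTAL_STEPS = 5950   # ASSUMPTION: ~119 optimizer steps per epoch x 50 epochs


def effective_batch_size(per_device: int = 16, accumulation: int = 2, devices: int = 1) -> int:
    """Samples contributing to one optimizer step (devices=1 is an assumption)."""
    return per_device * accumulation * devices


def linear_lr(step: int) -> float:
    """Linear warmup from 0 to PEAK_LR, then linear decay to 0 at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


print(effective_batch_size())
for step in (0, 250, 500, 3000, TOTAL_STEPS):
    print(step, linear_lr(step))
```

The learning rate therefore peaks at step 500, which falls between the second and third evaluation rows of the training log.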

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | WER    |
	|:-------------:|:-----:|:----:|:---------------:|:------:|
	| 4.7284        | 2.1   | 250  | 2.9453          | 1.0    |
	| 1.7496        | 4.2   | 500  | 0.5141          | 0.4771 |
	| 0.8168        | 6.3   | 750  | 0.3220          | 0.3148 |
	| 0.7403        | 8.4   | 1000 | 0.2988          | 0.2573 |
	| 0.7298        | 10.5  | 1250 | 0.2794          | 0.2347 |
	| 0.6303        | 12.61 | 1500 | 0.2577          | 0.2164 |
	| 0.5201        | 14.71 | 1750 | 0.2746          | 0.2162 |
	| 0.5189        | 16.81 | 2000 | 0.2543          | 0.2034 |
	| 0.5054        | 18.91 | 2250 | 0.2847          | 0.2071 |
	| 0.5112        | 21.01 | 2500 | 0.2772          | 0.1979 |
	| 0.5105        | 23.11 | 2750 | 0.2633          | 0.1920 |
	| 0.5032        | 25.21 | 3000 | 0.2667          | 0.1856 |
	| 0.46          | 27.31 | 3250 | 0.2730          | 0.1852 |
	| 0.4992        | 29.41 | 3500 | 0.2626          | 0.1782 |
	| 0.4535        | 31.51 | 3750 | 0.2778          | 0.1749 |
	| 0.4036        | 33.61 | 4000 | 0.2825          | 0.1747 |
	| 0.3347        | 35.71 | 4250 | 0.2797          | 0.1708 |
	| 0.2708        | 37.82 | 4500 | 0.2662          | 0.1712 |
	| 0.1825        | 39.92 | 4750 | 0.2652          | 0.1648 |
	| 0.1654        | 42.02 | 5000 | 0.2719          | 0.1628 |
	| 0.1387        | 44.12 | 5250 | 0.2552          | 0.1607 |
	| 0.1367        | 46.22 | 5500 | 0.2641          | 0.1591 |
	| 0.1218        | 48.32 | 5750 | 0.2634          | 0.1598 |
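The log above can be summarized with a few lines of Python. The sketch below copies the step, epoch, validation loss, and WER columns from the table, finds the checkpoints with the lowest WER and the lowest validation loss, and estimates the number of optimizer steps per epoch. The variable names are illustrative.

```python
# (step, epoch, validation_loss, wer) copied from the training results table
ROWS = [
    (250, 2.1, 2.9453, 1.0),
    (500, 4.2, 0.5141, 0.4771),
    (750, 6.3, 0.3220, 0.3148),
    (1000, 8.4, 0.2988, 0.2573),
    (1250, 10.5, 0.2794, 0.2347),
    (1500, 12.61, 0.2577, 0.2164),
    (1750, 14.71, 0.2746, 0.2162),
    (2000, 16.81, 0.2543, 0.2034),
    (2250, 18.91, 0.2847, 0.2071),
    (2500, 21.01, 0.2772, 0.1979),
    (2750, 23.11, 0.2633, 0.1920),
    (3000, 25.21, 0.2667, 0.1856),
    (3250, 27.31, 0.2730, 0.1852),
    (3500, 29.41, 0.2626, 0.1782),
    (3750, 31.51, 0.2778, 0.1749),
    (4000, 33.61, 0.2825, 0.1747),
    (4250, 35.71, 0.2797, 0.1708),
    (4500, 37.82, 0.2662, 0.1712),
    (4750, 39.92, 0.2652, 0.1648),
    (5000, 42.02, 0.2719, 0.1628),
    (5250, 44.12, 0.2552, 0.1607),
    (5500, 46.22, 0.2641, 0.1591),
    (5750, 48.32, 0.2634, 0.1598),
]

best_wer = min(ROWS, key=lambda row: row[3])    # row with the lowest WER
best_loss = min(ROWS, key=lambda row: row[2])   # row with the lowest validation loss
steps_per_epoch = ROWS[-1][0] / ROWS[-1][1]     # estimate from the last logged row

print("lowest WER:", best_wer)
print("lowest validation loss:", best_loss)
print("approx. optimizer steps per epoch:", round(steps_per_epoch))
```

In this log the lowest validation loss (0.2543) occurs at step 2000, while the lowest WER (0.1591) occurs at step 5500: validation loss stays roughly flat after step 1500 even as WER keeps improving.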


	### Framework versions

	- Transformers 4.27.3
	- PyTorch 2.0.0+cu117
	- Datasets 2.10.1
	- Tokenizers 0.13.2