	---
	license: cc-by-nc-nd-4.0
	datasets:
	- openslr
	language:
	- gl
	pipeline_tag: automatic-speech-recognition
	tags:
	- ITG
	- PyTorch
	- Transformers
	- wav2vec2
	---

	# Wav2Vec2 Large XLSR Galician

	## Description

	This is a fine-tuned version of the [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) pre-trained model for automatic speech recognition (ASR) in Galician.

	---

	## Dataset

	The model was fine-tuned on the [OpenSLR Galician](https://huggingface.co/datasets/openslr/viewer/SLR77) dataset (SLR77), available in the openslr repository.

	---

	## Example inference script

	The following script runs the model in inference mode:

	```python
	import librosa
	import torch
	from transformers import AutoModelForCTC, AutoProcessor

	filename = "demo.wav"  # change this to the path of your audio file
	sample_rate = 16_000  # the model expects 16 kHz audio

	# Load the processor (feature extractor + tokenizer) and the CTC model
	processor = AutoProcessor.from_pretrained("ITG/wav2vec2-large-xlsr-gl")
	model = AutoModelForCTC.from_pretrained("ITG/wav2vec2-large-xlsr-gl")
	device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
	model.to(device)

	# librosa resamples the audio to the requested rate while loading
	speech_array, _ = librosa.load(filename, sr=sample_rate)
	inputs = processor(speech_array, sampling_rate=sample_rate, return_tensors="pt", padding=True).to(device)

	# Greedy decoding: pick the most likely token at every frame
	with torch.no_grad():
	    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
	decode_output = processor.batch_decode(torch.argmax(logits, dim=-1))[0]
	print(f"ASR Galician wav2vec2-large-xlsr output: {decode_output}")
	```
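
The `torch.argmax` followed by `batch_decode` in the script above amounts to greedy CTC decoding: take the best token per frame, merge consecutive repeats, then drop the blank token. The snippet below is a minimal sketch of that idea only, not the tokenizer's actual implementation. The function name `ctc_greedy_collapse`, the blank id of 0, and the four-symbol toy vocabulary are all assumptions made for illustration; the real model's vocabulary and special tokens differ.

```python
from itertools import groupby


def ctc_greedy_collapse(frame_ids, blank_id=0):
    """Turn per-frame argmax ids into a token id sequence (greedy CTC)."""
    # Step 1: merge runs of identical consecutive ids into a single id
    merged = [token_id for token_id, _ in groupby(frame_ids)]
    # Step 2: remove the blank token, which only separates repeated symbols
    return [token_id for token_id in merged if token_id != blank_id]


# Hypothetical toy vocabulary (assumption): id 0 plays the role of the CTC blank
vocab = {0: "<pad>", 1: "o", 2: "l", 3: "a"}

# Hypothetical per-frame argmax output for a short utterance
frames = [1, 1, 0, 2, 2, 2, 0, 0, 3, 3]

ids = ctc_greedy_collapse(frames)
text = "".join(vocab[i] for i in ids)
print(text)  # ola
```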
	---

	## Fine-tuning hyper-parameters

	| Hyper-parameter             | Value |
	|:---------------------------:|:-----:|
	| Training batch size         | 16    |
	| Evaluation batch size       | 8     |
	| Learning rate               | 3e-4  |
	| Gradient accumulation steps | 2     |
	| Group by length             | true  |
	| Evaluation strategy         | steps |
	| Max training epochs         | 50    |
	| Max steps                   | 4000  |
	| Generate max length         | 225   |
	| FP16                        | true  |
	| Metric for best model       | wer   |
	| Greater is better           | false |
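
To show how the batch-related values in the table combine, here is a small worked calculation. It is a sketch under stated assumptions: the dictionary keys are chosen to resemble `transformers.TrainingArguments` field names, and a single training device is assumed, since the card does not state how many were used. Note that in the Hugging Face `Trainer`, a positive `max_steps` takes precedence over the epoch count.

```python
# Values copied from the hyper-parameter table above. The key names are an
# assumption, picked to resemble transformers.TrainingArguments fields.
hyperparams = {
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 8,
    "learning_rate": 3e-4,
    "gradient_accumulation_steps": 2,
    "group_by_length": True,
    "num_train_epochs": 50,
    "max_steps": 4000,
    "fp16": True,
    "metric_for_best_model": "wer",
    "greater_is_better": False,
}

# Assumption: one device. With several devices, multiply by their count.
num_devices = 1

# Samples contributing to each optimizer update
effective_batch_size = (
    hyperparams["per_device_train_batch_size"]
    * hyperparams["gradient_accumulation_steps"]
    * num_devices
)

# Upper bound on training samples processed if all max_steps updates run
samples_seen = effective_batch_size * hyperparams["max_steps"]

print(f"Effective batch size: {effective_batch_size}")  # 32
print(f"Samples processed in {hyperparams['max_steps']} steps: {samples_seen}")  # 128000
```

Under these assumptions each optimizer update sees 32 utterances, and training stops after at most 4000 updates regardless of the 50-epoch setting.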


	## Fine-tuning on a different dataset or style

	If you're interested in fine-tuning your own wav2vec2 model, we suggest starting with the [facebook/wav2vec2-large-xlsr-53 model](https://huggingface.co/facebook/wav2vec2-large-xlsr-53). You may also find this [Galician fine-tuning notebook by Diego Fustes](https://github.com/diego-fustes/xlsr-fine-tuning-gl/blob/main/Fine_Tune_XLSR_Wav2Vec2_on_Galician.ipynb) a valuable resource; it served as a helpful reference while training this Galician wav2vec2-large-xlsr model.