---
license: cc-by-nc-nd-4.0
datasets:
- openslr
language:
- gl
pipeline_tag: automatic-speech-recognition
tags:
- ITG
- PyTorch
- Transformers
- wav2vec2
---

# Wav2Vec2 Large XLSR Galician

## Description

This is a fine-tuned version of the [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) pre-trained model for automatic speech recognition (ASR) in Galician.

---

## Dataset

This model was fine-tuned on the [OpenSLR Galician](https://huggingface.co/datasets/openslr/viewer/SLR77) dataset (SLR77), available in the `openslr` dataset repository on the Hugging Face Hub.

---

## Example inference script

The following script runs the model in inference mode to transcribe a single audio file. It requires `torch`, `transformers` and `librosa`.

```python
import librosa
import torch
from transformers import AutoModelForCTC, AutoProcessor

filename = "demo.wav"  # change this to the path of your audio file
sample_rate = 16_000  # the model expects 16 kHz audio

processor = AutoProcessor.from_pretrained("ITG/wav2vec2-large-xlsr-gl")
# wav2vec2 is a CTC model, so load it with AutoModelForCTC (not a seq2seq class)
model = AutoModelForCTC.from_pretrained("ITG/wav2vec2-large-xlsr-gl")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

# Load the audio file and resample it to 16 kHz
speech_array, _ = librosa.load(filename, sr=sample_rate)

inputs = processor(speech_array, sampling_rate=sample_rate, return_tensors="pt", padding=True).to(device)

# Not every feature extractor returns an attention mask, so fall back to None
attention_mask = inputs.get("attention_mask")

with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=attention_mask).logits

# Greedy CTC decoding: take the most likely token per frame, then decode to text
predicted_ids = torch.argmax(logits, dim=-1)
decode_output = processor.batch_decode(predicted_ids)[0]

print(f"ASR Galician wav2vec2-large-xlsr output: {decode_output}")
```
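The `torch.argmax` and `processor.batch_decode` steps in the script perform greedy CTC decoding. The pure-Python sketch below shows what that decoding does. It assumes the conventions of the standard `Wav2Vec2CTCTokenizer`: the padding token doubles as the CTC blank, and `|` marks word boundaries. The toy vocabulary and frame ids are hypothetical and are not taken from this model.

```python
from itertools import groupby
from typing import Dict, List


def greedy_ctc_decode(
    frame_ids: List[int],
    id_to_token: Dict[int, str],
    blank_id: int,
    word_delimiter: str = "|",
) -> str:
    """Turn per-frame argmax ids into text using greedy CTC decoding."""
    # 1. Merge consecutive frames that predicted the same id
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # 2. Drop the blank token; it only separates genuinely repeated characters
    tokens = [id_to_token[token_id] for token_id in collapsed if token_id != blank_id]
    # 3. Map the word delimiter back to spaces and normalise whitespace
    text = "".join(" " if token == word_delimiter else token for token in tokens)
    return " ".join(text.split())


# Hypothetical toy vocabulary; the real one is stored in the model's tokenizer files
vocab = {0: "<pad>", 1: "|", 2: "o", 3: "l", 4: "a", 5: "m", 6: "u", 7: "n", 8: "d"}

frames = [0, 2, 2, 0, 3, 4, 4, 1, 1, 5, 6, 0, 7, 8, 2, 0]
print(greedy_ctc_decode(frames, vocab, blank_id=0))  # ola mundo

# The blank between the two "l" frames keeps them from collapsing into one
print(greedy_ctc_decode([2, 3, 0, 3, 2], vocab, blank_id=0))  # ollo
```

Greedy decoding picks one token per frame independently. It needs no language model, but it cannot correct spelling using word-level context.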

---

## Fine-tuning hyper-parameters

| **Hyper-parameter** | **Value** |
|:----------------------------------------:|:---------------------------:|
| Training batch size | 16 |
| Evaluation batch size | 8 |
| Learning rate | 3e-4 |
| Gradient accumulation steps | 2 |
| Group by length | true |
| Evaluation strategy | steps |
| Max training epochs | 50 |
| Max steps | 4000 |
| Generation max length | 225 |
| FP16 | true |
| Metric for best model | wer |
| Greater is better | false |
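Two details of the table above are easy to misread. First, in the Hugging Face `Trainer`, a positive `max_steps` takes precedence over the number of epochs. Second, if the batch sizes are per-device values (as with `per_device_train_batch_size`), the effective batch size is multiplied by the gradient accumulation steps. The sketch below works out these numbers under one assumption: that the table's values map onto `transformers` training-argument names. The card does not say which trainer class or script was used.

```python
# Hypothetical mapping of the table onto Hugging Face training-argument names.
# Assumption: the card does not say which trainer class or script was used.
hyperparams = {
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 8,
    "learning_rate": 3e-4,
    "gradient_accumulation_steps": 2,
    "group_by_length": True,
    "eval_strategy": "steps",  # named `evaluation_strategy` in older transformers releases
    "num_train_epochs": 50,
    "max_steps": 4000,  # a positive value takes precedence over num_train_epochs
    "generation_max_length": 225,  # Seq2SeqTrainingArguments only; likely unused by CTC decoding
    "fp16": True,
    "metric_for_best_model": "wer",
    "greater_is_better": False,  # a lower word error rate is better
}


def effective_batch_size(params: dict, num_devices: int = 1) -> int:
    """Number of training samples that contribute to one optimizer update."""
    return (
        params["per_device_train_batch_size"]
        * params["gradient_accumulation_steps"]
        * num_devices
    )


def approx_samples_seen(params: dict, num_devices: int = 1) -> int:
    """Approximate number of training samples processed if the run reaches max_steps."""
    return params["max_steps"] * effective_batch_size(params, num_devices)


print(effective_batch_size(hyperparams))  # 32 on a single device
print(approx_samples_seen(hyperparams))  # 128000
```

The number of devices used for training is not stated in this card, so the single-device figure is only a lower bound on the effective batch size.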

---

## Fine-tuning on a different dataset or style

If you're interested in fine-tuning your own wav2vec2 model, we suggest starting from the [facebook/wav2vec2-large-xlsr-53 model](https://huggingface.co/facebook/wav2vec2-large-xlsr-53). You may also find this [notebook by Diego Fustes on fine-tuning for Galician](https://github.com/diego-fustes/xlsr-fine-tuning-gl/blob/main/Fine_Tune_XLSR_Wav2Vec2_on_Galician.ipynb) a valuable resource. It served as a helpful reference while training this Galician wav2vec2-large-xlsr model.
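When you fine-tune XLSR with a CTC head on a new dataset, a common first step is to build a character-level vocabulary from the training transcripts. The sketch below shows one way to do this. The `|` word delimiter and the `[UNK]`/`[PAD]` special tokens follow conventions commonly used with `Wav2Vec2CTCTokenizer`, but they are assumptions here. This is not a description of how this model's vocabulary was built, and the sample transcripts are hypothetical.

```python
import json
from typing import Dict, Iterable


def build_ctc_vocab(transcripts: Iterable[str]) -> Dict[str, int]:
    """Build a character-level CTC vocabulary from training transcripts."""
    # Lowercase so that "O" and "o" share one output unit
    chars = set()
    for text in transcripts:
        chars.update(text.lower())
    # Mark word boundaries with a visible "|" instead of a plain space
    chars.discard(" ")
    chars.add("|")
    vocab = {char: idx for idx, char in enumerate(sorted(chars))}
    # Special tokens: unknown characters, and padding (which CTC also uses as the blank)
    vocab["[UNK]"] = len(vocab)
    vocab["[PAD]"] = len(vocab)
    return vocab


# Hypothetical Galician transcripts, for illustration only
sample = ["Bo día", "Ola mundo"]
vocab = build_ctc_vocab(sample)
print(json.dumps(vocab, ensure_ascii=False))
```

In practice, you would usually normalise the transcripts (for example, by removing punctuation) before building the vocabulary, so that rare symbols do not become output classes.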