---
model-index:
- name: mHuBERT-147-br
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: common_voice_15_0
      type: common_voice_15_0
      config: br
      split: test
      args: br
    metrics:
    - name: WER
      type: wer
      value: 47.0
    - name: CER
      type: cer
      value: 16.7
language:
- br
metrics:
- wer
base_model: utter-project/mHuBERT-147
pipeline_tag: automatic-speech-recognition
datasets:
- mozilla-foundation/common_voice_15_0
---
# mHuBERT-147-br
This model is a fine-tuned version of [utter-project/mHuBERT-147](https://huggingface.co/utter-project/mHuBERT-147) on the Breton subset of the Mozilla Common Voice 15 dataset and on the [Roadennoù](https://github.com/gweltou/roadennou) dataset.
It achieves the following results on the validation set (a metric-computation sketch follows the list):
- Loss: 0.7331
- Wer: 50.09
- Cer: 16.45
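The WER and CER values above can be computed with the `evaluate` library. A minimal sketch, with placeholder prediction and reference transcripts:

```python
import evaluate

wer_metric = evaluate.load("wer")
cer_metric = evaluate.load("cer")

predictions = ["demat dit"]    # placeholder: decoded model outputs
references = ["demat deoc'h"]  # placeholder: reference transcripts

print("WER:", 100 * wer_metric.compute(predictions=predictions, references=references))
print("CER:", 100 * cer_metric.compute(predictions=predictions, references=references))
```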
## Model description
This model was trained to assess how well mHuBERT-147 performs as a base model for fine-tuning a Breton ASR system.
## Intended uses & limitations
This is a research model and should not be used in production. For experimentation, a minimal inference sketch is shown below.
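The sketch assumes the checkpoint exposes a CTC head loadable with `HubertForCTC` and ships a standard processor; the model ID is a placeholder for the actual repository path:

```python
import torch
import torchaudio
from transformers import AutoProcessor, HubertForCTC

model_id = "mHuBERT-147-br"  # placeholder: replace with the actual repository path
processor = AutoProcessor.from_pretrained(model_id)
model = HubertForCTC.from_pretrained(model_id)

# mHuBERT-147 expects mono 16 kHz audio.
waveform, sr = torchaudio.load("sample.wav")
if sr != 16000:
    waveform = torchaudio.functional.resample(waveform, sr, 16000)

inputs = processor(waveform.squeeze(0).numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding.
predicted_ids = torch.argmax(logits, dim=-1)
print(processor.batch_decode(predicted_ids)[0])
```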
## Training and evaluation data
90% of the Roadennoù dataset was used for training; the remaining 10% was used for validation, together with the MCV15-br validation set.
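A sketch of that split using the `datasets` library; the Roadennoù loading path, the split seed, and the column layout are assumptions:

```python
from datasets import load_dataset, concatenate_datasets

# Assumption: the Roadennoù corpus is available locally in an audiofolder layout.
roadennou = load_dataset("audiofolder", data_dir="roadennou")["train"]
split = roadennou.train_test_split(test_size=0.1, seed=42)  # seed is an assumption

# Official MCV15 Breton validation split.
mcv_valid = load_dataset("mozilla-foundation/common_voice_15_0", "br", split="validation")

train_set = split["train"]
# Column schemas must match before concatenating; in a real pipeline, normalize
# both datasets to the same (audio, sentence) columns first.
valid_set = concatenate_datasets([split["test"], mcv_valid])
```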
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 3.8e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 52
- mixed_precision_training: Native AMP
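For reference, a sketch of how these values map onto `transformers.TrainingArguments`; the output directory is a placeholder, and the Adam betas/epsilon listed above are the Trainer defaults:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mhubert-147-br",     # placeholder
    learning_rate=3.8e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,   # effective train batch size: 8 * 2 = 16
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=52,
    fp16=True,                       # Native AMP mixed-precision training
)
```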
### Framework versions
- Transformers 4.39.1
- Pytorch 2.0.1+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2