---
language:
- no
license: apache-2.0
tags:
- whisper-event
- norwegian
datasets:
- NbAiLab/NCC_S
- NbAiLab/NPSC
- NbAiLab/NST
metrics:
- wer
model-index:
- name: Whisper Tiny Norwegian Bokmål
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: FLEURS
      type: google/fleurs
      config: nb_no
      split: validation
      args: nb_no
    metrics:
    - name: Wer
      type: wer
      value: 45.73
---

# Whisper Tiny Norwegian Bokmål

This model is a fine-tuned version of [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny), trained on several Norwegian speech datasets (NbAiLab/NCC_S, NbAiLab/NPSC and NbAiLab/NST).
Training is still in progress. At the current checkpoint, the model achieves the following results on the evaluation set (FLEURS `nb_no`, validation split):

- Loss: 1.4616
- WER: 45.73
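Word error rate (WER) counts the word-level substitutions (S), deletions (D) and insertions (I) needed to turn a model transcript into the reference, divided by the number of reference words N. As in the 45.73 figure above, it is usually reported as a percentage. The following is a minimal pure-Python sketch of that definition, assuming plain whitespace tokenization. The function name `word_error_rate` is hypothetical, and the text normalization actually used when evaluating this model is not stated here, so scores from this sketch may not match the reported number.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: 100 * (S + D + I) / N.

    S + D + I is the word-level Levenshtein distance between the
    reference and the hypothesis; N is the number of reference words.
    Tokenization is plain whitespace splitting with no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference
    # words and the first j hypothesis words (previous row of the DP table).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # delete ref_word
                curr[j - 1] + 1,                  # insert hyp_word
                prev[j - 1] + substitution_cost,  # substitute or match
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "det er en fin dag i dag"
    hyp = "det er fin dag i dag"  # one deleted word
    print(f"WER: {word_error_rate(ref, hyp):.2f}%")
```

Dropping one word from a seven-word reference gives a WER of about 14.29%. Because insertions are counted too, WER can exceed 100%. In practice, a library such as `jiwer` or the Hugging Face `evaluate` WER metric (which returns a fraction rather than a percentage) would typically be used instead of a hand-rolled implementation.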

## Model description

The model is trained on a large corpus of roughly 5,000 hours of speech. The sources are subtitles from the Norwegian broadcaster NRK, transcribed speeches from the Norwegian parliament, and recordings from Nordisk Språkteknologi (NST).

## Intended uses & limitations

The model will be freely available for anyone to use once training is finished.

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 3e-06
- train_batch_size: 128
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1000
- training_steps: 100,000 (currently at step 5,000)
- mixed_precision_training: fp16
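With `lr_scheduler_type: linear` and 1,000 warmup steps, the learning rate presumably ramps linearly from 0 up to the peak of 3e-06 and then decays linearly to 0 at the final step. The sketch below assumes the shape used by Hugging Face Transformers' `get_linear_schedule_with_warmup`, and it assumes that `train_batch_size: 128` is the effective batch per optimizer step (no further multiplication by devices or gradient accumulation). The helper names `learning_rate_at` and `examples_seen` are hypothetical.

```python
PEAK_LR = 3e-6
WARMUP_STEPS = 1_000
TOTAL_STEPS = 100_000
TRAIN_BATCH_SIZE = 128  # assumption: effective batch size per optimizer step


def learning_rate_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to 0 at TOTAL_STEPS (assumed shape)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


def examples_seen(step: int) -> int:
    """Training examples processed after `step` optimizer steps."""
    return step * TRAIN_BATCH_SIZE


if __name__ == "__main__":
    for step in (0, 500, 1_000, 5_000, 50_000, 100_000):
        print(f"step {step:>7,}: lr = {learning_rate_at(step):.3e}, "
              f"examples seen = {examples_seen(step):,}")
```

Under these assumptions, at step 5,000 the learning rate is about 2.88e-06 (3e-06 × 95,000 / 99,000) and the model has seen 640,000 training examples; the full 100,000-step run would cover 12,800,000 examples.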

### Live training results

See [TensorBoard metrics](https://huggingface.co/NbAiLab/whisper-tiny-nob/tensorboard).