speecht5_finetuned_voxpopuli_de
This model is a fine-tuned version of microsoft/speecht5_tts on the voxpopuli dataset. It achieves the following results on the evaluation set:
- Loss: 0.4484
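The card does not include inference instructions. A minimal sketch of how a SpeechT5 text-to-speech checkpoint is typically loaded with the Transformers library might look like the following. The function name `synthesize` is hypothetical, and several things are assumptions rather than facts from this card: that the base model's processor (`microsoft/speecht5_tts`) is compatible with this checkpoint, that the `microsoft/speecht5_hifigan` vocoder is a suitable choice, and that the caller supplies a 512-dimensional x-vector speaker embedding.

```python
def synthesize(
    text,
    speaker_embedding,
    model_id="SverreNystad/speecht5_finetuned_voxpopuli_de",
    processor_id="microsoft/speecht5_tts",    # assumption: base processor is compatible
    vocoder_id="microsoft/speecht5_hifigan",  # assumption: standard SpeechT5 vocoder
):
    """Return a 1-D waveform tensor (16 kHz) for `text`.

    `speaker_embedding` is assumed to be a 512-dimensional x-vector
    (list, NumPy array, or tensor) describing the target voice.
    """
    # Heavy imports live inside the function so that merely defining it
    # does not require torch or transformers to be installed.
    import torch
    from transformers import (
        SpeechT5ForTextToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    processor = SpeechT5Processor.from_pretrained(processor_id)
    model = SpeechT5ForTextToSpeech.from_pretrained(model_id)
    vocoder = SpeechT5HifiGan.from_pretrained(vocoder_id)

    # Tokenize the text and shape the speaker embedding as (1, embedding_dim).
    inputs = processor(text=text, return_tensors="pt")
    embedding = torch.as_tensor(speaker_embedding, dtype=torch.float32).reshape(1, -1)

    # generate_speech predicts a mel spectrogram and, because a vocoder is
    # passed, converts it to a waveform.
    return model.generate_speech(inputs["input_ids"], embedding, vocoder=vocoder)
```

If the model repository ships its own processor files, passing `processor_id=model_id` is probably the safer choice. The returned tensor can be written to disk with any audio library at a sampling rate of 16000 Hz.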
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 4
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- optimizer: AdamW, fused PyTorch implementation (OptimizerNames.ADAMW_TORCH_FUSED), with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 40000
- mixed_precision_training: Native AMP
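The relationship between these values can be checked with a little arithmetic. The sketch below is a pure-Python illustration, not the Trainer's own code: it assumes training ran on a single device (the card does not say) and re-implements the usual linear warmup and linear decay rule. The names `effective_batch_size` and `linear_lr` are hypothetical helpers.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-5
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 8
WARMUP_STEPS = 500
TRAINING_STEPS = 40_000
NUM_DEVICES = 1  # assumption: the card does not state the device count


def effective_batch_size(per_device=TRAIN_BATCH_SIZE,
                         accum=GRAD_ACCUM_STEPS,
                         devices=NUM_DEVICES):
    """Number of examples contributing to each optimizer step."""
    return per_device * accum * devices


def linear_lr(step, peak=LEARNING_RATE,
              warmup=WARMUP_STEPS, total=TRAINING_STEPS):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup:
        # Ramp linearly from 0 up to the peak learning rate.
        return peak * step / max(1, warmup)
    # Decay linearly from the peak down to 0 at the final step.
    return peak * max(0.0, (total - step) / max(1, total - warmup))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())
    for step in (0, 250, 500, 20_000, 40_000):
        print(step, linear_lr(step))
```

Under the single-device assumption, 4 × 8 gives the reported total_train_batch_size of 32. The learning rate reaches its peak of 1e-05 at step 500 and falls to zero at step 40000.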
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
0.5233 | 2.2783 | 1000 | 0.4829 |
0.5036 | 4.5566 | 2000 | 0.4684 |
0.503 | 6.8349 | 3000 | 0.4616 |
0.4895 | 9.1118 | 4000 | 0.4577 |
0.486 | 11.3901 | 5000 | 0.4537 |
0.4835 | 13.6684 | 6000 | 0.4524 |
0.4757 | 15.9467 | 7000 | 0.4511 |
0.4771 | 18.2236 | 8000 | 0.4504 |
0.4745 | 20.5019 | 9000 | 0.4488 |
0.474 | 22.7802 | 10000 | 0.4479 |
0.4697 | 25.0570 | 11000 | 0.4493 |
0.4673 | 27.3353 | 12000 | 0.4485 |
0.4716 | 29.6136 | 13000 | 0.4481 |
0.4651 | 31.8919 | 14000 | 0.4482 |
0.4699 | 34.1688 | 15000 | 0.4471 |
0.4613 | 36.4471 | 16000 | 0.4481 |
0.4655 | 38.7254 | 17000 | 0.4478 |
0.4601 | 41.0023 | 18000 | 0.4468 |
0.4602 | 43.2806 | 19000 | 0.4454 |
0.4613 | 45.5589 | 20000 | 0.4469 |
0.4606 | 47.8372 | 21000 | 0.4467 |
0.4546 | 50.1141 | 22000 | 0.4479 |
0.4545 | 52.3924 | 23000 | 0.4465 |
0.4556 | 54.6707 | 24000 | 0.4470 |
0.4578 | 56.9490 | 25000 | 0.4466 |
0.4564 | 59.2258 | 26000 | 0.4466 |
0.4566 | 61.5041 | 27000 | 0.4480 |
0.457 | 63.7824 | 28000 | 0.4470 |
0.4531 | 66.0593 | 29000 | 0.4493 |
0.4521 | 68.3376 | 30000 | 0.4478 |
0.4527 | 70.6159 | 31000 | 0.4488 |
0.4513 | 72.8942 | 32000 | 0.4479 |
0.455 | 75.1711 | 33000 | 0.4478 |
0.4533 | 77.4494 | 34000 | 0.4486 |
0.4565 | 79.7277 | 35000 | 0.4473 |
0.452 | 82.0046 | 36000 | 0.4489 |
0.4523 | 84.2829 | 37000 | 0.4477 |
0.4523 | 86.5612 | 38000 | 0.4481 |
0.4536 | 88.8395 | 39000 | 0.4481 |
0.4512 | 91.1163 | 40000 | 0.4484 |
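To read the table at a glance, the short script below finds the step with the lowest validation loss and compares it with the final checkpoint. The loss values are copied from the Validation Loss column; the variable names are illustrative only, and whether the published weights correspond to the final or the best step is not stated in this card.

```python
# Validation loss at steps 1000, 2000, ..., 40000 (copied from the table).
VAL_LOSSES = [
    0.4829, 0.4684, 0.4616, 0.4577, 0.4537, 0.4524, 0.4511, 0.4504,
    0.4488, 0.4479, 0.4493, 0.4485, 0.4481, 0.4482, 0.4471, 0.4481,
    0.4478, 0.4468, 0.4454, 0.4469, 0.4467, 0.4479, 0.4465, 0.4470,
    0.4466, 0.4466, 0.4480, 0.4470, 0.4493, 0.4478, 0.4488, 0.4479,
    0.4478, 0.4486, 0.4473, 0.4489, 0.4477, 0.4481, 0.4481, 0.4484,
]
EVAL_INTERVAL = 1000  # evaluation ran every 1000 optimizer steps

# Map each evaluation step to its validation loss.
loss_by_step = {
    (i + 1) * EVAL_INTERVAL: loss for i, loss in enumerate(VAL_LOSSES)
}

best_step = min(loss_by_step, key=loss_by_step.get)
best_loss = loss_by_step[best_step]
final_step = max(loss_by_step)
final_loss = loss_by_step[final_step]

if __name__ == "__main__":
    print(f"best:  step {best_step}, validation loss {best_loss:.4f}")
    print(f"final: step {final_step}, validation loss {final_loss:.4f}")
    print(f"gap:   {final_loss - best_loss:.4f}")
```

The lowest validation loss in the table is 0.4454 at step 19000, while the final step reports 0.4484. Validation loss changes little after roughly step 10000 even though training loss keeps decreasing slowly.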
Framework versions
- Transformers 4.57.0
- Pytorch 2.8.0+cu126
- Datasets 3.6.0
- Tokenizers 0.22.1
Model tree for SverreNystad/speecht5_finetuned_voxpopuli_de
Base model: microsoft/speecht5_tts