speecht5_finetuned_voxpopuli_nl

This model is a fine-tuned version of microsoft/speecht5_tts on the Dutch (nl) subset of the VoxPopuli dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4578

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 4000
  • mixed_precision_training: Native AMP
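The schedule above (linear with 500 warmup steps over 4000 training steps) can be sketched in plain Python; this is an assumed reproduction of the usual warmup-then-linear-decay shape, not code from the training run. Note also that the effective batch size of 32 comes from train_batch_size 4 × gradient_accumulation_steps 8.

```python
# Sketch (assumption): linear warmup to the peak learning rate,
# then linear decay to zero, matching the hyperparameters above.
LEARNING_RATE = 1e-5
WARMUP_STEPS = 500
TRAINING_STEPS = 4000

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    return LEARNING_RATE * (TRAINING_STEPS - step) / (TRAINING_STEPS - WARMUP_STEPS)

print(lr_at(0))     # 0.0
print(lr_at(500))   # peak: 1e-05
print(lr_at(4000))  # 0.0
```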

Training results

| Training Loss | Epoch   | Step | Validation Loss |
|---------------|---------|------|-----------------|
| 0.7736        | 0.4303  | 100  | 0.6613          |
| 0.6957        | 0.8607  | 200  | 0.6016          |
| 0.6169        | 1.2926  | 300  | 0.5396          |
| 0.5676        | 1.7230  | 400  | 0.5130          |
| 0.553         | 2.1549  | 500  | 0.5001          |
| 0.547         | 2.5853  | 600  | 0.4945          |
| 0.5431        | 3.0172  | 700  | 0.4885          |
| 0.5255        | 3.4476  | 800  | 0.4844          |
| 0.5233        | 3.8779  | 900  | 0.4812          |
| 0.5188        | 4.3098  | 1000 | 0.4791          |
| 0.5188        | 4.7402  | 1100 | 0.4760          |
| 0.5086        | 5.1721  | 1200 | 0.4747          |
| 0.5105        | 5.6025  | 1300 | 0.4717          |
| 0.5167        | 6.0344  | 1400 | 0.4721          |
| 0.5074        | 6.4648  | 1500 | 0.4691          |
| 0.507         | 6.8951  | 1600 | 0.4684          |
| 0.5037        | 7.3271  | 1700 | 0.4676          |
| 0.5071        | 7.7574  | 1800 | 0.4674          |
| 0.5028        | 8.1893  | 1900 | 0.4654          |
| 0.4956        | 8.6197  | 2000 | 0.4642          |
| 0.5035        | 9.0516  | 2100 | 0.4638          |
| 0.4995        | 9.4820  | 2200 | 0.4649          |
| 0.5023        | 9.9123  | 2300 | 0.4624          |
| 0.493         | 10.3443 | 2400 | 0.4621          |
| 0.4987        | 10.7746 | 2500 | 0.4612          |
| 0.4959        | 11.2066 | 2600 | 0.4609          |
| 0.4958        | 11.6369 | 2700 | 0.4601          |
| 0.4963        | 12.0689 | 2800 | 0.4609          |
| 0.4921        | 12.4992 | 2900 | 0.4599          |
| 0.4922        | 12.9295 | 3000 | 0.4595          |
| 0.4906        | 13.3615 | 3100 | 0.4600          |
| 0.49          | 13.7918 | 3200 | 0.4594          |
| 0.4883        | 14.2238 | 3300 | 0.4592          |
| 0.4895        | 14.6541 | 3400 | 0.4598          |
| 0.4934        | 15.0861 | 3500 | 0.4595          |
| 0.4918        | 15.5164 | 3600 | 0.4585          |
| 0.4893        | 15.9467 | 3700 | 0.4585          |
| 0.4956        | 16.3787 | 3800 | 0.4587          |
| 0.4889        | 16.8090 | 3900 | 0.4579          |
| 0.4917        | 17.2410 | 4000 | 0.4578          |

Framework versions

  • Transformers 4.48.0.dev0
  • Pytorch 2.5.1+cu121
  • Datasets 3.1.0
  • Tokenizers 0.21.0