speecht5_finetuned_commonvoice_dv

This model is a fine-tuned version of microsoft/speecht5_tts. The training dataset is not recorded in the card metadata, although the model name suggests Common Voice, Dhivehi (dv). It achieves the following results on the evaluation set:

  • Loss: 0.4550
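A usage sketch, assuming this repository id and the standard SpeechT5 Transformers API: SpeechT5 needs a speaker embedding per utterance, so the example borrows an x-vector from the CMU ARCTIC embeddings dataset commonly used in SpeechT5 demos. The vocoder checkpoint and embedding index are illustrative choices, not part of this card.

```python
# Hypothetical inference sketch for this checkpoint; imports are kept inside
# the function so the sketch stays a lightweight definition until called.
def synthesize(text, out_path="speech.wav"):
    import torch
    import soundfile as sf
    from datasets import load_dataset
    from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor

    repo = "ahmedhassan7030/speecht5_finetuned_commonvoice_dv"
    processor = SpeechT5Processor.from_pretrained(repo)
    model = SpeechT5ForTextToSpeech.from_pretrained(repo)
    # HiFi-GAN vocoder from the base SpeechT5 release (assumption: unchanged here).
    vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

    inputs = processor(text=text, return_tensors="pt")
    # Borrow a speaker x-vector; index 7306 is an arbitrary illustrative choice.
    xvectors = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
    speaker_embeddings = torch.tensor(xvectors[7306]["xvector"]).unsqueeze(0)

    speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
    sf.write(out_path, speech.numpy(), samplerate=16000)  # SpeechT5 outputs 16 kHz audio
    return speech
```

Calling `synthesize("some Dhivehi text")` writes a 16 kHz WAV file; quality will depend heavily on how well the borrowed speaker embedding matches the fine-tuning voices.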

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 4000
  • mixed_precision_training: Native AMP
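The settings above imply an effective batch size of 32 (per-device batch 4 × 8 accumulation steps) and a linear learning-rate schedule with warmup. A minimal sketch of that schedule, mirroring the formula used by `get_linear_schedule_with_warmup` in Transformers:

```python
# Sketch of the schedule implied by lr_scheduler_type=linear,
# lr_scheduler_warmup_steps=500, training_steps=4000, learning_rate=1e-05.
BASE_LR = 1e-5
WARMUP_STEPS = 500
TOTAL_STEPS = 4000

def lr_at(step):
    """Learning rate at a given optimizer step: linear warmup, then linear decay."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    return BASE_LR * max(0.0, (TOTAL_STEPS - step) / (TOTAL_STEPS - WARMUP_STEPS))

# Effective batch size: per-device batch * gradient accumulation steps.
EFFECTIVE_BATCH = 4 * 8  # = 32, matching total_train_batch_size above
```

The rate climbs from 0 to 1e-05 over the first 500 steps, then decays linearly back to 0 at step 4000.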

Training results

| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| 7.5307        | 0.3610  | 100  | 0.7546          |
| 6.4339        | 0.7220  | 200  | 0.6843          |
| 5.934         | 1.0830  | 300  | 0.6298          |
| 5.1279        | 1.4440  | 400  | 0.5577          |
| 4.8343        | 1.8051  | 500  | 0.5386          |
| 4.8162        | 2.1661  | 600  | 0.5292          |
| 4.6876        | 2.5271  | 700  | 0.5138          |
| 4.5683        | 2.8881  | 800  | 0.5084          |
| 4.4605        | 3.2491  | 900  | 0.5039          |
| 4.4566        | 3.6101  | 1000 | 0.4947          |
| 4.4449        | 3.9711  | 1100 | 0.4926          |
| 4.2956        | 4.3321  | 1200 | 0.4838          |
| 4.3928        | 4.6931  | 1300 | 0.4851          |
| 4.3249        | 5.0542  | 1400 | 0.4861          |
| 4.2335        | 5.4152  | 1500 | 0.4786          |
| 4.2005        | 5.7762  | 1600 | 0.4797          |
| 4.1928        | 6.1372  | 1700 | 0.4770          |
| 4.1732        | 6.4982  | 1800 | 0.4709          |
| 4.2183        | 6.8592  | 1900 | 0.4692          |
| 4.1567        | 7.2202  | 2000 | 0.4714          |
| 4.1174        | 7.5812  | 2100 | 0.4656          |
| 4.1076        | 7.9422  | 2200 | 0.4631          |
| 4.0899        | 8.3032  | 2300 | 0.4644          |
| 4.1671        | 8.6643  | 2400 | 0.4632          |
| 4.1171        | 9.0253  | 2500 | 0.4650          |
| 4.0925        | 9.3863  | 2600 | 0.4636          |
| 4.1103        | 9.7473  | 2700 | 0.4611          |
| 4.0193        | 10.1083 | 2800 | 0.4595          |
| 4.0634        | 10.4693 | 2900 | 0.4590          |
| 3.9804        | 10.8303 | 3000 | 0.4600          |
| 4.0449        | 11.1913 | 3100 | 0.4579          |
| 3.9613        | 11.5523 | 3200 | 0.4565          |
| 4.0675        | 11.9134 | 3300 | 0.4577          |
| 3.9964        | 12.2744 | 3400 | 0.4541          |
| 3.9694        | 12.6354 | 3500 | 0.4585          |
| 3.9767        | 12.9964 | 3600 | 0.4579          |
| 3.9805        | 13.3574 | 3700 | 0.4556          |
| 4.1194        | 13.7184 | 3800 | 0.4556          |
| 4.0595        | 14.0794 | 3900 | 0.4540          |
| 4.049         | 14.4404 | 4000 | 0.4550          |
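Validation loss bottoms out slightly before the final step. A small sketch that scans (step, validation loss) pairs from the log above to find the best checkpoint; only a subset of rows is reproduced here:

```python
# A few (step, validation_loss) rows transcribed from the training log above.
log = [
    (100, 0.7546), (500, 0.5386), (1000, 0.4947), (2000, 0.4714),
    (3000, 0.4600), (3400, 0.4541), (3900, 0.4540), (4000, 0.4550),
]

# Pick the checkpoint with the lowest validation loss.
best_step, best_loss = min(log, key=lambda row: row[1])
```

On these rows the minimum (0.4540 at step 3900) beats the final-step loss (0.4550), which is why checkpoint selection (e.g. `load_best_model_at_end` in the Trainer) can matter even for a small gap like this.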

Framework versions

  • Transformers 4.48.0.dev0
  • Pytorch 2.5.1+cu121
  • Datasets 3.1.0
  • Tokenizers 0.21.0
Model size: 144M parameters (F32, Safetensors)
