speecht5_finetuned_voxpopuli_de

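This model is a fine-tuned version of microsoft/speecht5_tts on the voxpopuli dataset. On the evaluation set it reaches a validation loss of 0.4484 at the final training step (40000); the lowest recorded validation loss is 0.4454, at step 19000.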

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 4
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 32
optimizer: AdamW, fused PyTorch implementation (OptimizerNames.ADAMW_TORCH_FUSED), with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
training_steps: 40000
mixed_precision_training: Native AMP
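The learning-rate schedule and effective batch size implied by these hyperparameters can be sketched as below. This is a minimal re-implementation, not the training code: it assumes the standard Hugging Face "linear" scheduler semantics (linear warmup from 0 to learning_rate over lr_scheduler_warmup_steps, then linear decay to 0 at the last step) and a single training device, so that total_train_batch_size is train_batch_size * gradient_accumulation_steps. The function name `linear_warmup_lr` is hypothetical.

```python
# Sketch of the "linear" LR schedule with warmup, ASSUMING Hugging Face
# semantics: ramp 0 -> peak over the warmup steps, then decay linearly to 0.

PEAK_LR = 1e-05       # learning_rate
WARMUP_STEPS = 500    # lr_scheduler_warmup_steps
TOTAL_STEPS = 40000   # training_steps

# ASSUMPTION: a single device, so the effective batch per optimizer step is
# the per-device batch size times the gradient accumulation steps.
TRAIN_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 8
EFFECTIVE_BATCH = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS  # compare total_train_batch_size


def linear_warmup_lr(step: int) -> float:
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Warmup phase: grow linearly from 0 toward PEAK_LR.
        return PEAK_LR * step / WARMUP_STEPS
    # Decay phase: shrink linearly from PEAK_LR at the end of warmup
    # to 0 at TOTAL_STEPS (clamped so it never goes negative).
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    print("effective batch size:", EFFECTIVE_BATCH)
    for s in (0, 250, 500, 20250, 40000):
        print(f"step {s:>5}: lr = {linear_warmup_lr(s):.3e}")
```

Under these assumptions the rate peaks at 1e-05 at step 500 and is halfway back down (5e-06) at step 20250, midway between the end of warmup and the final step.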

Training results

Training Loss	Epoch	Step	Validation Loss
0.5233	2.2783	1000	0.4829
0.5036	4.5566	2000	0.4684
0.503	6.8349	3000	0.4616
0.4895	9.1118	4000	0.4577
0.486	11.3901	5000	0.4537
0.4835	13.6684	6000	0.4524
0.4757	15.9467	7000	0.4511
0.4771	18.2236	8000	0.4504
0.4745	20.5019	9000	0.4488
0.474	22.7802	10000	0.4479
0.4697	25.0570	11000	0.4493
0.4673	27.3353	12000	0.4485
0.4716	29.6136	13000	0.4481
0.4651	31.8919	14000	0.4482
0.4699	34.1688	15000	0.4471
0.4613	36.4471	16000	0.4481
0.4655	38.7254	17000	0.4478
0.4601	41.0023	18000	0.4468
0.4602	43.2806	19000	0.4454
0.4613	45.5589	20000	0.4469
0.4606	47.8372	21000	0.4467
0.4546	50.1141	22000	0.4479
0.4545	52.3924	23000	0.4465
0.4556	54.6707	24000	0.4470
0.4578	56.9490	25000	0.4466
0.4564	59.2258	26000	0.4466
0.4566	61.5041	27000	0.4480
0.457	63.7824	28000	0.4470
0.4531	66.0593	29000	0.4493
0.4521	68.3376	30000	0.4478
0.4527	70.6159	31000	0.4488
0.4513	72.8942	32000	0.4479
0.455	75.1711	33000	0.4478
0.4533	77.4494	34000	0.4486
0.4565	79.7277	35000	0.4473
0.452	82.0046	36000	0.4489
0.4523	84.2829	37000	0.4477
0.4523	86.5612	38000	0.4481
0.4536	88.8395	39000	0.4481
0.4512	91.1163	40000	0.4484
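To read the table above programmatically, the sketch below copies its Step and Validation Loss columns and reports the lowest validation loss and the spread of validation loss from a given step onward. The helper names are hypothetical. On these numbers, validation loss stays between 0.4454 and 0.4493 from step 10000 on, while training loss keeps drifting down (0.474 at step 10000, 0.4512 at step 40000), so the second half of training changed validation loss very little.

```python
# (step, validation loss) pairs copied from the training-results table.
VAL_LOSS = [
    (1000, 0.4829), (2000, 0.4684), (3000, 0.4616), (4000, 0.4577),
    (5000, 0.4537), (6000, 0.4524), (7000, 0.4511), (8000, 0.4504),
    (9000, 0.4488), (10000, 0.4479), (11000, 0.4493), (12000, 0.4485),
    (13000, 0.4481), (14000, 0.4482), (15000, 0.4471), (16000, 0.4481),
    (17000, 0.4478), (18000, 0.4468), (19000, 0.4454), (20000, 0.4469),
    (21000, 0.4467), (22000, 0.4479), (23000, 0.4465), (24000, 0.4470),
    (25000, 0.4466), (26000, 0.4466), (27000, 0.4480), (28000, 0.4470),
    (29000, 0.4493), (30000, 0.4478), (31000, 0.4488), (32000, 0.4479),
    (33000, 0.4478), (34000, 0.4486), (35000, 0.4473), (36000, 0.4489),
    (37000, 0.4477), (38000, 0.4481), (39000, 0.4481), (40000, 0.4484),
]


def best_checkpoint(history):
    """Return the (step, loss) row with the lowest loss; earliest step wins ties."""
    return min(history, key=lambda row: (row[1], row[0]))


def loss_range_from(history, start_step):
    """Return (min, max) validation loss over all rows with step >= start_step."""
    losses = [loss for step, loss in history if step >= start_step]
    return min(losses), max(losses)


if __name__ == "__main__":
    step, loss = best_checkpoint(VAL_LOSS)
    print(f"lowest validation loss {loss} at step {step}")
    lo, hi = loss_range_from(VAL_LOSS, 10000)
    print(f"validation loss from step 10000 on: {lo} to {hi}")
```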

Safetensors

Model size: 0.1B params
Tensor type: F32
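Since F32 stores each parameter in 4 bytes, the weight footprint can be estimated from the parameter count. The sketch below is only an order-of-magnitude estimate: "0.1B params" is a rounded figure, the result ignores file-format overhead and runtime activation memory, and the F16/BF16 entries are included purely for comparison (the card lists only F32). The helper `weight_size_gb` is hypothetical.

```python
# Rough weight-storage estimate: parameter count * bytes per element.
# F16/BF16 are listed for comparison only; this card reports F32 weights.
BYTES_PER_ELEMENT = {"F32": 4, "F16": 2, "BF16": 2}


def weight_size_gb(num_params: float, dtype: str = "F32") -> float:
    """Approximate weight size in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_ELEMENT[dtype] / 1e9


if __name__ == "__main__":
    # 0.1B is the rounded parameter count reported on the card.
    print(f"~{weight_size_gb(0.1e9):.1f} GB of weights in F32")
```

With roughly 0.1B parameters this gives about 0.4 GB of F32 weights.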
