---
license: mit
base_model: microsoft/speecht5_tts
tags:
- text-to-speech
datasets:
- facebook/voxpopuli
model-index:
- name: speecht5_tts-ft-voxpopuli-it
  results:
  - task:
      type: text-to-speech
    dataset:
      name: facebook/voxpopuli
      type: facebook/voxpopuli
      config: it
      split: train
      args: it
    metrics:
    - name: N.A.
      type: N.A.
      value: N.A.
language:
- it
---
# speecht5_tts-ft-voxpopuli-it
This model is a fine-tuned version of [microsoft/speecht5_tts](https://huggingface.co/microsoft/speecht5_tts) on the Italian (`it`) configuration of the facebook/voxpopuli dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5126
## Model description
It conditions synthesis on 512-dimensional x-vector speaker embeddings produced by the speaker embedding model [speechbrain/spkrec-xvect-voxceleb](https://huggingface.co/speechbrain/spkrec-xvect-voxceleb); a minimal inference sketch is shown below.
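The following is a hedged inference sketch for this checkpoint. The repo id, the reference clip `speaker.wav` (assumed mono, 16 kHz), and the output filename are placeholders, and the x-vector normalization follows the common SpeechT5 recipe rather than anything stated in this card.

```python
import torch
import soundfile as sf
import torchaudio
from speechbrain.pretrained import EncoderClassifier
from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor

repo_id = "speecht5_tts-ft-voxpopuli-it"  # placeholder: replace with the actual Hub id

processor = SpeechT5Processor.from_pretrained(repo_id)
model = SpeechT5ForTextToSpeech.from_pretrained(repo_id)
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

# Derive a 512-dim x-vector from a reference utterance of the target voice.
classifier = EncoderClassifier.from_hparams(source="speechbrain/spkrec-xvect-voxceleb")
waveform, _ = torchaudio.load("speaker.wav")  # hypothetical mono 16 kHz clip
with torch.no_grad():
    xvec = classifier.encode_batch(waveform)  # shape: (1, 1, 512)
    speaker_embeddings = torch.nn.functional.normalize(xvec, dim=2).squeeze(1)

inputs = processor(text="Buongiorno a tutti.", return_tensors="pt")
speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
sf.write("output.wav", speech.numpy(), samplerate=16000)  # vocoder outputs 16 kHz audio
```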
## Intended uses & limitations
More information needed
## Training and evaluation data
The `train` split of the Italian VoxPopuli configuration was divided into training and evaluation sets with `test_size=0.15`.
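The card records only `test_size=0.15`; the sketch below shows one plausible way to reproduce the split with 🤗 Datasets (the `seed` is an assumption that reuses the training seed):

```python
from datasets import load_dataset

# Italian subset of VoxPopuli; only the "train" split is used for fine-tuning.
dataset = load_dataset("facebook/voxpopuli", "it", split="train")

# 85/15 train/eval split; seed=42 mirrors the training seed but is an assumption.
splits = dataset.train_test_split(test_size=0.15, seed=42)
train_dataset, eval_dataset = splits["train"], splits["test"]
```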
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (see the `Seq2SeqTrainingArguments` sketch after this list):
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 300
- training_steps: 1000
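
A hedged `Seq2SeqTrainingArguments` sketch of the values listed above; `output_dir` and the evaluation/save cadence are assumptions (the results table suggests evaluation every 300 steps):

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="speecht5_tts-ft-voxpopuli-it",  # assumed output path
    per_device_train_batch_size=8,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,   # 8 x 4 = 32 effective train batch size
    learning_rate=1e-5,
    lr_scheduler_type="linear",
    warmup_steps=300,
    max_steps=1000,
    seed=42,
    evaluation_strategy="steps",     # assumption, inferred from the results table
    eval_steps=300,
    save_steps=300,                  # assumption
)
```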
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.6118 | 1.94 | 300 | 0.5508 |
| 0.5729 | 3.89 | 600 | 0.5204 |
| 0.563 | 5.83 | 900 | 0.5126 |
### Framework versions
- Transformers 4.33.0
- Pytorch 1.12.1+cu116
- Datasets 2.14.4
- Tokenizers 0.12.1