---
license: mit
base_model: microsoft/speecht5_tts
tags:
- text-to-speech
datasets:
- facebook/voxpopuli
model-index:
- name: speecht5_tts-ft-voxpopuli-it
  results:
  - task:
      type: text-to-speech
    dataset:
      name: facebook/voxpopuli
      type: facebook/voxpopuli
      config: it
      split: train
      args: it
    metrics:
    - name: N.A.
      type: N.A.
      value: N.A.
language:
- it
---
# speecht5_tts-ft-voxpopuli-it
This model is a fine-tuned version of [microsoft/speecht5_tts](https://huggingface.co/microsoft/speecht5_tts) on the Italian (`it`) configuration of the facebook/voxpopuli dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5126
## Model description
It conditions synthesis on 512-dimensional x-vector speaker embeddings produced by the speaker embedding model [speechbrain/spkrec-xvect-voxceleb](https://huggingface.co/speechbrain/spkrec-xvect-voxceleb); a minimal inference sketch is shown below.
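The following is a hedged inference sketch for this checkpoint. The repo id, the reference clip `speaker.wav` (assumed mono, 16 kHz), and the output filename are placeholders, and the x-vector normalization follows the common SpeechT5 recipe rather than anything stated in this card.

```python
import torch
import soundfile as sf
import torchaudio
from speechbrain.pretrained import EncoderClassifier
from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor

repo_id = "speecht5_tts-ft-voxpopuli-it"  # placeholder: replace with the actual Hub id

processor = SpeechT5Processor.from_pretrained(repo_id)
model = SpeechT5ForTextToSpeech.from_pretrained(repo_id)
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

# Derive a 512-dim x-vector from a reference utterance of the target voice.
classifier = EncoderClassifier.from_hparams(source="speechbrain/spkrec-xvect-voxceleb")
waveform, _ = torchaudio.load("speaker.wav")  # hypothetical mono 16 kHz clip
with torch.no_grad():
    xvec = classifier.encode_batch(waveform)  # shape: (1, 1, 512)
    speaker_embeddings = torch.nn.functional.normalize(xvec, dim=2).squeeze(1)

inputs = processor(text="Buongiorno a tutti.", return_tensors="pt")
speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
sf.write("output.wav", speech.numpy(), samplerate=16000)  # vocoder outputs 16 kHz audio
```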
## Intended uses & limitations
More information needed
## Training and evaluation data
The `train` split of the Italian VoxPopuli configuration was divided into training and evaluation sets with `test_size=0.15`.
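The card records only `test_size=0.15`; the sketch below shows one plausible way to reproduce the split with 🤗 Datasets (the `seed` is an assumption that reuses the training seed):

```python
from datasets import load_dataset

# Italian subset of VoxPopuli; only the "train" split is used for fine-tuning.
dataset = load_dataset("facebook/voxpopuli", "it", split="train")

# 85/15 train/eval split; seed=42 mirrors the training seed but is an assumption.
splits = dataset.train_test_split(test_size=0.15, seed=42)
train_dataset, eval_dataset = splits["train"], splits["test"]
```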
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (see the `Seq2SeqTrainingArguments` sketch after this list):
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 300
- training_steps: 1000
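
A hedged `Seq2SeqTrainingArguments` sketch of the values listed above; `output_dir` and the evaluation/save cadence are assumptions (the results table suggests evaluation every 300 steps):

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="speecht5_tts-ft-voxpopuli-it",  # assumed output path
    per_device_train_batch_size=8,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,   # 8 x 4 = 32 effective train batch size
    learning_rate=1e-5,
    lr_scheduler_type="linear",
    warmup_steps=300,
    max_steps=1000,
    seed=42,
    evaluation_strategy="steps",     # assumption, inferred from the results table
    eval_steps=300,
    save_steps=300,                  # assumption
)
```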
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.6118 | 1.94 | 300 | 0.5508 |
| 0.5729 | 3.89 | 600 | 0.5204 |
| 0.563 | 5.83 | 900 | 0.5126 |
### Framework versions
- Transformers 4.33.0
- Pytorch 1.12.1+cu116
- Datasets 2.14.4
- Tokenizers 0.12.1