---
license: other
license_name: coqui-public-model-license
license_link: https://coqui.ai/cpml
library_name: coqui
pipeline_tag: text-to-speech
datasets:
- ylacombe/google-argentinian-spanish
language:
- es
---
# ⓍTTS 🇦🇷

ⓍTTS is a voice generation model that lets you clone voices into different languages using just a quick 6-second audio clip. There is no need for an excessive amount of training data that spans countless hours.

This model was trained by IdeaLab at [CITECCA](https://mapatecnologico.rionegro.gov.ar/detail/citecca-centro-interdisciplinario-de-telecomunicaciones-electronica-computacion-y-ciencia-aplicada-unrn), part of the [Universidad Nacional de Río Negro](https://www.unrn.edu.ar/home).
### Language

The model's Spanish has been fine-tuned on [ylacombe's Google Argentinian Spanish dataset](https://huggingface.co/datasets/ylacombe/google-argentinian-spanish) to achieve an Argentinian accent.
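If you want to inspect the fine-tuning data yourself, it can be pulled with the 🤗 `datasets` library. The snippet below is a minimal sketch and not part of this model's requirements; the available configuration names and column layout are assumptions, so it lists them instead of hard-coding one.

```python
from datasets import get_dataset_config_names, load_dataset

# List the dataset's configurations first, since their exact names
# (e.g. speaker-gender splits) may vary.
configs = get_dataset_config_names("ylacombe/google-argentinian-spanish")
print(configs)

# Load the first configuration and peek at a single example
# (audio array, sampling rate and transcription).
ds = load_dataset("ylacombe/google-argentinian-spanish", configs[0], split="train")
print(ds[0])
```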
### Training Parameters

```
batch_size=8,
grad_accum_steps=96,
batch_group_size=48,
eval_batch_size=8,
num_loader_workers=8,
eval_split_max_size=256,
optimizer="AdamW",
optimizer_wd_only_on_weights=True,
optimizer_params={"betas": [0.9, 0.96], "eps": 1e-8, "weight_decay": 1e-2},
lr=5e-06,
lr_scheduler="MultiStepLR",
lr_scheduler_params={"milestones": [50000 * 18, 150000 * 18, 300000 * 18], "gamma": 0.5, "last_epoch": -1},
```
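For context, these values correspond to keyword arguments of the GPT-trainer configuration used in the upstream ⓍTTS fine-tuning recipe. The sketch below only shows where they plug in, assuming the `GPTTrainerConfig` class from the 🐸TTS codebase; the run paths, model, audio and dataset arguments that a full training run also needs are omitted.

```python
from TTS.tts.layers.xtts.trainer.gpt_trainer import GPTTrainerConfig

# Minimal sketch: the parameters listed above dropped into the GPT trainer config.
# A real run would also set output paths, GPTArgs (model_args),
# XttsAudioConfig (audio) and the dataset configs.
config = GPTTrainerConfig(
    batch_size=8,
    grad_accum_steps=96,
    batch_group_size=48,
    eval_batch_size=8,
    num_loader_workers=8,
    eval_split_max_size=256,
    optimizer="AdamW",
    optimizer_wd_only_on_weights=True,
    optimizer_params={"betas": [0.9, 0.96], "eps": 1e-8, "weight_decay": 1e-2},
    lr=5e-06,
    lr_scheduler="MultiStepLR",
    lr_scheduler_params={
        "milestones": [50000 * 18, 150000 * 18, 300000 * 18],
        "gamma": 0.5,
        "last_epoch": -1,
    },
)
```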
### License

This model is licensed under the [Coqui Public Model License](https://coqui.ai/cpml). There is a lot that goes into a license for generative models; you can read more about [the origin story of CPML here](https://coqui.ai/blog/tts/cpml).
Using the 🐸TTS command line:

```console
tts --model_path /path/to/xtts/ \
    --config_path /path/to/xtts/config.json \
    --text "Che boludo, vamos a tomar unos mates." \
    --speaker_wav /path/to/target/speaker.wav \
    --language_idx es \
    --use_cuda true
```
Using the model directly in Python:

```python
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# Load the fine-tuned checkpoint and its config
config = XttsConfig()
config.load_json("/path/to/xtts/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="/path/to/xtts/", eval=True)
model.cuda()

# Clone the voice in speaker_wav and synthesize Argentinian Spanish speech
outputs = model.synthesize(
    "Che boludo, vamos a tomar unos mates.",
    config,
    speaker_wav="/path/to/target/speaker.wav",
    gpt_cond_len=3,
    language="es",
)
```
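`synthesize` returns a dictionary; in the upstream ⓍTTS examples the generated waveform is exposed under the `"wav"` key at a 24 kHz sampling rate. A short sketch for writing it to disk, under that assumption (adjust the key or sample rate if your 🐸TTS version differs):

```python
import torch
import torchaudio

# Assumes outputs["wav"] is a mono waveform sampled at 24 kHz,
# as in the upstream XTTS examples.
torchaudio.save("xtts_ar_output.wav", torch.tensor(outputs["wav"]).unsqueeze(0), 24000)
```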