---
license: apache-2.0
language:
- en
tags:
- ai
- rvc
- vc
- voice-cloning
- applio
- titan
- pretrained
datasets:
- blaise-tk/TITAN-Medium
pipeline_tag: audio-to-audio
---
# TITAN: A Versatile, Robust, and High-Quality Pretrained Model for Retrieval-based Voice Conversion (RVC) Training
## Overview
TITAN is a state-of-the-art pretrained model for [Retrieval-based Voice Conversion (RVC)](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/) training. Voice models fine-tuned from it convert voice characteristics from one speaker to another robustly and with high quality, while requiring minimal training effort.
## Model Details
### Titan-Medium
- Training Environment: Trained on an RTX 3060 Ti with [Applio v3.1.1](https://github.com/IAHispano/Applio), using a batch size of 8, over a span of 3 weeks.
- Iterations (40k): 1010588 Steps and 467 Epochs
- Iterations (32k): 1001469 Steps and 463 Epochs
- Sampling rates: 32k, 40k, 48k (still training)
- Fine-tuning Process: Fine-tuned from the RVC v2 pretrained model with pitch guidance on an 11.15-hour dataset sourced from [Expresso](https://arxiv.org/abs/2308.05725), which is also available at [datasets/blaise-tk/TITAN-Medium](https://huggingface.co/datasets/blaise-tk/TITAN-Medium).
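
As a rough sanity check on the figures above, the step and epoch counts can be divided to estimate how many steps make up one epoch, and from that roughly how many training samples each epoch covers. This is only a sketch: it assumes that one step consumes exactly one batch of 8 samples, which the model card does not state, and the function name `steps_per_epoch` is purely illustrative.

```python
# Figures copied from the Titan-Medium details above.
RUNS = {
    "40k": {"steps": 1_010_588, "epochs": 467},
    "32k": {"steps": 1_001_469, "epochs": 463},
}
BATCH_SIZE = 8


def steps_per_epoch(steps: int, epochs: int) -> float:
    """Average number of training steps per epoch."""
    return steps / epochs


for name, run in RUNS.items():
    spe = steps_per_epoch(run["steps"], run["epochs"])
    # Assumption (not stated in the source): one step processes one batch.
    samples = spe * BATCH_SIZE
    print(f"{name}: {spe:.0f} steps/epoch, ~{samples:.0f} samples/epoch")
```

Both runs come out at roughly 2,160 steps per epoch, which is consistent with the two sampling rates having been trained on essentially the same dataset.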
#### Samples
*All tests were performed under the same conditions, using an early checkpoint at ~700k steps.*
<table style="width:100%; text-align:center;">
<tr>
<th>Titan-Medium</th>
<th>Ov2</th>
<th>Ov2.1</th>
</tr>
<tr>
<td>
<audio controls>
<source src="https://huggingface.co/blaise-tk/TITAN/resolve/main/demos/Model 1 - Test 1 - Titan.wav?download=true" type="audio/wav">
Your browser does not support the audio element.
</audio>
</td>
<td>
<audio controls>
<source src="https://huggingface.co/blaise-tk/TITAN/resolve/main/demos/Model 1 - Test 1 - Ov2.wav?download=true" type="audio/wav">
Your browser does not support the audio element.
</audio>
</td>
</tr>
<tr>
<td>
<audio controls>
<source src="https://huggingface.co/blaise-tk/TITAN/resolve/main/demos/Model 1 - Test 2 - Titan.wav?download=true" type="audio/wav">
Your browser does not support the audio element.
</audio>
</td>
<td>
<audio controls>
<source src="https://huggingface.co/blaise-tk/TITAN/resolve/main/demos/Model 1 - Test 2 - Ov2.wav?download=true" type="audio/wav">
Your browser does not support the audio element.
</audio>
</td>
</tr>
<tr>
<td>
<audio controls>
<source src="https://huggingface.co/blaise-tk/TITAN/resolve/main/demos/Model 2 - Test 1 - Titan.wav?download=true" type="audio/wav">
Your browser does not support the audio element.
</audio>
</td>
<td>
<audio controls>
<source src="https://huggingface.co/blaise-tk/TITAN/resolve/main/demos/Model 2 - Test 1 - Ov2.wav?download=true" type="audio/wav">
Your browser does not support the audio element.
</audio>
</td>
</tr>
<tr>
<td>
<audio controls>
<source src="https://huggingface.co/blaise-tk/TITAN/resolve/main/demos/Model 2 - Test 2 - Titan.wav?download=true" type="audio/wav">
Your browser does not support the audio element.
</audio>
</td>
<td>
<audio controls>
<source src="https://huggingface.co/blaise-tk/TITAN/resolve/main/demos/Model 2 - Test 2 - Ov2.wav?download=true" type="audio/wav">
Your browser does not support the audio element.
</audio>
</td>
</tr>
<tr>
<td>
<audio controls>
<source src="https://huggingface.co/blaise-tk/TITAN/resolve/main/demos/Model 3 - Test 1 - Titan.wav?download=true" type="audio/wav">
Your browser does not support the audio element.
</audio>
</td>
<td>
<audio controls>
<source src="https://huggingface.co/blaise-tk/TITAN/resolve/main/demos/Model 3 - Test 1 - Ov2.wav?download=true" type="audio/wav">
Your browser does not support the audio element.
</audio>
</td>
<td>
<audio controls>
<source src="https://huggingface.co/blaise-tk/TITAN/resolve/main/demos/Model 3 - Test 1 - Ov2.1.wav?download=true" type="audio/wav">
Your browser does not support the audio element.
</audio>
</td>
</tr>
<tr>
<td>
<audio controls>
<source src="https://huggingface.co/blaise-tk/TITAN/resolve/main/demos/Model 3 - Test 2 - Titan.wav?download=true" type="audio/wav">
Your browser does not support the audio element.
</audio>
</td>
<td>
<audio controls>
<source src="https://huggingface.co/blaise-tk/TITAN/resolve/main/demos/Model 3 - Test 2 - Ov2.wav?download=true" type="audio/wav">
Your browser does not support the audio element.
</audio>
</td>
<td>
<audio controls>
<source src="https://huggingface.co/blaise-tk/TITAN/resolve/main/demos/Model 3 - Test 2 - Ov2.1.wav?download=true" type="audio/wav">
Your browser does not support the audio element.
</audio>
</td>
</tr>
</table>
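
To fetch the samples for offline listening, the demo file names in the table follow a regular pattern, so their URLs can be built programmatically. The sketch below assumes that pattern holds for every file; the helper name `demo_url` is hypothetical, and the `Ov2.1` variant appears in the table only for Model 3. File names contain spaces, so they are percent-encoded here.

```python
from urllib.parse import quote

# Base path taken from the audio sources in the table above.
BASE = "https://huggingface.co/blaise-tk/TITAN/resolve/main/demos/"


def demo_url(model: int, test: int, variant: str) -> str:
    """Build the download URL for one demo clip (hypothetical helper)."""
    filename = f"Model {model} - Test {test} - {variant}.wav"
    # quote() percent-encodes the spaces in the file name.
    return BASE + quote(filename) + "?download=true"


for model in (1, 2, 3):
    for test in (1, 2):
        for variant in ("Titan", "Ov2"):
            print(demo_url(model, test, variant))
```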
### Titan-Large
- Details forthcoming...
## Collaborators
We appreciate the contributions of our collaborators who have helped in the development and refinement of TITAN.
- Mustar
- SimplCup
- UnitedShoes
## Beta Testers
We extend our gratitude to the beta testers who provided valuable feedback during the testing phase of TITAN.
- SimplCup
- Leo_Frixi
- Light
- SCRFilms
- Ryanz
- Litsa_the_dancer
## Citation
If you find TITAN useful for your research or projects, please cite this repository:
```
@article{titan,
title={TITAN: A Versatile, Robust, and High-Quality Pretrained Model for Retrieval-based Voice Conversion (RVC) Training},
author={Blaise},
journal={Hugging Face},
year={2024},
publisher={Blaise},
url={https://huggingface.co/blaise-tk/TITAN/}
}
```