Part of a [personal project](https://github.com/Respaired/Project-Kanade) focused on further advancing the field of Japanese speech.

- Try the HuggingFace Space for **Tsukasa** (24kHz): [![huggingface](https://img.shields.io/badge/Interactive_Demo-HuggingFace-yellow)](https://huggingface.co/spaces/Respair/Shiki)

- HuggingFace Space for **Tsumugi** (48kHz): [![huggingface](https://img.shields.io/badge/Interactive_Demo-HuggingFace-yellow)](https://huggingface.co/spaces/Respair/Shiki)

- Join the Shoukan lab Discord server, a cozy place I often hang out in -> [![Discord](https://img.shields.io/discord/1197679063150637117?logo=discord&logoColor=white&label=Join%20our%20Community)](https://discord.gg/JrPSzdcM)

## What is this?

*Note*: This model only supports Japanese, but in the Gradio demo you can also enter Romaji, or text that mixes Romaji with regular Japanese.

This is a speech generation network that aims to maximize the expressiveness and controllability of the generated audio. At its core is the [StyleTTS 2](https://github.com/yl4579/StyleTTS2) architecture, with the following changes:

- A completely new data preprocessing pipeline

- mLSTM layers instead of the regular PyTorch LSTM layers, with a higher parameter count to increase the capacity of the text and prosody encoders

- PL-Bert, the Pitch Extractor, and the Text Aligner retrained from scratch

- Whisper's encoder used for the SLM instead of WavLM

- A 48kHz configuration

- Improved expressiveness for non-verbal sounds (sighs, pauses, etc.) and laughter

- New methods for sampling the style vector

- Promptable speech synthesis

- A smart transliteration algorithm that handles Romaji input as well as mixed Japanese and Romaji

- Fixed DDP (Distributed Data Parallel) and BF16 (Bfloat16) training (mostly!)

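The transliteration algorithm itself is not described here. Purely to illustrate the idea of accepting mixed input, the sketch below converts Romaji to hiragana by greedy longest match and passes every other character (kana, kanji, punctuation) through unchanged. `ROMAJI_TO_KANA` and `romaji_to_hiragana` are hypothetical names, the table is a tiny subset, and real Romaji handling (doubled consonants such as `tt`, syllabic `n` before vowels, long vowels) needs far more than this.

```python
from typing import Dict

# Hypothetical, deliberately tiny Romaji -> hiragana table (illustration only).
ROMAJI_TO_KANA: Dict[str, str] = {
    "a": "あ", "i": "い", "u": "う", "e": "え", "o": "お",
    "ka": "か", "ki": "き", "ku": "く", "ko": "こ",
    "sa": "さ", "shi": "し", "su": "す",
    "ta": "た", "chi": "ち", "to": "と",
    "na": "な", "ni": "に", "no": "の", "n": "ん",
    "ha": "は", "ho": "ほ",
    "ma": "ま", "mi": "み",
    "ra": "ら", "ru": "る",
    "wa": "わ",
}
MAX_KEY_LEN = max(len(key) for key in ROMAJI_TO_KANA)


def romaji_to_hiragana(text: str) -> str:
    """Greedy longest-match conversion; non-Romaji characters pass through."""
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible key first (e.g. "shi" before "s").
        for size in range(MAX_KEY_LEN, 0, -1):
            piece = text[i:i + size]
            if len(piece) == size and piece.lower() in ROMAJI_TO_KANA:
                out.append(ROMAJI_TO_KANA[piece.lower()])
                i += size
                break
        else:
            # No table match: keep the character as-is (kanji, kana, punctuation).
            out.append(text[i])
            i += 1
    return "".join(out)


print(romaji_to_hiragana("watashiはsakura"))  # わたしはさくら
```

Normalizing Romaji to kana first lets the rest of a pipeline treat both kinds of input the same way; this sketch makes no claim about what the real algorithm does afterwards.
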
Two checkpoints are available: Tsukasa and Tsumugi (working title).

Tsukasa was trained on about 800 hours of data, mainly from games and novels, with part of it coming from a private dataset.

As a result, its Japanese is "anime Japanese" (which differs from actual everyday conversation).

Tsumugi (working title) was trained in a more controlled way on a roughly 300-hour subset of this data, with additional manual cleaning and annotation.

Unfortunately, Tsumugi's context length is limited, so it does not handle intonation as well as Tsukasa.

It also supports only the first mode of Kotodama inference, so voice design is not possible with it.

Provided by:

- Soshyant (me)

- Auto Meta (Alignment AI Lab)

- Cryptowooser

- Buttercream

## Why does this matter?

Recently there has been a trend toward ever larger models, but I'm going the other way: trying to push performance to its limit by making the most of existing tools.

In other words, I'm testing whether good results are achievable without high scale.

There are also several Japanese-specific questions, for example how to improve the intonation of the language, or how to accurately annotate sentences whose spelling changes depending on context.

## 䜿ãæ¹ |
|
|
|
# Inference: |
|
|
|
Gradioãã¢: |
|
```bash
python app_tsuka.py
```

Alternatively, check out the inference notebook. Before you do, please read the **Important notes** section carefully.

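Because results depend heavily on the style vector (see the Important notes), it can help to know how upstream StyleTTS 2's inference notebook blends a diffusion-sampled style with a reference style: the 256-dim style is treated as a 128-dim acoustic half and a 128-dim prosodic half, weighted by `alpha` and `beta`. The plain-Python sketch below reproduces that upstream formula under the assumption of the same 128 + 128 split; `blend_style` is a hypothetical name, and this is not necessarily how this repository's own sampling methods or Gradio parameters work.

```python
from typing import List, Tuple

HALF = 128  # assumed acoustic/prosodic split, as in upstream StyleTTS 2 (256 dims total)


def blend_style(
    sampled: List[float],
    reference: List[float],
    alpha: float = 0.3,
    beta: float = 0.7,
) -> Tuple[List[float], List[float]]:
    """Mix a sampled style vector with a reference style vector.

    alpha weights the sampled acoustic half and beta the sampled prosodic
    half: 0.0 keeps the reference, 1.0 keeps the sampled style.
    """
    if len(sampled) != 2 * HALF or len(reference) != 2 * HALF:
        raise ValueError(f"expected {2 * HALF}-dim style vectors")
    acoustic = [alpha * s + (1 - alpha) * r
                for s, r in zip(sampled[:HALF], reference[:HALF])]
    prosodic = [beta * s + (1 - beta) * r
                for s, r in zip(sampled[HALF:], reference[HALF:])]
    return acoustic, prosodic
```

Lower values stay closer to the reference speaker, while higher values let the text-conditioned sample drive timbre or prosody more; upstream's LibriTTS notebook defaults to `alpha = 0.3` and `beta = 0.7`.
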
### Training

第1段é: |
|
```bash |
|
accelerate launch train_first.py --config_path ./Configs/config.yml |
|
``` |
|
第2段é **(DDPããŒãžã§ã³ãåäœããªããããçŸåšã®ããŒãžã§ã³ã§ã¯DPã䜿çšããŠããŸãã[#7](https://github.com/yl4579/StyleTTS2/issues/7)ãåç
§ããŠããã«ãããé¡ãããŸã)**: |
|
```bash |
|
accelerate launch accelerate_train_second.py --config_path ./Configs/config.yml |
|
``` |
|
|
|
Joint SLM training does not work on multiple GPUs. (It's debatable whether this stage is even necessary in the first place; I don't use it either.)

Or:

```bash
accelerate launch train_first.py --config_path ./Configs/config.yml
```

第3段é(Kotodamaãããã³ãããšã³ã³ãŒãã£ã³ã°ãªã©): |
|
*æªäºå®* |
|
|
|
|
|
## Possible improvements

A few possible improvements come to mind. I won't necessarily be the one to work on them, so treat them as suggestions:

- [ ] Change the vocoder (specifically, [fregrad](https://github.com/kaistmm/fregrad) looks interesting)

- [ ] Retrain the Pitch Extractor with a different algorithm

- [ ] Non-speech sound generation has improved, but fully non-speech output still cannot be generated. This may be due to the hard alignment.

- [ ] Use the style encoder as another modality in LLMs (an approach similar to Style-Talker)

## Prerequisites

1. Python >= 3.11

2. Clone this repository:

```bash
git clone https://github.com/yl4579/StyleTTS2.git
cd StyleTTS2
```

3. Install the Python requirements:

```bash
pip install -r requirements.txt
```

## Training details

- 8x A40 + 2x V100 (32GB each)

- 750 to 800 hours of data

- Bfloat16

- About 3 weeks of training, 3 months in total (including work on the data pipeline)

- An estimated 66.6 kg CO2eq. of carbon emissions, based on Google Cloud figures (I didn't use Google Cloud, but the cluster is in the US, so this is a very rough estimate.)

### Important notes

1. The output breaks when the diffusion sampler is enabled:

***Unfortunately, the diffusion sampler may not work on some hardware. This is out of my control and appears to be related to how different hardware handles floating-point operations. I've confirmed that it works on A40, V100, and Google Colab's T4 GPUs, but I can't guarantee it will work even if you have the same GPU. The same problem occurs on CPU. This was a serious problem in the original StyleTTS2, but with the various sampling methods I've added, you can disable the diffusion sampler while keeping the impact on quality to a minimum.***

2. You can't reproduce the quality of the provided samples, or can't get good results consistently:

***Controllability comes at a cost, and that cost is usability. This is especially true for a network built from inherently non-deterministic modules. The system is very sensitive to changes in the style vector. That said, I'm confident that by carefully tuning the inference parameters and through trial and error, you can almost always achieve the most impressive, natural expression. Also, some speakers may not handle certain emotions consistently, so you can create new emotions by borrowing them from another speaker. Detailed usage is explained in the Gradio Space and the inference notebook.***

3. `RuntimeError: The size of tensor a (512) must match the size of tensor b (some number) at non-singleton dimension 3`:

***The input is too long for a single inference pass. Please use the Longform inference feature. This is particularly a problem for the "Tsumugi (working title)" checkpoint: because the mLSTM layers' context length is limited to 512, it cannot generate more than about 10 seconds of audio unless you use the Longform feature. This is not a problem for the other checkpoints, however. Thanks to the Longform algorithm, there is no theoretical limit on output length.***

4. Short inputs are not impressive:

***Everything said in item 2 applies here. Check whether your style vector is appropriate. In general, however, very short inputs are not recommended.***

5. NaN during second-stage training:

***Your gradients may be exploding. Try gradient clipping; your batch size may also be too large. If that doesn't fix it, I recommend pre-training the first few epochs with the original DP script, or using DP for the whole run.***
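The Longform algorithm mentioned in the `RuntimeError` note is not described in this README. The general idea of long-form synthesis can be sketched as: split the input at sentence boundaries, pack sentences into chunks under a length budget, and synthesize each chunk separately. `split_for_longform` is a hypothetical helper, and `max_chars` is only a character-count stand-in for the real limit (the 512 limit applies inside the model, not to characters), so any value you use is an assumption to tune.

```python
import re
from typing import List

# A "sentence" is a run of non-terminators followed by any terminators (。！？!?).
SENTENCE_RE = re.compile(r"[^。！？!?]+[。！？!?]*")


def split_for_longform(text: str, max_chars: int = 100) -> List[str]:
    """Pack sentences into chunks of at most max_chars characters."""
    sentences = [s.strip() for s in SENTENCE_RE.findall(text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split a single over-long sentence so no chunk exceeds the budget.
        pieces = [sentence[i:i + max_chars]
                  for i in range(0, len(sentence), max_chars)]
        for piece in pieces:
            if current and len(current) + len(piece) > max_chars:
                chunks.append(current)
                current = ""
            current += piece
    if current:
        chunks.append(current)
    return chunks


print(split_for_longform("あいう。えお。かきくけこ。", max_chars=8))
# ['あいう。えお。', 'かきくけこ。']
```

Each chunk would then be synthesized on its own and the audio concatenated; how the actual Longform feature keeps style and intonation consistent across chunks is not covered by this sketch.
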
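For the NaN note above, gradient clipping rescales all gradients by one shared factor whenever their combined (global) L2 norm exceeds a threshold. In PyTorch this is `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)`, called after `loss.backward()` and before `optimizer.step()` (with Accelerate, `accelerator.clip_grad_norm_`). The dependency-free sketch below, using the hypothetical name `clip_by_global_norm`, only shows what that computation does; the `max_norm` value is an assumption you would tune.

```python
import math
from typing import List, Tuple


def clip_by_global_norm(
    grads: List[List[float]], max_norm: float, eps: float = 1e-6
) -> Tuple[List[List[float]], float]:
    """Scale all gradients by one factor so their global L2 norm <= max_norm.

    Returns the clipped gradients and the norm measured *before* clipping.
    """
    total_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    if not math.isfinite(total_norm):
        # A NaN/inf norm cannot be repaired by rescaling; skip the step instead.
        raise FloatingPointError(f"non-finite gradient norm: {total_norm}")
    scale = min(1.0, max_norm / (total_norm + eps))
    return [[g * scale for g in tensor] for tensor in grads], total_norm


clipped, norm = clip_by_global_norm([[3.0], [4.0]], max_norm=1.0)
print(norm)  # 5.0
```

PyTorch's `clip_grad_norm_` also accepts `error_if_nonfinite=True`, which raises on a NaN or inf norm instead of letting it propagate silently.
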