Update README.md
README.md
@@ -69,7 +69,13 @@ There's also a few things that's related to Japanese. such as how we can improve
 ## How to do ...
 
 # Inference:
-
+
+Gradio demo:
+```bash
+python app_tsumugi.py
+```
+
+or check the inference notebook. before that, make sure you read the **Important Notes** section down below.
 
 # Training:
 
@@ -98,7 +104,7 @@ Third stage training (Kotodama, prompt encoding, etc.):
 
 I can think of a few things that can be improved, not nessarily by me, treat it as some sorts of suggestions:
 
-- [o] changing the decoder (fregrad looks promising)
+- [o] changing the decoder ([fregrad](https://github.com/kaistmm/fregrad) looks promising)
 - [o] retraining the Pitch Extractor using a different algorithm
 - [o] while the quality of non-speech sounds have been improved, it cannot generate an entirely non-speech output, perhaps because of the hard alignement.
 - [o] using the Style encoder as another modality in LLMs, since they have a detailed representation of the tone and expression of a speech (similar to Style-Talker).