Respair committed
Commit ff72b77 (1 parent: 203c725)

Update README.md

Files changed (1):
  1. README.md +8 -2
README.md CHANGED
@@ -69,7 +69,13 @@ There are also a few things related to Japanese, such as how we can improve
 ## How to do ...
 
 # Inference:
- check the inference notebook. Before that, make sure you read the **Important Notes** section below.
+
+ Gradio demo:
+ ```bash
+ python app_tsumugi.py
+ ```
+
+ or check the inference notebook. Before that, make sure you read the **Important Notes** section below.
 
 # Training:
 
@@ -98,7 +104,7 @@ Third stage training (Kotodama, prompt encoding, etc.):
 
 I can think of a few things that can be improved, not necessarily by me; treat them as suggestions of sorts:
 
- - [o] changing the decoder (fregrad looks promising)
+ - [o] changing the decoder ([fregrad](https://github.com/kaistmm/fregrad) looks promising)
 - [o] retraining the Pitch Extractor using a different algorithm
 - [o] while the quality of non-speech sounds has been improved, the model cannot generate an entirely non-speech output, perhaps because of the hard alignment.
 - [o] using the Style encoder as another modality in LLMs, since it provides a detailed representation of the tone and expression of speech (similar to Style-Talker).
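Usage note on the Gradio demo added above: assuming `app_tsumugi.py` is a standard Gradio app (the diff shows only the launch command), running `python app_tsumugi.py` starts a local web server and prints a local URL, by default http://127.0.0.1:7860, which you open in a browser to interact with the demo.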
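On the last suggestion, using the Style encoder as an extra modality for an LLM: a minimal, hypothetical sketch of the idea is to project the style embedding into the LLM's embedding space and prepend it to the text embeddings as a soft token. Every name and dimension below is an assumption for illustration; this is not code from this repository or from Style-Talker.

```python
# Hypothetical sketch only: module names and dimensions are assumptions,
# not part of this repository or of Style-Talker.
import torch
import torch.nn as nn

class StyleAdapter(nn.Module):
    """Projects a fixed-size style vector into the LLM's embedding space."""
    def __init__(self, style_dim: int = 128, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(style_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, style_vec: torch.Tensor) -> torch.Tensor:
        # (batch, style_dim) -> (batch, 1, llm_dim): one soft "style token" per utterance
        return self.proj(style_vec).unsqueeze(1)

# Usage: prepend the style token to the text token embeddings before the LLM forward pass.
adapter = StyleAdapter()
style_vec = torch.randn(2, 128)           # stand-in for Style encoder outputs
text_embeds = torch.randn(2, 10, 4096)    # stand-in for embedded text tokens
llm_inputs = torch.cat([adapter(style_vec), text_embeds], dim=1)
print(llm_inputs.shape)  # torch.Size([2, 11, 4096])
```

A single soft token keeps the LLM's interface unchanged; multiple tokens or cross-attention would expose more of the style representation, but that is beyond this sketch.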