GreenCounsel/speecht5_tts_common_voice_5_sv

Generated from Trainer

CEHB committed on Jun 23, 2023

Commit 468c094 • 1 Parent(s): fea56be

Update README.md

Files changed (1)

README.md +4 -3

README.md CHANGED

@@ -27,7 +27,7 @@ It achieves the following results on the evaluation set:
 Swedish SpeechT5 model trained on Swedish language in Common Voice. Example on how to implement the model below (not possible to run inference at Huggingface).
 ```
-pip install datasets soundfile speechbrain
+pip install datasets soundfile
 pip install git+https://github.com/huggingface/transformers.git
 from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan, set_seed
@@ -53,6 +53,7 @@ repl = [
 ]
+from datasets import load_dataset
 embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
 speaker_embeddings = torch.tensor(embeddings_dataset[7000]["xvector"]).unsqueeze(0)
@@ -65,8 +66,8 @@ inputs = processor(text=text, return_tensors="pt")
 speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
-from IPython.display import Audio
-Audio(speech.cpu().numpy(), rate=16000)
+import soundfile as sf
+sf.write("output.wav", speech.numpy(), samplerate=16000)
 ```
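
The hunks above show only fragments of the README example, so the steps they touch can be hard to follow in isolation. The following is a minimal sketch of how they might fit together after this commit, not the README's actual code. Several things are assumptions that the diff does not confirm: that the processor is loaded from the same model repository, that the vocoder is `microsoft/speecht5_hifigan`, and that the text needs no preprocessing (the README's `repl` list is elided here and is not reproduced). The function name `synthesize_to_wav` and its parameters are hypothetical. The sketch keeps `.cpu()` before `.numpy()` so that it also works if the model is moved to a GPU.

```python
SAMPLE_RATE = 16000  # sample rate used by the sf.write call in the diff


def synthesize_to_wav(
    text,
    out_path="output.wav",
    model_id="GreenCounsel/speecht5_tts_common_voice_5_sv",
    vocoder_id="microsoft/speecht5_hifigan",  # assumption: not shown in the diff
    speaker_index=7000,
    device="cpu",
):
    """Generate speech for `text` and write it to `out_path` as a WAV file."""
    # Heavy third-party imports are local so that importing this module
    # does not require them or trigger any downloads.
    import soundfile as sf
    import torch
    from datasets import load_dataset
    from transformers import (
        SpeechT5ForTextToSpeech,
        SpeechT5HifiGan,
        SpeechT5Processor,
    )

    # Assumption: the processor is stored alongside the fine-tuned model.
    processor = SpeechT5Processor.from_pretrained(model_id)
    model = SpeechT5ForTextToSpeech.from_pretrained(model_id).to(device)
    vocoder = SpeechT5HifiGan.from_pretrained(vocoder_id).to(device)

    # Speaker x-vector, taken from the same dataset and index as in the diff.
    embeddings_dataset = load_dataset(
        "Matthijs/cmu-arctic-xvectors", split="validation"
    )
    speaker_embeddings = (
        torch.tensor(embeddings_dataset[speaker_index]["xvector"])
        .unsqueeze(0)
        .to(device)
    )

    inputs = processor(text=text, return_tensors="pt")
    speech = model.generate_speech(
        inputs["input_ids"].to(device), speaker_embeddings, vocoder=vocoder
    )

    # Move to CPU before converting; .numpy() fails on a GPU tensor.
    sf.write(out_path, speech.cpu().numpy(), samplerate=SAMPLE_RATE)
    return out_path
```

With the dependencies from the diff installed, a call such as `synthesize_to_wav("Hej, hur mår du?")` would be expected to download the model on first use and write `output.wav` to the working directory. Writing to a file rather than using `IPython.display.Audio` means the example does not depend on a notebook environment.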