This model is a fine-tuned version of [MBZUAI/speecht5_tts_clartts_ar](https://huggingface.co/MBZUAI/speecht5_tts_clartts_ar) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.3333

# Uses

## 🤗 Transformers Usage

You can run ArTST TTS locally with the 🤗 Transformers library.

1. First install the 🤗 [Transformers library](https://github.com/huggingface/transformers), sentencepiece, soundfile, and datasets (optional):

```
pip install --upgrade pip
pip install --upgrade transformers sentencepiece soundfile datasets[audio]
```
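If you want to confirm the environment is set up, an optional sanity check is to import the libraries and print their versions:

```python
# Optional: verify the installed libraries import cleanly.
import transformers, datasets, soundfile

print(transformers.__version__, datasets.__version__, soundfile.__version__)
```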
2. Run inference via the `Text-to-Speech` (TTS) pipeline. You can access the Arabic SpeechT5 model via the TTS pipeline in just a few lines of code!

```python
from transformers import pipeline
from datasets import load_dataset
import torch
import soundfile as sf

synthesiser = pipeline("text-to-speech", "Messam174/speecht5_finetuned_essam2_ar")

embeddings_dataset = load_dataset("herwoww/arabic_xvector_embeddings", split="validation")
speaker_embedding = torch.tensor(embeddings_dataset[105]["speaker_embeddings"]).unsqueeze(0)
# You can replace this embedding with your own as well.

speech = synthesiser("السلام عليكم ورحمة الله وبركاته حياكم الله جميعا", forward_params={"speaker_embeddings": speaker_embedding})
# ArTST is trained without diacritics.

sf.write("speech.wav", speech["audio"], samplerate=speech["sampling_rate"])
```
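Since ArTST is trained without diacritics, you may want to strip Arabic diacritics (tashkeel) from your input text before synthesis. A minimal sketch, assuming the Unicode combining marks in U+064B–U+0652 are the ones you want to remove:

```python
import re

# Arabic diacritic (tashkeel) marks: fathatan through sukun (U+064B-U+0652).
# This range is an assumption; extend it if your text uses other marks.
DIACRITICS = re.compile(r"[\u064B-\u0652]")

def strip_diacritics(text: str) -> str:
    """Remove Arabic diacritics so the input matches the model's training text."""
    return DIACRITICS.sub("", text)

print(strip_diacritics("السَّلامُ عَلَيْكُم"))  # -> السلام عليكم
```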
3. Run inference via the Transformers modelling code. For more fine-grained control, you can use the processor, model, and vocoder directly to convert text into a mono 16 kHz speech waveform.

```python
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan
from datasets import load_dataset
import torch
import soundfile as sf

# Check if a GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Load the processor, model, and vocoder
processor = SpeechT5Processor.from_pretrained("Messam174/speecht5_finetuned_essam2_ar")
model = SpeechT5ForTextToSpeech.from_pretrained("Messam174/speecht5_finetuned_essam2_ar").to(device)
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan").to(device)

# Prepare the text inputs
inputs = processor(
    text="السلام عليكم ورحمة الله وبركاته حياكم الله جميعا", return_tensors="pt"
).to(device)

# Load an xvector containing the speaker's voice characteristics from a dataset
embeddings_dataset = load_dataset("herwoww/arabic_xvector_embeddings", split="validation")
speaker_embedding = torch.tensor(embeddings_dataset[105]["speaker_embeddings"]).unsqueeze(0).to(device)

# Generate speech
with torch.no_grad():  # Disable gradient computation for inference
    speech = model.generate_speech(inputs["input_ids"], speaker_embedding, vocoder=vocoder)

# Save the output as a mono 16 kHz WAV file
wav_file = "speech.wav"
sf.write(wav_file, speech.cpu().numpy(), samplerate=16000)
print(f"Speech saved to '{wav_file}'")
```
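If you prefer MP3 output, you can convert the saved WAV with pydub. A minimal sketch, assuming pydub is installed and ffmpeg is available on your PATH (the `speech.mp3` filename is just an example):

```python
from pydub import AudioSegment

# Convert the generated WAV to MP3 (pydub shells out to ffmpeg for this).
AudioSegment.from_wav("speech.wav").export("speech.mp3", format="mp3")
```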
## Model description

More information needed