This model is a fine-tuned version of [MBZUAI/speecht5_tts_clartts_ar](https://huggingface.co/MBZUAI/speecht5_tts_clartts_ar) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.3333

# Uses

## 🤗 Transformers Usage

You can run ArTST TTS locally with the 🤗 Transformers library.

1. First, install the 🤗 [Transformers library](https://github.com/huggingface/transformers), sentencepiece, soundfile, and (optionally) datasets:

```
pip install --upgrade pip
pip install --upgrade transformers sentencepiece soundfile datasets[audio]
```

2. Run inference via the `"text-to-speech"` (TTS) pipeline. You can access the Arabic SpeechT5 model via the TTS pipeline in just a few lines of code!

```python
import torch
import soundfile as sf
from transformers import pipeline
from datasets import load_dataset

synthesiser = pipeline("text-to-speech", "Messam174/speecht5_finetuned_essam2_ar")

# Load an x-vector that encodes the speaker's voice characteristics
embeddings_dataset = load_dataset("herwoww/arabic_xvector_embeddings", split="validation")
speaker_embedding = torch.tensor(embeddings_dataset[105]["speaker_embeddings"]).unsqueeze(0)
# You can replace this embedding with your own as well.

speech = synthesiser("السلام عليكم ورحمة الله وبركاته حياكم الله جميعا", forward_params={"speaker_embeddings": speaker_embedding})
# ArTST is trained without diacritics.

sf.write("speech.wav", speech["audio"], samplerate=speech["sampling_rate"])
```
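
If you are working in a notebook, you can also audition the result inline instead of opening the WAV file. A minimal sketch, assuming a Jupyter/IPython environment:

```python
# Optional: play the generated audio inline in a notebook.
from IPython.display import Audio

Audio(speech["audio"], rate=speech["sampling_rate"])
```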

3. Run inference via the Transformers modelling code. You can use the processor + generate code to convert text into a mono 16 kHz speech waveform for more fine-grained control.

```python
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan
from datasets import load_dataset
import torch
import soundfile as sf

# Check whether a GPU is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Load the processor, model, and vocoder
processor = SpeechT5Processor.from_pretrained("Messam174/speecht5_finetuned_essam2_ar")
model = SpeechT5ForTextToSpeech.from_pretrained("Messam174/speecht5_finetuned_essam2_ar").to(device)
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan").to(device)

# Prepare the text inputs
inputs = processor(
    text="السلام عليكم ورحمة الله وبركاته حياكم الله جميعا", return_tensors="pt"
).to(device)

# Load an x-vector containing the speaker's voice characteristics from a dataset
embeddings_dataset = load_dataset("herwoww/arabic_xvector_embeddings", split="validation")
speaker_embedding = torch.tensor(embeddings_dataset[105]["speaker_embeddings"]).unsqueeze(0).to(device)

# Generate speech
with torch.no_grad():  # Disable gradient computation for inference
    speech = model.generate_speech(inputs["input_ids"], speaker_embedding, vocoder=vocoder)

# Save the output as a 16 kHz mono WAV file
wav_file = "speech.wav"
sf.write(wav_file, speech.cpu().numpy(), samplerate=16000)
print(f"Speech saved to '{wav_file}'")
```
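
If you also need a compressed copy, here is a minimal sketch that converts the saved WAV to MP3 with pydub; this assumes pydub is installed and ffmpeg is available on your system:

```python
# Optional: convert the saved WAV to MP3 with pydub (requires ffmpeg).
from pydub import AudioSegment

AudioSegment.from_wav("speech.wav").export("speech.mp3", format="mp3")
```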

## Model description

More information needed