	---
	license: mit
	language:
	- ar
	pipeline_tag: text-to-speech
	---

	# ArTST
	SpeechT5 for Arabic (TTS task)

This model starts from the pretrained ArTST weights and is fine-tuned, using the Hugging Face implementation of SpeechT5, on the Classical Arabic ClArTTS dataset for speech synthesis (text-to-speech).

ArTST was first released in [this repository](https://github.com/mbzuai-nlp/ArTST), and its [pretrained weights](https://huggingface.co/MBZUAI/ArTST/blob/main/pretrain_checkpoint.pt) are also available.

	# Uses
	## 🤗 Transformers Usage

	You can run ArTST TTS locally with the 🤗 Transformers library.

1. First install the 🤗 [Transformers library](https://github.com/huggingface/transformers), sentencepiece, PyTorch, soundfile and, optionally, datasets (used below only to load a sample speaker embedding):

```bash
pip install --upgrade pip
pip install --upgrade transformers sentencepiece torch soundfile "datasets[audio]"
```
2. Run inference via the `text-to-speech` (TTS) pipeline. You can access the Arabic SpeechT5 model through the pipeline in just a few lines of code:

	```python
import torch
from transformers import pipeline
from datasets import load_dataset
import soundfile as sf

	synthesiser = pipeline("text-to-speech", "MBZUAI/speecht5_tts_clartts_ar")

	embeddings_dataset = load_dataset("herwoww/arabic_xvector_embeddings", split="validation")
	speaker_embedding = torch.tensor(embeddings_dataset[105]["speaker_embeddings"]).unsqueeze(0)
	# You can replace this embedding with your own as well.

	speech = synthesiser("لأنه لا يرى أنه على السفه ثم من بعد ذلك حديث منتشر", forward_params={"speaker_embeddings": speaker_embedding})
# ArTST was trained without diacritics, so pass undiacritized text.

	sf.write("speech.wav", speech["audio"], samplerate=speech["sampling_rate"])
	```
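The comment above notes that ArTST was trained without diacritics. If your input text is vocalized (contains tashkeel), one option is to strip the marks before synthesis. Below is a minimal sketch of a hypothetical helper, `strip_diacritics`, that removes the Arabic harakat U+064B to U+0652 (tanwin, short vowels, shadda, sukun), the superscript alef U+0670 and the tatweel U+0640. This character set is an assumption: the exact normalization applied to the ClArTTS training text is not specified here.

```python
import re
import unicodedata

# Hypothetical normalization for ArTST input (assumed character set):
# fathatan (U+064B) through sukun (U+0652), superscript alef (U+0670),
# and tatweel (U+0640), an elongation mark with no phonetic content.
_ARABIC_MARKS = re.compile(r"[\u064B-\u0652\u0670\u0640]")


def strip_diacritics(text: str) -> str:
    """Remove Arabic diacritics and tatweel so the input matches undiacritized text."""
    # Compose sequences such as alef + combining hamza (U+0627 U+0654)
    # into precomposed letters (U+0623) before filtering.
    text = unicodedata.normalize("NFC", text)
    return _ARABIC_MARKS.sub("", text)
```

You could then call `synthesiser(strip_diacritics(text), forward_params={"speaker_embeddings": speaker_embedding})`. Text that is already undiacritized, like the example sentence above, passes through unchanged.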
3. Run inference via the Transformers modelling code. For more fine-grained control, use the processor and `generate_speech` to convert text into a mono 16 kHz speech waveform:

	```python
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan
from datasets import load_dataset
import torch
import soundfile as sf

	processor = SpeechT5Processor.from_pretrained("MBZUAI/speecht5_tts_clartts_ar")
	model = SpeechT5ForTextToSpeech.from_pretrained("MBZUAI/speecht5_tts_clartts_ar")
	vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

	inputs = processor(text="لأنه لا يرى أنه على السفه ثم من بعد ذلك حديث منتشر", return_tensors="pt")

	# load xvector containing speaker's voice characteristics from a dataset
	embeddings_dataset = load_dataset("herwoww/arabic_xvector_embeddings", split="validation")
	speaker_embedding = torch.tensor(embeddings_dataset[105]["speaker_embeddings"]).unsqueeze(0)

	speech = model.generate_speech(inputs["input_ids"], speaker_embedding, vocoder=vocoder)

	sf.write("speech.wav", speech.numpy(), samplerate=16000)
	```
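Both examples pass the speaker embedding as a tensor of shape `(1, D)` (note the `.unsqueeze(0)`). If you replace the sample x-vector with your own, a small check can catch shape mistakes before generation. This is a minimal sketch assuming the model keeps SpeechT5's default 512-dimensional speaker embedding (confirm with `model.config.speaker_embedding_dim`). `prepare_speaker_embedding` is a hypothetical helper, and its optional L2 normalization is a common x-vector preprocessing step, not a documented requirement of this model.

```python
import numpy as np

# Assumption: SpeechT5's default speaker_embedding_dim; verify against model.config.
EXPECTED_DIM = 512


def prepare_speaker_embedding(vec, dim: int = EXPECTED_DIM, l2_normalize: bool = False) -> np.ndarray:
    """Validate a speaker embedding and return it as float32 with shape (1, dim)."""
    arr = np.squeeze(np.asarray(vec, dtype=np.float32))  # accept (dim,), (1, dim), ...
    if arr.ndim != 1 or arr.shape[0] != dim:
        raise ValueError(f"expected a {dim}-dim speaker embedding, got shape {arr.shape}")
    if l2_normalize:
        norm = np.linalg.norm(arr)
        if norm == 0:
            raise ValueError("cannot L2-normalize an all-zero embedding")
        arr = arr / norm
    # Add the batch dimension that generate_speech / the pipeline expect.
    return arr.astype(np.float32)[np.newaxis, :]
```

With PyTorch installed, `torch.from_numpy(prepare_speaker_embedding(my_xvector))` can then be passed wherever `speaker_embedding` is used above (`my_xvector` stands for your own embedding).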


	# Citation

	BibTeX:

	```bibtex
	@inproceedings{toyin-etal-2023-artst,
	title = "{A}r{TST}: {A}rabic Text and Speech Transformer",
	author = "Toyin, Hawau and
	Djanibekov, Amirbek and
	Kulkarni, Ajinkya and
	Aldarmaki, Hanan",
	editor = "Sawaf, Hassan and
	El-Beltagy, Samhaa and
	Zaghouani, Wajdi and
	Magdy, Walid and
	Abdelali, Ahmed and
	Tomeh, Nadi and
	Abu Farha, Ibrahim and
	Habash, Nizar and
	Khalifa, Salam and
	Keleg, Amr and
	Haddad, Hatem and
	Zitouni, Imed and
	Mrini, Khalil and
	Almatham, Rawan",
	booktitle = "Proceedings of ArabicNLP 2023",
	month = dec,
	year = "2023",
	address = "Singapore (Hybrid)",
	publisher = "Association for Computational Linguistics",
	url = "https://aclanthology.org/2023.arabicnlp-1.5",
	pages = "41--51"
}

@inproceedings{ao-etal-2022-speecht5,
title = "{S}peech{T}5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing",
author = "Ao, Junyi and Wang, Rui and Zhou, Long and Wang, Chengyi and Ren, Shuo and Wu, Yu and Liu, Shujie and Ko, Tom and Li, Qing and Zhang, Yu and Wei, Zhihua and Qian, Yao and Li, Jinyu and Wei, Furu",
booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = may,
year = "2022",
pages = "5723--5738"
}
	```