---
library_name: transformers
tags:
- text-to-speech
- annotation
language:
- en
pipeline_tag: text-to-speech
inference: false
datasets:
- ylacombe/jenny-tts-tagged-v1
- reach-vb/jenny_tts_dataset
---

<img src="https://huggingface.co/datasets/parler-tts/images/resolve/main/thumbnail.png" alt="Parler Logo" width="800" style="margin-left:'auto' margin-right:'auto' display:'block'"/>

# Parler-TTS Mini v1 - Jenny

<a target="_blank" href="https://huggingface.co/spaces/parler-tts/parler_tts_mini">
  <img src="https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg" alt="Open in HuggingFace"/>
</a>

* **Fine-tuning guide on Colab:**

<a target="_blank" href="https://github.com/ylacombe/scripts_and_notebooks/blob/main/Finetuning_Parler_TTS_v1_on_a_single_speaker_dataset.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

This is a fine-tuned version of **Parler-TTS Mini v1** on the [30-hour, single-speaker, high-quality Jenny (she's Irish ☘️) dataset](https://github.com/dioco-group/jenny-tts-dataset), which is well suited for training a TTS model.
Usage is more or less the same as for Parler-TTS v1; just specify the keyword "Jenny" in the voice description.

## Usage

First install the library:

```sh
pip install git+https://github.com/huggingface/parler-tts.git
```

You can then use the model with the following inference snippet:

```py
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer
import soundfile as sf

device = "cuda:0" if torch.cuda.is_available() else "cpu"

model = ParlerTTSForConditionalGeneration.from_pretrained("parler-tts/parler-mini-v1-jenny").to(device)
tokenizer = AutoTokenizer.from_pretrained("parler-tts/parler-mini-v1-jenny")

prompt = "Hey, how are you doing today? My name is Jenny, and I'm here to help you with any questions you have."
description = "Jenny speaks at an average pace with an animated delivery in a very confined sounding environment with clear audio quality."

input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
audio_arr = generation.cpu().numpy().squeeze()
sf.write("parler_tts_out.wav", audio_arr, model.config.sampling_rate)
```
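Since this checkpoint was fine-tuned on a single speaker, the voice description should always contain the keyword "Jenny". A small guard like the one below, a hypothetical helper and not part of the Parler-TTS API, can catch a missing keyword before you spend time on generation:

```python
def check_description(description: str, keyword: str = "Jenny") -> str:
    """Return the description unchanged, or raise if it omits the
    fine-tuned speaker keyword (case-insensitive check)."""
    if keyword.lower() not in description.lower():
        raise ValueError(
            f"Voice description must mention '{keyword}' for this fine-tuned checkpoint."
        )
    return description

# Passes: the keyword is present.
check_description("Jenny speaks at an average pace with clear audio quality.")

# Raises ValueError: no speaker keyword, so the model may not
# reproduce the fine-tuned voice.
# check_description("A male speaker talks quickly in a large room.")
```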

## Citation

If you found this repository useful, please consider citing this work and also the original Stability AI paper:

```
@misc{lacombe-etal-2024-parler-tts,
  author = {Yoach Lacombe and Vaibhav Srivastav and Sanchit Gandhi},
  title = {Parler-TTS},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/huggingface/parler-tts}}
}
```

```
@misc{lyth2024natural,
  title = {Natural language guidance of high-fidelity text-to-speech with synthetic annotations},
  author = {Dan Lyth and Simon King},
  year = {2024},
  eprint = {2402.01912},
  archivePrefix = {arXiv},
  primaryClass = {cs.SD}
}
```

## License

Attribution is required in software/websites/projects/interfaces (including voice interfaces) that generate audio in response to user action using this dataset. Attribution means: the voice must be referred to as "Jenny", and where at all practical, "Jenny (Dioco)". Attribution is not required when distributing the generated clips (although it is welcome). Commercial use is permitted. Don't do unfair things like claiming the dataset is your own. No further restrictions apply.