Text-to-Speech
Safetensors
English
model_hub_mixin
pytorch_model_hub_mixin
Kang Min Yoo committed on
Commit
96aa943
1 Parent(s): 964c0bd

Initial commit

Files changed (4)
  1. README.md +37 -0
  2. config.json +17 -0
  3. .gitattributes +35 -0
  4. model.safetensors +3 -0
README.md ADDED
@@ -0,0 +1,37 @@
+ ---
+ tags:
+ - model_hub_mixin
+ - pytorch_model_hub_mixin
+ license: apache-2.0
+ datasets:
+ - speechcolab/gigaspeech
+ - facebook/multilingual_librispeech
+ language:
+ - en
+ pipeline_tag: text-to-speech
+ ---
+
+ # Model Card
+
+ <!-- Provide a quick summary of what the model is/does. -->
+ Token-Voicebox, a model following the Voicebox architecture, reconstructs speech from speech tokens generated by USDM.
+
+ ## Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation [NeurIPS 2024]
+
+ - **Repository:** https://github.com/naver-ai/usdm
+ - **Paper:** https://openreview.net/forum?id=NjewXJUDYq
+ - **Project Page:** https://unifiedsdm.github.io/
+
+
+ ## BibTeX
+
+ ```
+ @inproceedings{
+ kim2024paralinguisticsaware,
+ title={Paralinguistics-Aware Speech-Empowered Large Language Models for Natural Conversation},
+ author={Heeseung Kim and Soonshin Seo and Kyeongseok Jeong and Ohsung Kwon and Soyoon Kim and Jungwhan Kim and Jaehong Lee and Eunwoo Song and Myungwoo Oh and Jung-Woo Ha and Sungroh Yoon and Kang Min Yoo},
+ booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
+ year={2024},
+ url={https://openreview.net/forum?id=NjewXJUDYq}
+ }
+ ```
config.json ADDED
@@ -0,0 +1,17 @@
+ {
+   "activation_dropout": 0.1,
+   "attention_dropout": 0.0,
+   "convpos_depth": 2,
+   "convpos_groups": 16,
+   "convpos_width": 31,
+   "embedding_dim": 1280,
+   "hidden_dropout": 0.0,
+   "hidden_size": 1024,
+   "intermediate_size": 4096,
+   "n_feats": 80,
+   "n_tokens": 10000,
+   "num_attention_heads": 16,
+   "num_hidden_layers": 24,
+   "sigma_min": 0.0001,
+   "solver": "euler"
+ }
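The `sigma_min` and `solver` fields suggest that sampling integrates a flow-matching ODE with fixed-step Euler updates, as Voicebox-style models generally do. The sketch below is a toy illustration only, not the repository's sampler: it assumes the optimal-transport conditional path, works on a single scalar, and replaces the learned, token-conditioned network with a closed-form vector field. The names `toy_vector_field` and `euler_sample` are hypothetical.

```python
# Toy illustration of Euler integration for a flow-matching ODE.
# Assumption: optimal-transport conditional path; the real model would
# use a learned network conditioned on speech tokens instead.

# Values copied from config.json in this commit.
CONFIG = {"sigma_min": 0.0001, "solver": "euler"}


def toy_vector_field(x: float, t: float, x1: float, sigma_min: float) -> float:
    """Closed-form conditional field that moves x toward the target x1."""
    a = 1.0 - sigma_min
    return (x1 - a * x) / (1.0 - a * t)


def euler_sample(x0: float, x1: float, n_steps: int, sigma_min: float) -> float:
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = x0
    h = 1.0 / n_steps
    for i in range(n_steps):
        t = i * h  # left endpoint of the current step
        x = x + h * toy_vector_field(x, t, x1, sigma_min)
    return x


if __name__ == "__main__":
    noise, target = 0.5, 2.0
    sample = euler_sample(noise, target, n_steps=10, sigma_min=CONFIG["sigma_min"])
    # The sample lands near the target, offset by sigma_min times the noise.
    print(sample)
```

On this toy path the trajectory is a straight line, so Euler steps follow it without discretization error and the result ends at `sigma_min * x0 + x1`. With a learned vector field the step count would trade speed against quality.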
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
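The patterns above route matching files through Git LFS instead of storing them directly in the repository. As a rough sketch of which files in this commit that affects, the snippet below approximates the matching with Python's `fnmatchcase` on the file name. This is an assumption for illustration: Git's own attribute matching has additional rules (for example for `saved_model/**/*`), and only a subset of the patterns is listed. `is_lfs_tracked` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Subset of the glob patterns from the attributes file in this commit.
LFS_PATTERNS = ["*.bin", "*.ckpt", "*.pt", "*.pth", "*.safetensors", "*.zip", "*tfevents*"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: does the file name match any LFS glob pattern?"""
    name = PurePosixPath(path).name  # patterns without a slash match the basename
    return any(fnmatchcase(name, pattern) for pattern in patterns)


if __name__ == "__main__":
    for path in ["README.md", "config.json", "model.safetensors"]:
        print(path, is_lfs_tracked(path))
```

Under this approximation only `model.safetensors` is routed through LFS, which is consistent with its diff showing a three-line pointer rather than binary content.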
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7f3935e08887c7ee715a78db3525b8f82fce46e8236c0b1a19598dc6f0dfee4d
+ size 1383704304
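The three lines above are a Git LFS pointer, not the weights themselves: they record the spec version, the SHA-256 of the real file, and its size in bytes. A minimal sketch of reading such a pointer and checking a downloaded file against it follows. The helper names `parse_lfs_pointer` and `verify_file` are hypothetical, and the parameter estimate assumes 4-byte (float32) weights, which this commit does not state.

```python
import hashlib

# Pointer text copied from the model.safetensors diff in this commit.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:7f3935e08887c7ee715a78db3525b8f82fce46e8236c0b1a19598dc6f0dfee4d
size 1383704304
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line and unpack the oid into algorithm and digest."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def verify_file(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Stream the file and compare its size and hash with the pointer."""
    hasher = hashlib.new(pointer["hash_algo"])
    total = 0
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            hasher.update(chunk)
            total += len(chunk)
    return total == pointer["size"] and hasher.hexdigest() == pointer["digest"]


if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER_TEXT)
    print(f"{pointer['size'] / 2**30:.2f} GiB")
    # Rough upper bound on parameter count, assuming float32 storage.
    print(pointer["size"] // 4)
```

The file also carries a safetensors header, so the byte count divided by four slightly overstates the number of parameters even if the float32 assumption holds.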