update config and readme
- README.md +3 -2
- config.json +3 -3
README.md
CHANGED
@@ -79,7 +79,6 @@ AutoModelForSequenceClassification.register(JetMoEConfig, JetMoEForSequenceClassification)
 tokenizer = AutoTokenizer.from_pretrained('jetmoe/jetmoe-8b')
 model = AutoModelForCausalLM.from_pretrained('jetmoe/jetmoe-8b')
 ```
-The MoE code is based on the [ScatterMoE](https://github.com/shawntan/scattermoe). The code is still under active development, we are happy to receive any feedback or suggestions.
 
 ## Model Details
 JetMoE-8B has 24 blocks.
@@ -111,7 +110,9 @@ For more details, please refer to the JetMoE Technical Report (Coming Soon).
 ## JetMoE Model Index
 |Model|Index|
 |---|---|
-|JetMoE-8B| [Link](https://huggingface.co/jetmoe/jetmoe-8B) |
+|JetMoE-8B-Base| [Link](https://huggingface.co/jetmoe/jetmoe-8B) |
+|JetMoE-8B-SFT| [Link](https://huggingface.co/jetmoe/jetmoe-8B-sft) |
+|JetMoE-8B-Chat| [Link](https://huggingface.co/jetmoe/jetmoe-8B-chat) |
 
 ## Acknowledgement
 We express our gratitude to [Shengding Hu](https://shengdinghu.github.io/) for his valuable advice on the Phase 2 data mixture. We also express our gratitude to [Exabits](https://www.exabits.ai/) for their assistance in setting up the GPU clusters, and to [Lepton AI](https://www.lepton.ai/) for their support in setting up the chat demo.
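Note on the hunk above: the README's quick-start registers JetMoE's custom classes with the transformers Auto factories before loading. A minimal end-to-end sketch of that flow follows; the `jetmoe` import path and the `JetMoEForCausalLM` class name are assumptions inferred from the truncated `register()` line in the hunk context, not confirmed by this diff.

```python
# Hedged sketch: assumes the jetmoe package exports JetMoEConfig and
# JetMoEForCausalLM (only JetMoEConfig / JetMoEForSequenceClassification
# are visible in the diff context).
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
from jetmoe import JetMoEConfig, JetMoEForCausalLM  # assumed package layout

# Register the custom architecture so the Auto* factories can resolve it.
AutoConfig.register("jetmoe", JetMoEConfig)
AutoModelForCausalLM.register(JetMoEConfig, JetMoEForCausalLM)

tokenizer = AutoTokenizer.from_pretrained('jetmoe/jetmoe-8b')
model = AutoModelForCausalLM.from_pretrained('jetmoe/jetmoe-8b')

# Quick smoke test of the loaded checkpoint.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```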
config.json
CHANGED
@@ -12,10 +12,10 @@
   "length_penalty": 1.0,
   "moe_num_experts": 8,
   "moe_top_k": 2,
-  "
-  "
+  "hidden_size": 2048,
+  "num_hidden_layers": 24,
   "n_positions": 4096,
+  "num_attention_heads": 32,
   "num_key_value_heads": 16,
   "num_layers": 24,
   "rms_norm_eps": 1e-05,
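Note on the config change: the added keys use the conventional transformers names (`hidden_size`, `num_hidden_layers`, `num_attention_heads`). As a sanity check on the values added in this commit, here is a small sketch of the attention and MoE geometry they imply, assuming standard grouped-query attention conventions (illustrative only, not taken from the JetMoE code):

```python
# Values added or already present in config.json (see hunk above).
cfg = {
    "hidden_size": 2048,
    "num_hidden_layers": 24,
    "num_attention_heads": 32,
    "num_key_value_heads": 16,
    "moe_num_experts": 8,
    "moe_top_k": 2,
}

# Assuming the usual grouped-query attention layout:
head_dim = cfg["hidden_size"] // cfg["num_attention_heads"]           # 2048 / 32 = 64
q_per_kv = cfg["num_attention_heads"] // cfg["num_key_value_heads"]   # 32 / 16 = 2

print(f"head_dim={head_dim}, query heads per KV head={q_per_kv}")
print(f"experts active per token: {cfg['moe_top_k']} of {cfg['moe_num_experts']}")
```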