update config and readme
- README.md +3 -2
- config.json +3 -3
README.md
CHANGED
@@ -79,7 +79,6 @@ AutoModelForSequenceClassification.register(JetMoEConfig, JetMoEForSequenceClassification)
 tokenizer = AutoTokenizer.from_pretrained('jetmoe/jetmoe-8b')
 model = AutoModelForCausalLM.from_pretrained('jetmoe/jetmoe-8b')
 ```
-The MoE code is based on the [ScatterMoE](https://github.com/shawntan/scattermoe). The code is still under active development, we are happy to receive any feedback or suggestions.
 
 ## Model Details
 JetMoE-8B has 24 blocks.
@@ -111,7 +110,9 @@ For more details, please refer to the JetMoE Technical Report (Coming Soon).
 ## JetMoE Model Index
 |Model|Index|
 |---|---|
-|JetMoE-8B| [Link](https://huggingface.co/jetmoe/jetmoe-8B) |
+|JetMoE-8B-Base| [Link](https://huggingface.co/jetmoe/jetmoe-8B) |
+|JetMoE-8B-SFT| [Link](https://huggingface.co/jetmoe/jetmoe-8B-sft) |
+|JetMoE-8B-Chat| [Link](https://huggingface.co/jetmoe/jetmoe-8B-chat) |
 
 ## Acknowledgement
 We express our gratitude to [Shengding Hu](https://shengdinghu.github.io/) for his valuable advice on the Phase 2 data mixture. We also express our gratitude to [Exabits](https://www.exabits.ai/) for their assistance in setting up the GPU clusters, and to [Lepton AI](https://www.lepton.ai/) for their support in setting up the chat demo.
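Note on the hunk above: the README's quick-start registers JetMoE's custom classes with the transformers Auto factories before loading. A minimal end-to-end sketch of that flow follows; the `jetmoe` import path and the `JetMoEForCausalLM` class name are assumptions inferred from the truncated `register()` line in the hunk context, not confirmed by this diff.

```python
# Hedged sketch: assumes the jetmoe package exports JetMoEConfig and
# JetMoEForCausalLM (only JetMoEConfig / JetMoEForSequenceClassification
# are visible in the diff context).
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
from jetmoe import JetMoEConfig, JetMoEForCausalLM  # assumed package layout

# Register the custom architecture so the Auto* factories can resolve it.
AutoConfig.register("jetmoe", JetMoEConfig)
AutoModelForCausalLM.register(JetMoEConfig, JetMoEForCausalLM)

tokenizer = AutoTokenizer.from_pretrained('jetmoe/jetmoe-8b')
model = AutoModelForCausalLM.from_pretrained('jetmoe/jetmoe-8b')

# Quick smoke test of the loaded checkpoint.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```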
config.json
CHANGED
@@ -12,10 +12,10 @@
   "length_penalty": 1.0,
   "moe_num_experts": 8,
   "moe_top_k": 2,
-  "
-  "
+  "hidden_size": 2048,
+  "num_hidden_layers": 24,
   "n_positions": 4096,
+  "num_attention_heads": 32,
   "num_key_value_heads": 16,
   "num_layers": 24,
   "rms_norm_eps": 1e-05,
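Note on the config change: the added keys use the conventional transformers names (`hidden_size`, `num_hidden_layers`, `num_attention_heads`). As a sanity check on the values added in this commit, here is a small sketch of the attention and MoE geometry they imply, assuming standard grouped-query attention conventions (illustrative only, not taken from the JetMoE code):

```python
# Values added or already present in config.json (see hunk above).
cfg = {
    "hidden_size": 2048,
    "num_hidden_layers": 24,
    "num_attention_heads": 32,
    "num_key_value_heads": 16,
    "moe_num_experts": 8,
    "moe_top_k": 2,
}

# Assuming the usual grouped-query attention layout:
head_dim = cfg["hidden_size"] // cfg["num_attention_heads"]           # 2048 / 32 = 64
q_per_kv = cfg["num_attention_heads"] // cfg["num_key_value_heads"]   # 32 / 16 = 2

print(f"head_dim={head_dim}, query heads per KV head={q_per_kv}")
print(f"experts active per token: {cfg['moe_top_k']} of {cfg['moe_num_experts']}")
```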