---
license: apache-2.0
language:
- en
tags:
- merge
- moe
---
![image/png](https://i.ibb.co/52Q93TK/icon.png)
(Maybe I'll change the icon image later.)

Experimental MoE: the idea is to have more active parameters than a 7xX model would, while keeping its total size under 20B.
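
The config used to assemble the final MoE isn't shown here. As a minimal sketch only, assuming mergekit-moe's schema: the paths and `positive_prompts` below are placeholders I made up, not the actual recipe.

```
base_model: ./mistral-instruct-v0.2-128k-48l   # hypothetical path to the self-merged base below
gate_mode: hidden                              # route tokens via hidden-state prompt representations
dtype: bfloat16
experts:
  - source_model: ./first-expert-sandwich      # hypothetical path to the first expert merge
    positive_prompts:
      - "write a story"                        # assumed routing prompt, not from the recipe
  - source_model: ./second-expert-sandwich     # hypothetical path to the second expert merge
    positive_prompts:
      - "chat with me"                         # assumed routing prompt, not from the recipe
```

A config like this would be built with `mergekit-moe config.yml ./output`.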

### Base model (self merge)
```
# 24 + 16 + 8 = 48 layers total
slices:
  - sources:
    - model: MistralInstruct-v0.2-128k
      layer_range: [0, 24]
  - sources:
    - model: MistralInstruct-v0.2-128k
      layer_range: [8, 24]
  - sources:
    - model: MistralInstruct-v0.2-128k
      layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16
```

### First expert ("sandwich" merge)
```
slices:
  - sources:
    - model: EroSumika-128k
      layer_range: [0, 24]
  - sources:
    - model: Prima-Lelantacles-128k
      layer_range: [8, 24]
  - sources:
    - model: EroSumika-128k
      layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16
```

### Second expert ("sandwich" merge)
```
slices:
  - sources:
    - model: AlphaMonarch-7B-128k
      layer_range: [0, 24]
  - sources:
    - model: NeuralHuman-128k
      layer_range: [8, 24]
  - sources:
    - model: AlphaMonarch-7B-128k
      layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16
```

Each 128k model is a slerp merge with [Epiculous/Fett-uccine-Long-Noodle-7B-120k-Context](https://huggingface.co/Epiculous/Fett-uccine-Long-Noodle-7B-120k-Context).
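
The slerp configs themselves aren't listed. A minimal sketch of what one of them might look like, assuming mergekit's slerp schema and a uniform interpolation weight (the `t` value is an assumption, not the published recipe):

```
slices:
  - sources:
    - model: localfultonextractor/Erosumika-7B
      layer_range: [0, 32]
    - model: Epiculous/Fett-uccine-Long-Noodle-7B-120k-Context
      layer_range: [0, 32]
merge_method: slerp
base_model: localfultonextractor/Erosumika-7B
parameters:
  t: 0.5        # assumed uniform interpolation weight
dtype: bfloat16
```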

### NeuralHuman recipe
```
slices:
  - sources:
    - model: NeuralNovel/Mistral-7B-Instruct-v0.2-Neural-Story
      layer_range: [0, 24]
  - sources:
    - model: valine/MoreHuman
      layer_range: [8, 24]
  - sources:
    - model: NeuralNovel/Mistral-7B-Instruct-v0.2-Neural-Story
      layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16
```

## Models used

- [localfultonextractor/Erosumika-7B](https://huggingface.co/localfultonextractor/Erosumika-7B)
- [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
- [Epiculous/Fett-uccine-Long-Noodle-7B-120k-Context](https://huggingface.co/Epiculous/Fett-uccine-Long-Noodle-7B-120k-Context)
- [mlabonne/AlphaMonarch-7B](https://huggingface.co/mlabonne/AlphaMonarch-7B)
- [Nitral-AI/Prima-LelantaclesV6-7b](https://huggingface.co/Nitral-AI/Prima-LelantaclesV6-7b)
- [NeuralNovel/Mistral-7B-Instruct-v0.2-Neural-Story](https://huggingface.co/NeuralNovel/Mistral-7B-Instruct-v0.2-Neural-Story)
- [valine/MoreHuman](https://huggingface.co/valine/MoreHuman)