---
license: apache-2.0
language:
- en
tags:
- merge
- moe
---
![image/png](https://i.ibb.co/52Q93TK/icon.png)
(Maybe I'll change the icon image later.)

Experimental MoE. The idea is to have more active parameters than a 7xX model would have, while keeping the total size under 20B.

### Base model (self merge)
```
slices:
  - sources:
      - model: MistralInstruct-v0.2-128k
        layer_range: [0, 24]
  - sources:
      - model: MistralInstruct-v0.2-128k
        layer_range: [8, 24]
  - sources:
      - model: MistralInstruct-v0.2-128k
        layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16
```

### First expert ("sandwich" merge)
```
slices:
  - sources:
      - model: EroSumika-128k
        layer_range: [0, 24]
  - sources:
      - model: Prima-Lelantacles-128k
        layer_range: [8, 24]
  - sources:
      - model: EroSumika-128k
        layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16
```

### Second expert ("sandwich" merge)
```
slices:
  - sources:
      - model: AlphaMonarch-7B-128k
        layer_range: [0, 24]
  - sources:
      - model: NeuralHuman-128k
        layer_range: [8, 24]
  - sources:
      - model: AlphaMonarch-7B-128k
        layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16
```
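
The final step of wiring the self-merged base and the two experts into a single MoE is not shown above; a minimal mergekit-moe style sketch of how that could look is below. The model names, `gate_mode`, and `positive_prompts` are placeholders and assumptions, not the settings actually used.
```
# Hypothetical mergekit-moe sketch, NOT the actual config for this model.
base_model: MistralInstruct-v0.2-128k-SelfMerge     # the passthrough self merge above (name assumed)
gate_mode: hidden                                    # assumed; mergekit-moe also supports cheap_embed / random
dtype: bfloat16
experts:
  - source_model: EroSumika-Prima-Sandwich           # first expert (name assumed)
    positive_prompts:
      - "write a story"                              # hypothetical routing prompt
  - source_model: AlphaMonarch-NeuralHuman-Sandwich  # second expert (name assumed)
    positive_prompts:
      - "answer a question"                          # hypothetical routing prompt
```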

Each 128k model is a slerp merge with [Epiculous/Fett-uccine-Long-Noodle-7B-120k-Context](https://huggingface.co/Epiculous/Fett-uccine-Long-Noodle-7B-120k-Context).
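
The exact slerp configs are not listed; a minimal sketch of what one of them might look like, using Erosumika as the example and an assumed uniform interpolation weight rather than the real values, is:
```
# Hypothetical slerp sketch for producing EroSumika-128k, NOT the actual config used.
slices:
  - sources:
      - model: localfultonextractor/Erosumika-7B
        layer_range: [0, 32]
      - model: Epiculous/Fett-uccine-Long-Noodle-7B-120k-Context
        layer_range: [0, 32]
merge_method: slerp
base_model: localfultonextractor/Erosumika-7B
parameters:
  t:
    - value: 0.5   # assumed interpolation weight
dtype: bfloat16
```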

### NeuralHuman recipe
```
slices:
  - sources:
      - model: NeuralNovel/Mistral-7B-Instruct-v0.2-Neural-Story
        layer_range: [0, 24]
  - sources:
      - model: valine/MoreHuman
        layer_range: [8, 24]
  - sources:
      - model: NeuralNovel/Mistral-7B-Instruct-v0.2-Neural-Story
        layer_range: [24, 32]
merge_method: passthrough
dtype: bfloat16
```

## Models used

- [localfultonextractor/Erosumika-7B](https://huggingface.co/localfultonextractor/Erosumika-7B)
- [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
- [Epiculous/Fett-uccine-Long-Noodle-7B-120k-Context](https://huggingface.co/Epiculous/Fett-uccine-Long-Noodle-7B-120k-Context)
- [mlabonne/AlphaMonarch-7B](https://huggingface.co/mlabonne/AlphaMonarch-7B)
- [Nitral-AI/Prima-LelantaclesV6-7b](https://huggingface.co/Nitral-AI/Prima-LelantaclesV6-7b)
- [NeuralNovel/Mistral-7B-Instruct-v0.2-Neural-Story](https://huggingface.co/NeuralNovel/Mistral-7B-Instruct-v0.2-Neural-Story)
- [valine/MoreHuman](https://huggingface.co/valine/MoreHuman)