---
base_model:
- meta-llama/Meta-Llama-3-70B-Instruct
license: llama3
language:
- en
pipeline_tag: text-generation
tags:
- merge
- frankenmerge
- 96b
---
# BigWeave v29 96b

<img src="https://cdn-uploads.huggingface.co/production/uploads/65a6db055c58475cf9e6def1/4CbbAN-X7ZWj702JrcCGH.png" width=600>

The BigWeave models are experiments in finding merge settings that increase model performance. The version number merely tracks the various attempts and is not a quality indicator; only merges that demonstrate good performance are retained and shared.

# Prompting Format
Llama 3 Instruct (llamav3).

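The prompt can be assembled with the tokenizer's chat template via `transformers`. A minimal sketch using the base model's tokenizer, which this self-merge reuses unchanged (the repo is gated, so Llama 3 access is required):

```python
# Minimal sketch: format a prompt with the Llama 3 Instruct chat template.
from transformers import AutoTokenizer

# Gated repo; requires accepted Llama 3 access on Hugging Face.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-70B-Instruct")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the BigWeave merge approach in one sentence."},
]

# add_generation_prompt=True appends the assistant header so the model continues from there.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```
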
# Merge Process
This is a self-merge of meta-llama/Meta-Llama-3-70B-Instruct. The middle layers are duplicated, and various weight matrices are scaled according to the template by jukofyork, as shown here: https://github.com/arcee-ai/mergekit/issues/198#issuecomment-2079950009

Merge configuration:
```
const_tag: &MODEL meta-llama/Meta-Llama-3-70B-Instruct

const_tag: &RESIDUAL_SCALE_FACTOR 0.5
const_tag: &QK_ATTENUATION_FACTOR 0.7071067812
const_tag: &OUT_FACTOR 0.9

scale-filter-env: &scale_filter_env
  parameters:
    scale:
      - filter: o_proj
        value: *RESIDUAL_SCALE_FACTOR
      - filter: down_proj
        value: *RESIDUAL_SCALE_FACTOR
      - filter: q_proj
        value: *QK_ATTENUATION_FACTOR
      - filter: k_proj
        value: *QK_ATTENUATION_FACTOR
      - filter: v_proj
        value: *OUT_FACTOR
      - filter: up_proj
        value: *OUT_FACTOR
      - value: 1.0

slices:
  - sources:
    - model: *MODEL
      layer_range: [0, 25]

  - sources:
    - model: *MODEL
      layer_range: [25, 26]
      <<: *scale_filter_env

  - sources:
    - model: *MODEL
      layer_range: [25, 27]
      <<: *scale_filter_env
  - sources:
    - model: *MODEL
      layer_range: [26, 28]
      <<: *scale_filter_env
  - sources:
    - model: *MODEL
      layer_range: [27, 29]
      <<: *scale_filter_env
  - sources:
    - model: *MODEL
      layer_range: [28, 30]
      <<: *scale_filter_env
  - sources:
    - model: *MODEL
      layer_range: [29, 31]
      <<: *scale_filter_env
  - sources:
    - model: *MODEL
      layer_range: [30, 32]
      <<: *scale_filter_env
  - sources:
    - model: *MODEL
      layer_range: [31, 33]
      <<: *scale_filter_env
  - sources:
    - model: *MODEL
      layer_range: [32, 34]
      <<: *scale_filter_env
  - sources:
    - model: *MODEL
      layer_range: [33, 35]
      <<: *scale_filter_env
  - sources:
    - model: *MODEL
      layer_range: [34, 36]
      <<: *scale_filter_env
  - sources:
    - model: *MODEL
      layer_range: [35, 37]
      <<: *scale_filter_env
  - sources:
    - model: *MODEL
      layer_range: [36, 38]
      <<: *scale_filter_env
  - sources:
    - model: *MODEL
      layer_range: [37, 39]
      <<: *scale_filter_env
  - sources:
    - model: *MODEL
      layer_range: [38, 40]
      <<: *scale_filter_env
  - sources:
    - model: *MODEL
      layer_range: [39, 41]
      <<: *scale_filter_env
  - sources:
    - model: *MODEL
      layer_range: [40, 42]
      <<: *scale_filter_env
  - sources:
    - model: *MODEL
      layer_range: [41, 43]
      <<: *scale_filter_env
  - sources:
    - model: *MODEL
      layer_range: [42, 44]
      <<: *scale_filter_env
  - sources:
    - model: *MODEL
      layer_range: [43, 45]
      <<: *scale_filter_env
  - sources:
    - model: *MODEL
      layer_range: [44, 46]
      <<: *scale_filter_env
  - sources:
    - model: *MODEL
      layer_range: [45, 47]
      <<: *scale_filter_env
  - sources:
    - model: *MODEL
      layer_range: [46, 48]
      <<: *scale_filter_env
  - sources:
    - model: *MODEL
      layer_range: [47, 49]
      <<: *scale_filter_env
  - sources:
    - model: *MODEL
      layer_range: [48, 50]
      <<: *scale_filter_env
  - sources:
    - model: *MODEL
      layer_range: [49, 51]
      <<: *scale_filter_env
  - sources:
    - model: *MODEL
      layer_range: [50, 52]
      <<: *scale_filter_env
  - sources:
    - model: *MODEL
      layer_range: [51, 53]
      <<: *scale_filter_env
  - sources:
    - model: *MODEL
      layer_range: [52, 54]
      <<: *scale_filter_env
  - sources:
    - model: *MODEL
      layer_range: [53, 55]
      <<: *scale_filter_env

  - sources:
    - model: *MODEL
      layer_range: [54, 55]
      <<: *scale_filter_env
  - sources:
    - model: *MODEL
      layer_range: [55, 80]

merge_method: passthrough
dtype: float16
```
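
A configuration like this can be executed with mergekit. Below is a minimal sketch using mergekit's Python entry point; the paths are placeholders rather than the exact invocation used for this model, and option names may differ slightly between mergekit versions:

```python
# Minimal sketch of running the merge configuration above with mergekit.
# "config.yml" and the output directory are placeholder paths.
import yaml

from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("config.yml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path="./bigweave-v29-96b",  # placeholder output directory
    options=MergeOptions(
        cuda=True,            # run the merge on GPU if available
        copy_tokenizer=True,  # copy the base model's tokenizer into the output
        lazy_unpickle=True,   # reduce peak memory while loading shards
    ),
)
```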