riczhou committed on
Commit c4f4b03 · verified · 1 Parent(s): ba3ed11

Upload folder using huggingface_hub
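
The `ndarray-cache.json` manifest added in this commit lists, for each `params_shard_*.bin` file, the tensors packed into it with their `shape`, `dtype`, `nbytes`, and `byteOffset`. The following is a minimal sketch, not part of the commit, of how those fields could be cross-checked. The helper name `check_shard` and the `DTYPE_BYTES` table are hypothetical, and the sketch assumes that tensors are packed back to back with no padding, which holds for the entries shown in this diff. The sample values are copied from the `params_shard_2.bin` entry.

```python
import math

# Assumed mapping from dtype string to bytes per element (hypothetical helper table).
DTYPE_BYTES = {"float16": 2, "bfloat16": 2, "float32": 4}


def check_shard(shard):
    """Return a list of problems found in one shard entry of the manifest."""
    problems = []
    end = 0  # byte position where the next tensor is expected to start
    for rec in shard["records"]:
        expected = math.prod(rec["shape"]) * DTYPE_BYTES[rec["dtype"]]
        if rec["nbytes"] != expected:
            problems.append(f"{rec['name']}: nbytes {rec['nbytes']} != {expected}")
        if rec["byteOffset"] != end:
            problems.append(f"{rec['name']}: byteOffset {rec['byteOffset']} != {end}")
        end = rec["byteOffset"] + rec["nbytes"]
    if end != shard["nbytes"]:
        problems.append(f"{shard['dataPath']}: shard nbytes {shard['nbytes']} != {end}")
    return problems


# Values copied from the params_shard_2.bin entry in the diff.
SAMPLE_SHARD = {
    "dataPath": "params_shard_2.bin",
    "nbytes": 25174016,
    "records": [
        {"name": "model.layers.0.attention.wo.weight", "shape": [2048, 2048],
         "dtype": "float16", "nbytes": 8388608, "byteOffset": 0},
        {"name": "model.layers.0.attention.wqkv.weight", "shape": [4096, 2048],
         "dtype": "float16", "nbytes": 16777216, "byteOffset": 8388608},
        {"name": "model.layers.0.attention_norm.weight", "shape": [2048],
         "dtype": "float16", "nbytes": 4096, "byteOffset": 25165824},
        {"name": "model.layers.0.ffn_norm.weight", "shape": [2048],
         "dtype": "float16", "nbytes": 4096, "byteOffset": 25169920},
    ],
}

if __name__ == "__main__":
    print(check_shard(SAMPLE_SHARD) or "shard entry is consistent")
```

To check a whole manifest, one could load it with `json.load` and call `check_shard` on every element of the top-level `"records"` list. Verifying each shard's `md5sum` against the downloaded `.bin` file would be a natural extension, but it needs the binary files and is left out here.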

ndarray-cache.json ADDED
@@ -0,0 +1,2169 @@
1
+ {
2
+ "metadata": {
3
+ "ParamSize": 147,
4
+ "ParamBytes": 3778220032.0,
5
+ "BitsPerParam": 16.0
6
+ },
7
+ "records": [
8
+ {
9
+ "dataPath": "params_shard_0.bin",
10
+ "format": "raw-shard",
11
+ "nbytes": 67108864,
12
+ "records": [
13
+ {
14
+ "name": "model.layers.0.feed_forward.gate_up_proj.weight",
15
+ "shape": [
16
+ 16384,
17
+ 2048
18
+ ],
19
+ "dtype": "float16",
20
+ "format": "f32-to-bf16",
21
+ "nbytes": 67108864,
22
+ "byteOffset": 0
23
+ }
24
+ ],
25
+ "md5sum": "13620811167b805f8ea2a1d99fbd406c"
26
+ },
27
+ {
28
+ "dataPath": "params_shard_1.bin",
29
+ "format": "raw-shard",
30
+ "nbytes": 33554432,
31
+ "records": [
32
+ {
33
+ "name": "model.layers.0.feed_forward.w2.weight",
34
+ "shape": [
35
+ 2048,
36
+ 8192
37
+ ],
38
+ "dtype": "float16",
39
+ "format": "f32-to-bf16",
40
+ "nbytes": 33554432,
41
+ "byteOffset": 0
42
+ }
43
+ ],
44
+ "md5sum": "562607cf49908cc655f991e0d55bcfab"
45
+ },
46
+ {
47
+ "dataPath": "params_shard_2.bin",
48
+ "format": "raw-shard",
49
+ "nbytes": 25174016,
50
+ "records": [
51
+ {
52
+ "name": "model.layers.0.attention.wo.weight",
53
+ "shape": [
54
+ 2048,
55
+ 2048
56
+ ],
57
+ "dtype": "float16",
58
+ "format": "f32-to-bf16",
59
+ "nbytes": 8388608,
60
+ "byteOffset": 0
61
+ },
62
+ {
63
+ "name": "model.layers.0.attention.wqkv.weight",
64
+ "shape": [
65
+ 4096,
66
+ 2048
67
+ ],
68
+ "dtype": "float16",
69
+ "format": "f32-to-bf16",
70
+ "nbytes": 16777216,
71
+ "byteOffset": 8388608
72
+ },
73
+ {
74
+ "name": "model.layers.0.attention_norm.weight",
75
+ "shape": [
76
+ 2048
77
+ ],
78
+ "dtype": "float16",
79
+ "format": "f32-to-bf16",
80
+ "nbytes": 4096,
81
+ "byteOffset": 25165824
82
+ },
83
+ {
84
+ "name": "model.layers.0.ffn_norm.weight",
85
+ "shape": [
86
+ 2048
87
+ ],
88
+ "dtype": "float16",
89
+ "format": "f32-to-bf16",
90
+ "nbytes": 4096,
91
+ "byteOffset": 25169920
92
+ }
93
+ ],
94
+ "md5sum": "cdef617556087435d51959e9e9ed610e"
95
+ },
96
+ {
97
+ "dataPath": "params_shard_3.bin",
98
+ "format": "raw-shard",
99
+ "nbytes": 67108864,
100
+ "records": [
101
+ {
102
+ "name": "model.layers.1.feed_forward.gate_up_proj.weight",
103
+ "shape": [
104
+ 16384,
105
+ 2048
106
+ ],
107
+ "dtype": "float16",
108
+ "format": "f32-to-bf16",
109
+ "nbytes": 67108864,
110
+ "byteOffset": 0
111
+ }
112
+ ],
113
+ "md5sum": "439c2ca4b319f35142265f6f82c5d302"
114
+ },
115
+ {
116
+ "dataPath": "params_shard_4.bin",
117
+ "format": "raw-shard",
118
+ "nbytes": 33554432,
119
+ "records": [
120
+ {
121
+ "name": "model.layers.1.feed_forward.w2.weight",
122
+ "shape": [
123
+ 2048,
124
+ 8192
125
+ ],
126
+ "dtype": "float16",
127
+ "format": "f32-to-bf16",
128
+ "nbytes": 33554432,
129
+ "byteOffset": 0
130
+ }
131
+ ],
132
+ "md5sum": "7d65f1da58141a06abbe4b13a0940dee"
133
+ },
134
+ {
135
+ "dataPath": "params_shard_5.bin",
136
+ "format": "raw-shard",
137
+ "nbytes": 25174016,
138
+ "records": [
139
+ {
140
+ "name": "model.layers.1.attention.wo.weight",
141
+ "shape": [
142
+ 2048,
143
+ 2048
144
+ ],
145
+ "dtype": "float16",
146
+ "format": "f32-to-bf16",
147
+ "nbytes": 8388608,
148
+ "byteOffset": 0
149
+ },
150
+ {
151
+ "name": "model.layers.1.attention.wqkv.weight",
152
+ "shape": [
153
+ 4096,
154
+ 2048
155
+ ],
156
+ "dtype": "float16",
157
+ "format": "f32-to-bf16",
158
+ "nbytes": 16777216,
159
+ "byteOffset": 8388608
160
+ },
161
+ {
162
+ "name": "model.layers.1.attention_norm.weight",
163
+ "shape": [
164
+ 2048
165
+ ],
166
+ "dtype": "float16",
167
+ "format": "f32-to-bf16",
168
+ "nbytes": 4096,
169
+ "byteOffset": 25165824
170
+ },
171
+ {
172
+ "name": "model.layers.1.ffn_norm.weight",
173
+ "shape": [
174
+ 2048
175
+ ],
176
+ "dtype": "float16",
177
+ "format": "f32-to-bf16",
178
+ "nbytes": 4096,
179
+ "byteOffset": 25169920
180
+ }
181
+ ],
182
+ "md5sum": "c43d939dc697ae883fa557c3a6acebfd"
183
+ },
184
+ {
185
+ "dataPath": "params_shard_6.bin",
186
+ "format": "raw-shard",
187
+ "nbytes": 67108864,
188
+ "records": [
189
+ {
190
+ "name": "model.layers.10.feed_forward.gate_up_proj.weight",
191
+ "shape": [
192
+ 16384,
193
+ 2048
194
+ ],
195
+ "dtype": "float16",
196
+ "format": "f32-to-bf16",
197
+ "nbytes": 67108864,
198
+ "byteOffset": 0
199
+ }
200
+ ],
201
+ "md5sum": "a3897ce16b64b2952d540145eb9a0285"
202
+ },
203
+ {
204
+ "dataPath": "params_shard_7.bin",
205
+ "format": "raw-shard",
206
+ "nbytes": 33554432,
207
+ "records": [
208
+ {
209
+ "name": "model.layers.10.feed_forward.w2.weight",
210
+ "shape": [
211
+ 2048,
212
+ 8192
213
+ ],
214
+ "dtype": "float16",
215
+ "format": "f32-to-bf16",
216
+ "nbytes": 33554432,
217
+ "byteOffset": 0
218
+ }
219
+ ],
220
+ "md5sum": "7feea0d894294bebc16db150da47e608"
221
+ },
222
+ {
223
+ "dataPath": "params_shard_8.bin",
224
+ "format": "raw-shard",
225
+ "nbytes": 25174016,
226
+ "records": [
227
+ {
228
+ "name": "model.layers.10.attention.wo.weight",
229
+ "shape": [
230
+ 2048,
231
+ 2048
232
+ ],
233
+ "dtype": "float16",
234
+ "format": "f32-to-bf16",
235
+ "nbytes": 8388608,
236
+ "byteOffset": 0
237
+ },
238
+ {
239
+ "name": "model.layers.10.attention.wqkv.weight",
240
+ "shape": [
241
+ 4096,
242
+ 2048
243
+ ],
244
+ "dtype": "float16",
245
+ "format": "f32-to-bf16",
246
+ "nbytes": 16777216,
247
+ "byteOffset": 8388608
248
+ },
249
+ {
250
+ "name": "model.layers.10.attention_norm.weight",
251
+ "shape": [
252
+ 2048
253
+ ],
254
+ "dtype": "float16",
255
+ "format": "f32-to-bf16",
256
+ "nbytes": 4096,
257
+ "byteOffset": 25165824
258
+ },
259
+ {
260
+ "name": "model.layers.10.ffn_norm.weight",
261
+ "shape": [
262
+ 2048
263
+ ],
264
+ "dtype": "float16",
265
+ "format": "f32-to-bf16",
266
+ "nbytes": 4096,
267
+ "byteOffset": 25169920
268
+ }
269
+ ],
270
+ "md5sum": "06fd9fccc1cc527bdcdc0604ac15d86f"
271
+ },
272
+ {
273
+ "dataPath": "params_shard_9.bin",
274
+ "format": "raw-shard",
275
+ "nbytes": 67108864,
276
+ "records": [
277
+ {
278
+ "name": "model.layers.11.feed_forward.gate_up_proj.weight",
279
+ "shape": [
280
+ 16384,
281
+ 2048
282
+ ],
283
+ "dtype": "float16",
284
+ "format": "f32-to-bf16",
285
+ "nbytes": 67108864,
286
+ "byteOffset": 0
287
+ }
288
+ ],
289
+ "md5sum": "9078eec7783ca112e36c3a2684e1a867"
290
+ },
291
+ {
292
+ "dataPath": "params_shard_10.bin",
293
+ "format": "raw-shard",
294
+ "nbytes": 33554432,
295
+ "records": [
296
+ {
297
+ "name": "model.layers.11.feed_forward.w2.weight",
298
+ "shape": [
299
+ 2048,
300
+ 8192
301
+ ],
302
+ "dtype": "float16",
303
+ "format": "f32-to-bf16",
304
+ "nbytes": 33554432,
305
+ "byteOffset": 0
306
+ }
307
+ ],
308
+ "md5sum": "a6fa373eb1a0079b814769b897f0d07c"
309
+ },
310
+ {
311
+ "dataPath": "params_shard_11.bin",
312
+ "format": "raw-shard",
313
+ "nbytes": 25174016,
314
+ "records": [
315
+ {
316
+ "name": "model.layers.11.attention.wo.weight",
317
+ "shape": [
318
+ 2048,
319
+ 2048
320
+ ],
321
+ "dtype": "float16",
322
+ "format": "f32-to-bf16",
323
+ "nbytes": 8388608,
324
+ "byteOffset": 0
325
+ },
326
+ {
327
+ "name": "model.layers.11.attention.wqkv.weight",
328
+ "shape": [
329
+ 4096,
330
+ 2048
331
+ ],
332
+ "dtype": "float16",
333
+ "format": "f32-to-bf16",
334
+ "nbytes": 16777216,
335
+ "byteOffset": 8388608
336
+ },
337
+ {
338
+ "name": "model.layers.11.attention_norm.weight",
339
+ "shape": [
340
+ 2048
341
+ ],
342
+ "dtype": "float16",
343
+ "format": "f32-to-bf16",
344
+ "nbytes": 4096,
345
+ "byteOffset": 25165824
346
+ },
347
+ {
348
+ "name": "model.layers.11.ffn_norm.weight",
349
+ "shape": [
350
+ 2048
351
+ ],
352
+ "dtype": "float16",
353
+ "format": "f32-to-bf16",
354
+ "nbytes": 4096,
355
+ "byteOffset": 25169920
356
+ }
357
+ ],
358
+ "md5sum": "55fbc26500896d2e10b7e7db5c83c0a5"
359
+ },
360
+ {
361
+ "dataPath": "params_shard_12.bin",
362
+ "format": "raw-shard",
363
+ "nbytes": 67108864,
364
+ "records": [
365
+ {
366
+ "name": "model.layers.12.feed_forward.gate_up_proj.weight",
367
+ "shape": [
368
+ 16384,
369
+ 2048
370
+ ],
371
+ "dtype": "float16",
372
+ "format": "f32-to-bf16",
373
+ "nbytes": 67108864,
374
+ "byteOffset": 0
375
+ }
376
+ ],
377
+ "md5sum": "065d4381d53a30b79b19a0179d9c4a45"
378
+ },
379
+ {
380
+ "dataPath": "params_shard_13.bin",
381
+ "format": "raw-shard",
382
+ "nbytes": 25165824,
383
+ "records": [
384
+ {
385
+ "name": "model.layers.12.attention.wo.weight",
386
+ "shape": [
387
+ 2048,
388
+ 2048
389
+ ],
390
+ "dtype": "float16",
391
+ "format": "f32-to-bf16",
392
+ "nbytes": 8388608,
393
+ "byteOffset": 0
394
+ },
395
+ {
396
+ "name": "model.layers.12.attention.wqkv.weight",
397
+ "shape": [
398
+ 4096,
399
+ 2048
400
+ ],
401
+ "dtype": "float16",
402
+ "format": "f32-to-bf16",
403
+ "nbytes": 16777216,
404
+ "byteOffset": 8388608
405
+ }
406
+ ],
407
+ "md5sum": "ebf9b884f5693236428d94869c969adc"
408
+ },
409
+ {
410
+ "dataPath": "params_shard_14.bin",
411
+ "format": "raw-shard",
412
+ "nbytes": 67108864,
413
+ "records": [
414
+ {
415
+ "name": "model.layers.2.feed_forward.gate_up_proj.weight",
416
+ "shape": [
417
+ 16384,
418
+ 2048
419
+ ],
420
+ "dtype": "float16",
421
+ "format": "f32-to-bf16",
422
+ "nbytes": 67108864,
423
+ "byteOffset": 0
424
+ }
425
+ ],
426
+ "md5sum": "b7613846d39ce5f62b2f1dd30a9746a6"
427
+ },
428
+ {
429
+ "dataPath": "params_shard_15.bin",
430
+ "format": "raw-shard",
431
+ "nbytes": 33554432,
432
+ "records": [
433
+ {
434
+ "name": "model.layers.2.feed_forward.w2.weight",
435
+ "shape": [
436
+ 2048,
437
+ 8192
438
+ ],
439
+ "dtype": "float16",
440
+ "format": "f32-to-bf16",
441
+ "nbytes": 33554432,
442
+ "byteOffset": 0
443
+ }
444
+ ],
445
+ "md5sum": "0a194ceac00e1e59af2d7127b3c8cf2b"
446
+ },
447
+ {
448
+ "dataPath": "params_shard_16.bin",
449
+ "format": "raw-shard",
450
+ "nbytes": 25174016,
451
+ "records": [
452
+ {
453
+ "name": "model.layers.2.attention.wo.weight",
454
+ "shape": [
455
+ 2048,
456
+ 2048
457
+ ],
458
+ "dtype": "float16",
459
+ "format": "f32-to-bf16",
460
+ "nbytes": 8388608,
461
+ "byteOffset": 0
462
+ },
463
+ {
464
+ "name": "model.layers.2.attention.wqkv.weight",
465
+ "shape": [
466
+ 4096,
467
+ 2048
468
+ ],
469
+ "dtype": "float16",
470
+ "format": "f32-to-bf16",
471
+ "nbytes": 16777216,
472
+ "byteOffset": 8388608
473
+ },
474
+ {
475
+ "name": "model.layers.2.attention_norm.weight",
476
+ "shape": [
477
+ 2048
478
+ ],
479
+ "dtype": "float16",
480
+ "format": "f32-to-bf16",
481
+ "nbytes": 4096,
482
+ "byteOffset": 25165824
483
+ },
484
+ {
485
+ "name": "model.layers.2.ffn_norm.weight",
486
+ "shape": [
487
+ 2048
488
+ ],
489
+ "dtype": "float16",
490
+ "format": "f32-to-bf16",
491
+ "nbytes": 4096,
492
+ "byteOffset": 25169920
493
+ }
494
+ ],
495
+ "md5sum": "133438c2ed5e05aa25d9b0722be6ea20"
496
+ },
497
+ {
498
+ "dataPath": "params_shard_17.bin",
499
+ "format": "raw-shard",
500
+ "nbytes": 67108864,
501
+ "records": [
502
+ {
503
+ "name": "model.layers.3.feed_forward.gate_up_proj.weight",
504
+ "shape": [
505
+ 16384,
506
+ 2048
507
+ ],
508
+ "dtype": "float16",
509
+ "format": "f32-to-bf16",
510
+ "nbytes": 67108864,
511
+ "byteOffset": 0
512
+ }
513
+ ],
514
+ "md5sum": "424adead5dbf5c309417525896bc5577"
515
+ },
516
+ {
517
+ "dataPath": "params_shard_18.bin",
518
+ "format": "raw-shard",
519
+ "nbytes": 33554432,
520
+ "records": [
521
+ {
522
+ "name": "model.layers.3.feed_forward.w2.weight",
523
+ "shape": [
524
+ 2048,
525
+ 8192
526
+ ],
527
+ "dtype": "float16",
528
+ "format": "f32-to-bf16",
529
+ "nbytes": 33554432,
530
+ "byteOffset": 0
531
+ }
532
+ ],
533
+ "md5sum": "fbf2d36b86b2c09a93a4a1c9840dd781"
534
+ },
535
+ {
536
+ "dataPath": "params_shard_19.bin",
537
+ "format": "raw-shard",
538
+ "nbytes": 25174016,
539
+ "records": [
540
+ {
541
+ "name": "model.layers.3.attention.wo.weight",
542
+ "shape": [
543
+ 2048,
544
+ 2048
545
+ ],
546
+ "dtype": "float16",
547
+ "format": "f32-to-bf16",
548
+ "nbytes": 8388608,
549
+ "byteOffset": 0
550
+ },
551
+ {
552
+ "name": "model.layers.3.attention.wqkv.weight",
553
+ "shape": [
554
+ 4096,
555
+ 2048
556
+ ],
557
+ "dtype": "float16",
558
+ "format": "f32-to-bf16",
559
+ "nbytes": 16777216,
560
+ "byteOffset": 8388608
561
+ },
562
+ {
563
+ "name": "model.layers.3.attention_norm.weight",
564
+ "shape": [
565
+ 2048
566
+ ],
567
+ "dtype": "float16",
568
+ "format": "f32-to-bf16",
569
+ "nbytes": 4096,
570
+ "byteOffset": 25165824
571
+ },
572
+ {
573
+ "name": "model.layers.3.ffn_norm.weight",
574
+ "shape": [
575
+ 2048
576
+ ],
577
+ "dtype": "float16",
578
+ "format": "f32-to-bf16",
579
+ "nbytes": 4096,
580
+ "byteOffset": 25169920
581
+ }
582
+ ],
583
+ "md5sum": "9841633cf8016dc1eb4bc2961a3f94aa"
584
+ },
585
+ {
586
+ "dataPath": "params_shard_20.bin",
587
+ "format": "raw-shard",
588
+ "nbytes": 67108864,
589
+ "records": [
590
+ {
591
+ "name": "model.layers.4.feed_forward.gate_up_proj.weight",
592
+ "shape": [
593
+ 16384,
594
+ 2048
595
+ ],
596
+ "dtype": "float16",
597
+ "format": "f32-to-bf16",
598
+ "nbytes": 67108864,
599
+ "byteOffset": 0
600
+ }
601
+ ],
602
+ "md5sum": "47f542e9709957aa8af72faa952d486f"
603
+ },
604
+ {
605
+ "dataPath": "params_shard_21.bin",
606
+ "format": "raw-shard",
607
+ "nbytes": 33554432,
608
+ "records": [
609
+ {
610
+ "name": "model.layers.4.feed_forward.w2.weight",
611
+ "shape": [
612
+ 2048,
613
+ 8192
614
+ ],
615
+ "dtype": "float16",
616
+ "format": "f32-to-bf16",
617
+ "nbytes": 33554432,
618
+ "byteOffset": 0
619
+ }
620
+ ],
621
+ "md5sum": "de5cc4de49eb2939d747f591db5fd452"
622
+ },
623
+ {
624
+ "dataPath": "params_shard_22.bin",
625
+ "format": "raw-shard",
626
+ "nbytes": 25174016,
627
+ "records": [
628
+ {
629
+ "name": "model.layers.4.attention.wo.weight",
630
+ "shape": [
631
+ 2048,
632
+ 2048
633
+ ],
634
+ "dtype": "float16",
635
+ "format": "f32-to-bf16",
636
+ "nbytes": 8388608,
637
+ "byteOffset": 0
638
+ },
639
+ {
640
+ "name": "model.layers.4.attention.wqkv.weight",
641
+ "shape": [
642
+ 4096,
643
+ 2048
644
+ ],
645
+ "dtype": "float16",
646
+ "format": "f32-to-bf16",
647
+ "nbytes": 16777216,
648
+ "byteOffset": 8388608
649
+ },
650
+ {
651
+ "name": "model.layers.4.attention_norm.weight",
652
+ "shape": [
653
+ 2048
654
+ ],
655
+ "dtype": "float16",
656
+ "format": "f32-to-bf16",
657
+ "nbytes": 4096,
658
+ "byteOffset": 25165824
659
+ },
660
+ {
661
+ "name": "model.layers.4.ffn_norm.weight",
662
+ "shape": [
663
+ 2048
664
+ ],
665
+ "dtype": "float16",
666
+ "format": "f32-to-bf16",
667
+ "nbytes": 4096,
668
+ "byteOffset": 25169920
669
+ }
670
+ ],
671
+ "md5sum": "e8a0aced0a7edbb64191d23f342bb81d"
672
+ },
673
+ {
674
+ "dataPath": "params_shard_23.bin",
675
+ "format": "raw-shard",
676
+ "nbytes": 67108864,
677
+ "records": [
678
+ {
679
+ "name": "model.layers.5.feed_forward.gate_up_proj.weight",
680
+ "shape": [
681
+ 16384,
682
+ 2048
683
+ ],
684
+ "dtype": "float16",
685
+ "format": "f32-to-bf16",
686
+ "nbytes": 67108864,
687
+ "byteOffset": 0
688
+ }
689
+ ],
690
+ "md5sum": "8f5e3eca294b217de661233d3904c69d"
691
+ },
692
+ {
693
+ "dataPath": "params_shard_24.bin",
694
+ "format": "raw-shard",
695
+ "nbytes": 33554432,
696
+ "records": [
697
+ {
698
+ "name": "model.layers.5.feed_forward.w2.weight",
699
+ "shape": [
700
+ 2048,
701
+ 8192
702
+ ],
703
+ "dtype": "float16",
704
+ "format": "f32-to-bf16",
705
+ "nbytes": 33554432,
706
+ "byteOffset": 0
707
+ }
708
+ ],
709
+ "md5sum": "f6f3b1f080fbc31c85636b8b542c1cc4"
710
+ },
711
+ {
712
+ "dataPath": "params_shard_25.bin",
713
+ "format": "raw-shard",
714
+ "nbytes": 25174016,
715
+ "records": [
716
+ {
717
+ "name": "model.layers.5.attention.wo.weight",
718
+ "shape": [
719
+ 2048,
720
+ 2048
721
+ ],
722
+ "dtype": "float16",
723
+ "format": "f32-to-bf16",
724
+ "nbytes": 8388608,
725
+ "byteOffset": 0
726
+ },
727
+ {
728
+ "name": "model.layers.5.attention.wqkv.weight",
729
+ "shape": [
730
+ 4096,
731
+ 2048
732
+ ],
733
+ "dtype": "float16",
734
+ "format": "f32-to-bf16",
735
+ "nbytes": 16777216,
736
+ "byteOffset": 8388608
737
+ },
738
+ {
739
+ "name": "model.layers.5.attention_norm.weight",
740
+ "shape": [
741
+ 2048
742
+ ],
743
+ "dtype": "float16",
744
+ "format": "f32-to-bf16",
745
+ "nbytes": 4096,
746
+ "byteOffset": 25165824
747
+ },
748
+ {
749
+ "name": "model.layers.5.ffn_norm.weight",
750
+ "shape": [
751
+ 2048
752
+ ],
753
+ "dtype": "float16",
754
+ "format": "f32-to-bf16",
755
+ "nbytes": 4096,
756
+ "byteOffset": 25169920
757
+ }
758
+ ],
759
+ "md5sum": "ec22329c2b39d676c61b4f7a13ff1955"
760
+ },
761
+ {
762
+ "dataPath": "params_shard_26.bin",
763
+ "format": "raw-shard",
764
+ "nbytes": 67108864,
765
+ "records": [
766
+ {
767
+ "name": "model.layers.6.feed_forward.gate_up_proj.weight",
768
+ "shape": [
769
+ 16384,
770
+ 2048
771
+ ],
772
+ "dtype": "float16",
773
+ "format": "f32-to-bf16",
774
+ "nbytes": 67108864,
775
+ "byteOffset": 0
776
+ }
777
+ ],
778
+ "md5sum": "6aa83a5244c843e8abdf7eaa4bc5c3d2"
779
+ },
780
+ {
781
+ "dataPath": "params_shard_27.bin",
782
+ "format": "raw-shard",
783
+ "nbytes": 33554432,
784
+ "records": [
785
+ {
786
+ "name": "model.layers.6.feed_forward.w2.weight",
787
+ "shape": [
788
+ 2048,
789
+ 8192
790
+ ],
791
+ "dtype": "float16",
792
+ "format": "f32-to-bf16",
793
+ "nbytes": 33554432,
794
+ "byteOffset": 0
795
+ }
796
+ ],
797
+ "md5sum": "9c46bb790229cdd669d265e0ed1b1bdc"
798
+ },
799
+ {
800
+ "dataPath": "params_shard_28.bin",
801
+ "format": "raw-shard",
802
+ "nbytes": 25174016,
803
+ "records": [
804
+ {
805
+ "name": "model.layers.6.attention.wo.weight",
806
+ "shape": [
807
+ 2048,
808
+ 2048
809
+ ],
810
+ "dtype": "float16",
811
+ "format": "f32-to-bf16",
812
+ "nbytes": 8388608,
813
+ "byteOffset": 0
814
+ },
815
+ {
816
+ "name": "model.layers.6.attention.wqkv.weight",
817
+ "shape": [
818
+ 4096,
819
+ 2048
820
+ ],
821
+ "dtype": "float16",
822
+ "format": "f32-to-bf16",
823
+ "nbytes": 16777216,
824
+ "byteOffset": 8388608
825
+ },
826
+ {
827
+ "name": "model.layers.6.attention_norm.weight",
828
+ "shape": [
829
+ 2048
830
+ ],
831
+ "dtype": "float16",
832
+ "format": "f32-to-bf16",
833
+ "nbytes": 4096,
834
+ "byteOffset": 25165824
835
+ },
836
+ {
837
+ "name": "model.layers.6.ffn_norm.weight",
838
+ "shape": [
839
+ 2048
840
+ ],
841
+ "dtype": "float16",
842
+ "format": "f32-to-bf16",
843
+ "nbytes": 4096,
844
+ "byteOffset": 25169920
845
+ }
846
+ ],
847
+ "md5sum": "39521843a30d216fd8e46c677bc07eee"
848
+ },
849
+ {
850
+ "dataPath": "params_shard_29.bin",
851
+ "format": "raw-shard",
852
+ "nbytes": 67108864,
853
+ "records": [
854
+ {
855
+ "name": "model.layers.7.feed_forward.gate_up_proj.weight",
856
+ "shape": [
857
+ 16384,
858
+ 2048
859
+ ],
860
+ "dtype": "float16",
861
+ "format": "f32-to-bf16",
862
+ "nbytes": 67108864,
863
+ "byteOffset": 0
864
+ }
865
+ ],
866
+ "md5sum": "79bf5afe398fcdad7933e47b6fb5b76c"
867
+ },
868
+ {
869
+ "dataPath": "params_shard_30.bin",
870
+ "format": "raw-shard",
871
+ "nbytes": 33554432,
872
+ "records": [
873
+ {
874
+ "name": "model.layers.7.feed_forward.w2.weight",
875
+ "shape": [
876
+ 2048,
877
+ 8192
878
+ ],
879
+ "dtype": "float16",
880
+ "format": "f32-to-bf16",
881
+ "nbytes": 33554432,
882
+ "byteOffset": 0
883
+ }
884
+ ],
885
+ "md5sum": "6d53d9f9d79dc2d8e80e0bc1307df84d"
886
+ },
887
+ {
888
+ "dataPath": "params_shard_31.bin",
889
+ "format": "raw-shard",
890
+ "nbytes": 25174016,
891
+ "records": [
892
+ {
893
+ "name": "model.layers.7.attention.wo.weight",
894
+ "shape": [
895
+ 2048,
896
+ 2048
897
+ ],
898
+ "dtype": "float16",
899
+ "format": "f32-to-bf16",
900
+ "nbytes": 8388608,
901
+ "byteOffset": 0
902
+ },
903
+ {
904
+ "name": "model.layers.7.attention.wqkv.weight",
905
+ "shape": [
906
+ 4096,
907
+ 2048
908
+ ],
909
+ "dtype": "float16",
910
+ "format": "f32-to-bf16",
911
+ "nbytes": 16777216,
912
+ "byteOffset": 8388608
913
+ },
914
+ {
915
+ "name": "model.layers.7.attention_norm.weight",
916
+ "shape": [
917
+ 2048
918
+ ],
919
+ "dtype": "float16",
920
+ "format": "f32-to-bf16",
921
+ "nbytes": 4096,
922
+ "byteOffset": 25165824
923
+ },
924
+ {
925
+ "name": "model.layers.7.ffn_norm.weight",
926
+ "shape": [
927
+ 2048
928
+ ],
929
+ "dtype": "float16",
930
+ "format": "f32-to-bf16",
931
+ "nbytes": 4096,
932
+ "byteOffset": 25169920
933
+ }
934
+ ],
935
+ "md5sum": "579d79fff40956baa246862fe41f8f68"
936
+ },
937
+ {
938
+ "dataPath": "params_shard_32.bin",
939
+ "format": "raw-shard",
940
+ "nbytes": 67108864,
941
+ "records": [
942
+ {
943
+ "name": "model.layers.8.feed_forward.gate_up_proj.weight",
944
+ "shape": [
945
+ 16384,
946
+ 2048
947
+ ],
948
+ "dtype": "float16",
949
+ "format": "f32-to-bf16",
950
+ "nbytes": 67108864,
951
+ "byteOffset": 0
952
+ }
953
+ ],
954
+ "md5sum": "872c26bf62854b30b5ceca2b0b2e133f"
955
+ },
956
+ {
957
+ "dataPath": "params_shard_33.bin",
958
+ "format": "raw-shard",
959
+ "nbytes": 33554432,
960
+ "records": [
961
+ {
962
+ "name": "model.layers.8.feed_forward.w2.weight",
963
+ "shape": [
964
+ 2048,
965
+ 8192
966
+ ],
967
+ "dtype": "float16",
968
+ "format": "f32-to-bf16",
969
+ "nbytes": 33554432,
970
+ "byteOffset": 0
971
+ }
972
+ ],
973
+ "md5sum": "9d9ce49d43fe40d22fbfea0b6189f699"
974
+ },
975
+ {
976
+ "dataPath": "params_shard_34.bin",
977
+ "format": "raw-shard",
978
+ "nbytes": 25174016,
979
+ "records": [
980
+ {
981
+ "name": "model.layers.8.attention.wo.weight",
982
+ "shape": [
983
+ 2048,
984
+ 2048
985
+ ],
986
+ "dtype": "float16",
987
+ "format": "f32-to-bf16",
988
+ "nbytes": 8388608,
989
+ "byteOffset": 0
990
+ },
991
+ {
992
+ "name": "model.layers.8.attention.wqkv.weight",
993
+ "shape": [
994
+ 4096,
995
+ 2048
996
+ ],
997
+ "dtype": "float16",
998
+ "format": "f32-to-bf16",
999
+ "nbytes": 16777216,
1000
+ "byteOffset": 8388608
1001
+ },
1002
+ {
1003
+ "name": "model.layers.8.attention_norm.weight",
1004
+ "shape": [
1005
+ 2048
1006
+ ],
1007
+ "dtype": "float16",
1008
+ "format": "f32-to-bf16",
1009
+ "nbytes": 4096,
1010
+ "byteOffset": 25165824
1011
+ },
1012
+ {
1013
+ "name": "model.layers.8.ffn_norm.weight",
1014
+ "shape": [
1015
+ 2048
1016
+ ],
1017
+ "dtype": "float16",
1018
+ "format": "f32-to-bf16",
1019
+ "nbytes": 4096,
1020
+ "byteOffset": 25169920
1021
+ }
1022
+ ],
1023
+ "md5sum": "371871bcf96a27dfa0110a0c1adaca7a"
1024
+ },
1025
+ {
1026
+ "dataPath": "params_shard_35.bin",
1027
+ "format": "raw-shard",
1028
+ "nbytes": 67108864,
1029
+ "records": [
1030
+ {
1031
+ "name": "model.layers.9.feed_forward.gate_up_proj.weight",
1032
+ "shape": [
1033
+ 16384,
1034
+ 2048
1035
+ ],
1036
+ "dtype": "float16",
1037
+ "format": "f32-to-bf16",
1038
+ "nbytes": 67108864,
1039
+ "byteOffset": 0
1040
+ }
1041
+ ],
1042
+ "md5sum": "1e200f5c650647936f0e504ed468e200"
1043
+ },
1044
+ {
1045
+ "dataPath": "params_shard_36.bin",
1046
+ "format": "raw-shard",
1047
+ "nbytes": 33554432,
1048
+ "records": [
1049
+ {
1050
+ "name": "model.layers.9.feed_forward.w2.weight",
1051
+ "shape": [
1052
+ 2048,
1053
+ 8192
1054
+ ],
1055
+ "dtype": "float16",
1056
+ "format": "f32-to-bf16",
1057
+ "nbytes": 33554432,
1058
+ "byteOffset": 0
1059
+ }
1060
+ ],
1061
+ "md5sum": "64f3c7ab0cef1b04cbbf507f585c4f1e"
1062
+ },
1063
+ {
1064
+ "dataPath": "params_shard_37.bin",
1065
+ "format": "raw-shard",
1066
+ "nbytes": 379060224,
1067
+ "records": [
1068
+ {
1069
+ "name": "model.tok_embeddings.weight",
1070
+ "shape": [
1071
+ 92544,
1072
+ 2048
1073
+ ],
1074
+ "dtype": "float16",
1075
+ "format": "f32-to-bf16",
1076
+ "nbytes": 379060224,
1077
+ "byteOffset": 0
1078
+ }
1079
+ ],
1080
+ "md5sum": "22a60526a41a1ec740a36e930ab628ea"
1081
+ },
1082
+ {
1083
+ "dataPath": "params_shard_38.bin",
1084
+ "format": "raw-shard",
1085
+ "nbytes": 33554432,
1086
+ "records": [
1087
+ {
1088
+ "name": "model.layers.12.feed_forward.w2.weight",
1089
+ "shape": [
1090
+ 2048,
1091
+ 8192
1092
+ ],
1093
+ "dtype": "float16",
1094
+ "format": "f32-to-bf16",
1095
+ "nbytes": 33554432,
1096
+ "byteOffset": 0
1097
+ }
1098
+ ],
1099
+ "md5sum": "d67ebba07546dfac1203b260fc1d96c4"
1100
+ },
1101
+ {
1102
+ "dataPath": "params_shard_39.bin",
1103
+ "format": "raw-shard",
1104
+ "nbytes": 25182208,
1105
+ "records": [
1106
+ {
1107
+ "name": "model.layers.9.attention.wo.weight",
1108
+ "shape": [
1109
+ 2048,
1110
+ 2048
1111
+ ],
1112
+ "dtype": "float16",
1113
+ "format": "f32-to-bf16",
1114
+ "nbytes": 8388608,
1115
+ "byteOffset": 0
1116
+ },
1117
+ {
1118
+ "name": "model.layers.9.attention.wqkv.weight",
1119
+ "shape": [
1120
+ 4096,
1121
+ 2048
1122
+ ],
1123
+ "dtype": "float16",
1124
+ "format": "f32-to-bf16",
1125
+ "nbytes": 16777216,
1126
+ "byteOffset": 8388608
1127
+ },
1128
+ {
1129
+ "name": "model.layers.9.attention_norm.weight",
1130
+ "shape": [
1131
+ 2048
1132
+ ],
1133
+ "dtype": "float16",
1134
+ "format": "f32-to-bf16",
1135
+ "nbytes": 4096,
1136
+ "byteOffset": 25165824
1137
+ },
1138
+ {
1139
+ "name": "model.layers.9.ffn_norm.weight",
1140
+ "shape": [
1141
+ 2048
1142
+ ],
1143
+ "dtype": "float16",
1144
+ "format": "f32-to-bf16",
1145
+ "nbytes": 4096,
1146
+ "byteOffset": 25169920
1147
+ },
1148
+ {
1149
+ "name": "model.layers.12.attention_norm.weight",
1150
+ "shape": [
1151
+ 2048
1152
+ ],
1153
+ "dtype": "float16",
1154
+ "format": "f32-to-bf16",
1155
+ "nbytes": 4096,
1156
+ "byteOffset": 25174016
1157
+ },
1158
+ {
1159
+ "name": "model.layers.12.ffn_norm.weight",
1160
+ "shape": [
1161
+ 2048
1162
+ ],
1163
+ "dtype": "float16",
1164
+ "format": "f32-to-bf16",
1165
+ "nbytes": 4096,
1166
+ "byteOffset": 25178112
1167
+ }
1168
+ ],
1169
+ "md5sum": "fea60b0b5b39f57a50b5f79fd5db77f5"
1170
+ },
1171
+ {
1172
+ "dataPath": "params_shard_40.bin",
1173
+ "format": "raw-shard",
1174
+ "nbytes": 67108864,
1175
+ "records": [
1176
+ {
1177
+ "name": "model.layers.13.feed_forward.gate_up_proj.weight",
1178
+ "shape": [
1179
+ 16384,
1180
+ 2048
1181
+ ],
1182
+ "dtype": "float16",
1183
+ "format": "f32-to-bf16",
1184
+ "nbytes": 67108864,
1185
+ "byteOffset": 0
1186
+ }
1187
+ ],
1188
+ "md5sum": "7336040e0332518e23a109a77622c95f"
1189
+ },
1190
+ {
1191
+ "dataPath": "params_shard_41.bin",
1192
+ "format": "raw-shard",
1193
+ "nbytes": 33554432,
1194
+ "records": [
1195
+ {
1196
+ "name": "model.layers.13.feed_forward.w2.weight",
1197
+ "shape": [
1198
+ 2048,
1199
+ 8192
1200
+ ],
1201
+ "dtype": "float16",
1202
+ "format": "f32-to-bf16",
1203
+ "nbytes": 33554432,
1204
+ "byteOffset": 0
1205
+ }
1206
+ ],
1207
+ "md5sum": "1bd18ed0fe66988af08bd9a3678123f6"
1208
+ },
1209
+ {
1210
+ "dataPath": "params_shard_42.bin",
1211
+ "format": "raw-shard",
1212
+ "nbytes": 25174016,
1213
+ "records": [
1214
+ {
1215
+ "name": "model.layers.13.attention.wo.weight",
1216
+ "shape": [
1217
+ 2048,
1218
+ 2048
1219
+ ],
1220
+ "dtype": "float16",
1221
+ "format": "f32-to-bf16",
1222
+ "nbytes": 8388608,
1223
+ "byteOffset": 0
1224
+ },
1225
+ {
1226
+ "name": "model.layers.13.attention.wqkv.weight",
1227
+ "shape": [
1228
+ 4096,
1229
+ 2048
1230
+ ],
1231
+ "dtype": "float16",
1232
+ "format": "f32-to-bf16",
1233
+ "nbytes": 16777216,
1234
+ "byteOffset": 8388608
1235
+ },
1236
+ {
1237
+ "name": "model.layers.13.attention_norm.weight",
1238
+ "shape": [
1239
+ 2048
1240
+ ],
1241
+ "dtype": "float16",
1242
+ "format": "f32-to-bf16",
1243
+ "nbytes": 4096,
1244
+ "byteOffset": 25165824
1245
+ },
1246
+ {
1247
+ "name": "model.layers.13.ffn_norm.weight",
1248
+ "shape": [
1249
+ 2048
1250
+ ],
1251
+ "dtype": "float16",
1252
+ "format": "f32-to-bf16",
1253
+ "nbytes": 4096,
1254
+ "byteOffset": 25169920
1255
+ }
1256
+ ],
1257
+ "md5sum": "81fc643e24286d745d9d3e9db68fcce7"
1258
+ },
1259
+ {
1260
+ "dataPath": "params_shard_43.bin",
1261
+ "format": "raw-shard",
1262
+ "nbytes": 67108864,
1263
+ "records": [
1264
+ {
1265
+ "name": "model.layers.14.feed_forward.gate_up_proj.weight",
1266
+ "shape": [
1267
+ 16384,
1268
+ 2048
1269
+ ],
1270
+ "dtype": "float16",
1271
+ "format": "f32-to-bf16",
1272
+ "nbytes": 67108864,
1273
+ "byteOffset": 0
1274
+ }
1275
+ ],
1276
+ "md5sum": "71c1416f7db0cba954b133615bff9451"
1277
+ },
1278
+ {
1279
+ "dataPath": "params_shard_44.bin",
1280
+ "format": "raw-shard",
1281
+ "nbytes": 33554432,
1282
+ "records": [
1283
+ {
1284
+ "name": "model.layers.14.feed_forward.w2.weight",
1285
+ "shape": [
1286
+ 2048,
1287
+ 8192
1288
+ ],
1289
+ "dtype": "float16",
1290
+ "format": "f32-to-bf16",
1291
+ "nbytes": 33554432,
1292
+ "byteOffset": 0
1293
+ }
1294
+ ],
1295
+ "md5sum": "dfa4ae9fb4158eca19611a07d3b249a5"
1296
+ },
1297
+ {
1298
+ "dataPath": "params_shard_45.bin",
1299
+ "format": "raw-shard",
1300
+ "nbytes": 25174016,
1301
+ "records": [
1302
+ {
1303
+ "name": "model.layers.14.attention.wo.weight",
1304
+ "shape": [
1305
+ 2048,
1306
+ 2048
1307
+ ],
1308
+ "dtype": "float16",
1309
+ "format": "f32-to-bf16",
1310
+ "nbytes": 8388608,
1311
+ "byteOffset": 0
1312
+ },
1313
+ {
1314
+ "name": "model.layers.14.attention.wqkv.weight",
1315
+ "shape": [
1316
+ 4096,
1317
+ 2048
1318
+ ],
1319
+ "dtype": "float16",
1320
+ "format": "f32-to-bf16",
1321
+ "nbytes": 16777216,
1322
+ "byteOffset": 8388608
1323
+ },
1324
+ {
1325
+ "name": "model.layers.14.attention_norm.weight",
1326
+ "shape": [
1327
+ 2048
1328
+ ],
1329
+ "dtype": "float16",
1330
+ "format": "f32-to-bf16",
1331
+ "nbytes": 4096,
1332
+ "byteOffset": 25165824
1333
+ },
1334
+ {
1335
+ "name": "model.layers.14.ffn_norm.weight",
1336
+ "shape": [
1337
+ 2048
1338
+ ],
1339
+ "dtype": "float16",
1340
+ "format": "f32-to-bf16",
1341
+ "nbytes": 4096,
1342
+ "byteOffset": 25169920
1343
+ }
1344
+ ],
1345
+ "md5sum": "88e34648f33e99e5d74a34a259ba97a0"
1346
+ },
1347
+ {
1348
+ "dataPath": "params_shard_46.bin",
1349
+ "format": "raw-shard",
1350
+ "nbytes": 67108864,
1351
+ "records": [
1352
+ {
1353
+ "name": "model.layers.15.feed_forward.gate_up_proj.weight",
1354
+ "shape": [
1355
+ 16384,
1356
+ 2048
1357
+ ],
1358
+ "dtype": "float16",
1359
+ "format": "f32-to-bf16",
1360
+ "nbytes": 67108864,
1361
+ "byteOffset": 0
1362
+ }
1363
+ ],
1364
+ "md5sum": "3678f7b6adcb837bb5471013aa8a677e"
1365
+ },
1366
+ {
1367
+ "dataPath": "params_shard_47.bin",
1368
+ "format": "raw-shard",
1369
+ "nbytes": 33554432,
1370
+ "records": [
1371
+ {
1372
+ "name": "model.layers.15.feed_forward.w2.weight",
1373
+ "shape": [
1374
+ 2048,
1375
+ 8192
1376
+ ],
1377
+ "dtype": "float16",
1378
+ "format": "f32-to-bf16",
1379
+ "nbytes": 33554432,
1380
+ "byteOffset": 0
1381
+ }
1382
+ ],
1383
+ "md5sum": "5fcd3ce8a74f83581106483947f5a39f"
1384
+ },
1385
+ {
1386
+ "dataPath": "params_shard_48.bin",
1387
+ "format": "raw-shard",
1388
+ "nbytes": 25174016,
1389
+ "records": [
1390
+ {
1391
+ "name": "model.layers.15.attention.wo.weight",
1392
+ "shape": [
1393
+ 2048,
1394
+ 2048
1395
+ ],
1396
+ "dtype": "float16",
1397
+ "format": "f32-to-bf16",
1398
+ "nbytes": 8388608,
1399
+ "byteOffset": 0
1400
+ },
1401
+ {
1402
+ "name": "model.layers.15.attention.wqkv.weight",
1403
+ "shape": [
1404
+ 4096,
1405
+ 2048
1406
+ ],
1407
+ "dtype": "float16",
1408
+ "format": "f32-to-bf16",
1409
+ "nbytes": 16777216,
1410
+ "byteOffset": 8388608
1411
+ },
1412
+ {
1413
+ "name": "model.layers.15.attention_norm.weight",
1414
+ "shape": [
1415
+ 2048
1416
+ ],
1417
+ "dtype": "float16",
1418
+ "format": "f32-to-bf16",
1419
+ "nbytes": 4096,
1420
+ "byteOffset": 25165824
1421
+ },
1422
+ {
1423
+ "name": "model.layers.15.ffn_norm.weight",
1424
+ "shape": [
1425
+ 2048
1426
+ ],
1427
+ "dtype": "float16",
1428
+ "format": "f32-to-bf16",
1429
+ "nbytes": 4096,
1430
+ "byteOffset": 25169920
1431
+ }
1432
+ ],
1433
+ "md5sum": "5d3f1ee1d9118aeb8d3a4a6dbd1d1a80"
1434
+ },
1435
+ {
1436
+ "dataPath": "params_shard_49.bin",
1437
+ "format": "raw-shard",
1438
+ "nbytes": 67108864,
1439
+ "records": [
1440
+ {
1441
+ "name": "model.layers.16.feed_forward.gate_up_proj.weight",
1442
+ "shape": [
1443
+ 16384,
1444
+ 2048
1445
+ ],
1446
+ "dtype": "float16",
1447
+ "format": "f32-to-bf16",
1448
+ "nbytes": 67108864,
1449
+ "byteOffset": 0
1450
+ }
1451
+ ],
1452
+ "md5sum": "f030e01a341f26c32ef40cf8e2e56671"
1453
+ },
1454
+ {
1455
+ "dataPath": "params_shard_50.bin",
1456
+ "format": "raw-shard",
1457
+ "nbytes": 33554432,
1458
+ "records": [
1459
+ {
1460
+ "name": "model.layers.16.feed_forward.w2.weight",
1461
+ "shape": [
1462
+ 2048,
1463
+ 8192
1464
+ ],
1465
+ "dtype": "float16",
1466
+ "format": "f32-to-bf16",
1467
+ "nbytes": 33554432,
1468
+ "byteOffset": 0
1469
+ }
1470
+ ],
1471
+ "md5sum": "aca438bd5c0a97326bc9cb4f40ba127c"
1472
+ },
1473
+ {
1474
+ "dataPath": "params_shard_51.bin",
1475
+ "format": "raw-shard",
1476
+ "nbytes": 25174016,
1477
+ "records": [
1478
+ {
1479
+ "name": "model.layers.16.attention.wo.weight",
1480
+ "shape": [
1481
+ 2048,
1482
+ 2048
1483
+ ],
1484
+ "dtype": "float16",
1485
+ "format": "f32-to-bf16",
1486
+ "nbytes": 8388608,
1487
+ "byteOffset": 0
1488
+ },
1489
+ {
1490
+ "name": "model.layers.16.attention.wqkv.weight",
1491
+ "shape": [
1492
+ 4096,
1493
+ 2048
1494
+ ],
1495
+ "dtype": "float16",
1496
+ "format": "f32-to-bf16",
1497
+ "nbytes": 16777216,
1498
+ "byteOffset": 8388608
1499
+ },
1500
+ {
1501
+ "name": "model.layers.16.attention_norm.weight",
1502
+ "shape": [
1503
+ 2048
1504
+ ],
1505
+ "dtype": "float16",
1506
+ "format": "f32-to-bf16",
1507
+ "nbytes": 4096,
1508
+ "byteOffset": 25165824
1509
+ },
1510
+ {
1511
+ "name": "model.layers.16.ffn_norm.weight",
1512
+ "shape": [
1513
+ 2048
1514
+ ],
1515
+ "dtype": "float16",
1516
+ "format": "f32-to-bf16",
1517
+ "nbytes": 4096,
1518
+ "byteOffset": 25169920
1519
+ }
1520
+ ],
1521
+ "md5sum": "f1e82431d8dc0005139f5c588055e091"
1522
+ },
1523
+ {
1524
+ "dataPath": "params_shard_52.bin",
1525
+ "format": "raw-shard",
1526
+ "nbytes": 67108864,
1527
+ "records": [
1528
+ {
1529
+ "name": "model.layers.17.feed_forward.gate_up_proj.weight",
1530
+ "shape": [
1531
+ 16384,
1532
+ 2048
1533
+ ],
1534
+ "dtype": "float16",
1535
+ "format": "f32-to-bf16",
1536
+ "nbytes": 67108864,
1537
+ "byteOffset": 0
1538
+ }
1539
+ ],
1540
+ "md5sum": "acdaf4671c25f9d0968de5dd7f328fa0"
1541
+ },
1542
+ {
1543
+ "dataPath": "params_shard_53.bin",
1544
+ "format": "raw-shard",
1545
+ "nbytes": 33554432,
1546
+ "records": [
1547
+ {
1548
+ "name": "model.layers.17.feed_forward.w2.weight",
1549
+ "shape": [
1550
+ 2048,
1551
+ 8192
1552
+ ],
1553
+ "dtype": "float16",
1554
+ "format": "f32-to-bf16",
1555
+ "nbytes": 33554432,
1556
+ "byteOffset": 0
1557
+ }
1558
+ ],
1559
+ "md5sum": "2f98fa680df4cd9396b080e39ba7ac4b"
1560
+ },
1561
+ {
1562
+ "dataPath": "params_shard_54.bin",
1563
+ "format": "raw-shard",
1564
+ "nbytes": 25174016,
1565
+ "records": [
1566
+ {
1567
+ "name": "model.layers.17.attention.wo.weight",
1568
+ "shape": [
1569
+ 2048,
1570
+ 2048
1571
+ ],
1572
+ "dtype": "float16",
1573
+ "format": "f32-to-bf16",
1574
+ "nbytes": 8388608,
1575
+ "byteOffset": 0
1576
+ },
1577
+ {
1578
+ "name": "model.layers.17.attention.wqkv.weight",
1579
+ "shape": [
1580
+ 4096,
1581
+ 2048
1582
+ ],
1583
+ "dtype": "float16",
1584
+ "format": "f32-to-bf16",
1585
+ "nbytes": 16777216,
1586
+ "byteOffset": 8388608
1587
+ },
1588
+ {
1589
+ "name": "model.layers.17.attention_norm.weight",
1590
+ "shape": [
1591
+ 2048
1592
+ ],
1593
+ "dtype": "float16",
1594
+ "format": "f32-to-bf16",
1595
+ "nbytes": 4096,
1596
+ "byteOffset": 25165824
1597
+ },
1598
+ {
1599
+ "name": "model.layers.17.ffn_norm.weight",
1600
+ "shape": [
1601
+ 2048
1602
+ ],
1603
+ "dtype": "float16",
1604
+ "format": "f32-to-bf16",
1605
+ "nbytes": 4096,
1606
+ "byteOffset": 25169920
1607
+ }
1608
+ ],
1609
+ "md5sum": "1e1fdcea5051f6d03ba5d07a6e05ad10"
1610
+ },
1611
+ {
1612
+ "dataPath": "params_shard_55.bin",
1613
+ "format": "raw-shard",
1614
+ "nbytes": 67108864,
1615
+ "records": [
1616
+ {
1617
+ "name": "model.layers.18.feed_forward.gate_up_proj.weight",
1618
+ "shape": [
1619
+ 16384,
1620
+ 2048
1621
+ ],
1622
+ "dtype": "float16",
1623
+ "format": "f32-to-bf16",
1624
+ "nbytes": 67108864,
1625
+ "byteOffset": 0
1626
+ }
1627
+ ],
1628
+ "md5sum": "e9439edabf330d64ea237c65ddfc0e7c"
1629
+ },
1630
+ {
1631
+ "dataPath": "params_shard_56.bin",
1632
+ "format": "raw-shard",
1633
+ "nbytes": 33554432,
1634
+ "records": [
1635
+ {
1636
+ "name": "model.layers.18.feed_forward.w2.weight",
1637
+ "shape": [
1638
+ 2048,
1639
+ 8192
1640
+ ],
1641
+ "dtype": "float16",
1642
+ "format": "f32-to-bf16",
1643
+ "nbytes": 33554432,
1644
+ "byteOffset": 0
1645
+ }
1646
+ ],
1647
+ "md5sum": "08447849f75fe55e869edb54075b9684"
1648
+ },
1649
+ {
1650
+ "dataPath": "params_shard_57.bin",
1651
+ "format": "raw-shard",
1652
+ "nbytes": 25174016,
1653
+ "records": [
1654
+ {
1655
+ "name": "model.layers.18.attention.wo.weight",
1656
+ "shape": [
1657
+ 2048,
1658
+ 2048
1659
+ ],
1660
+ "dtype": "float16",
1661
+ "format": "f32-to-bf16",
1662
+ "nbytes": 8388608,
1663
+ "byteOffset": 0
1664
+ },
1665
+ {
1666
+ "name": "model.layers.18.attention.wqkv.weight",
1667
+ "shape": [
1668
+ 4096,
1669
+ 2048
1670
+ ],
1671
+ "dtype": "float16",
1672
+ "format": "f32-to-bf16",
1673
+ "nbytes": 16777216,
1674
+ "byteOffset": 8388608
1675
+ },
1676
+ {
1677
+ "name": "model.layers.18.attention_norm.weight",
1678
+ "shape": [
1679
+ 2048
1680
+ ],
1681
+ "dtype": "float16",
1682
+ "format": "f32-to-bf16",
1683
+ "nbytes": 4096,
1684
+ "byteOffset": 25165824
1685
+ },
1686
+ {
1687
+ "name": "model.layers.18.ffn_norm.weight",
1688
+ "shape": [
1689
+ 2048
1690
+ ],
1691
+ "dtype": "float16",
1692
+ "format": "f32-to-bf16",
1693
+ "nbytes": 4096,
1694
+ "byteOffset": 25169920
1695
+ }
1696
+ ],
1697
+ "md5sum": "8235ed3a474bada5c929bfe4a1a1b998"
1698
+ },
1699
+ {
1700
+ "dataPath": "params_shard_58.bin",
1701
+ "format": "raw-shard",
1702
+ "nbytes": 67108864,
1703
+ "records": [
1704
+ {
1705
+ "name": "model.layers.19.feed_forward.gate_up_proj.weight",
1706
+ "shape": [
1707
+ 16384,
1708
+ 2048
1709
+ ],
1710
+ "dtype": "float16",
1711
+ "format": "f32-to-bf16",
1712
+ "nbytes": 67108864,
1713
+ "byteOffset": 0
1714
+ }
1715
+ ],
1716
+ "md5sum": "b0125004e88f868ac4ead47da62eee9c"
1717
+ },
1718
+ {
1719
+ "dataPath": "params_shard_59.bin",
1720
+ "format": "raw-shard",
1721
+ "nbytes": 33554432,
1722
+ "records": [
1723
+ {
1724
+ "name": "model.layers.19.feed_forward.w2.weight",
1725
+ "shape": [
1726
+ 2048,
1727
+ 8192
1728
+ ],
1729
+ "dtype": "float16",
1730
+ "format": "f32-to-bf16",
1731
+ "nbytes": 33554432,
1732
+ "byteOffset": 0
1733
+ }
1734
+ ],
1735
+ "md5sum": "871462f1f4a3befbf8e843f144da9a97"
1736
+ },
1737
+ {
1738
+ "dataPath": "params_shard_60.bin",
1739
+ "format": "raw-shard",
1740
+ "nbytes": 25174016,
1741
+ "records": [
1742
+ {
1743
+ "name": "model.layers.19.attention.wo.weight",
1744
+ "shape": [
1745
+ 2048,
1746
+ 2048
1747
+ ],
1748
+ "dtype": "float16",
1749
+ "format": "f32-to-bf16",
1750
+ "nbytes": 8388608,
1751
+ "byteOffset": 0
1752
+ },
1753
+ {
1754
+ "name": "model.layers.19.attention.wqkv.weight",
1755
+ "shape": [
1756
+ 4096,
1757
+ 2048
1758
+ ],
1759
+ "dtype": "float16",
1760
+ "format": "f32-to-bf16",
1761
+ "nbytes": 16777216,
1762
+ "byteOffset": 8388608
1763
+ },
1764
+ {
1765
+ "name": "model.layers.19.attention_norm.weight",
1766
+ "shape": [
1767
+ 2048
1768
+ ],
1769
+ "dtype": "float16",
1770
+ "format": "f32-to-bf16",
1771
+ "nbytes": 4096,
1772
+ "byteOffset": 25165824
1773
+ },
1774
+ {
1775
+ "name": "model.layers.19.ffn_norm.weight",
1776
+ "shape": [
1777
+ 2048
1778
+ ],
1779
+ "dtype": "float16",
1780
+ "format": "f32-to-bf16",
1781
+ "nbytes": 4096,
1782
+ "byteOffset": 25169920
1783
+ }
1784
+ ],
1785
+ "md5sum": "18f6ca2e5e8140418016df65d6f70dd8"
1786
+ },
1787
+ {
1788
+ "dataPath": "params_shard_61.bin",
1789
+ "format": "raw-shard",
1790
+ "nbytes": 67108864,
1791
+ "records": [
1792
+ {
1793
+ "name": "model.layers.20.feed_forward.gate_up_proj.weight",
1794
+ "shape": [
1795
+ 16384,
1796
+ 2048
1797
+ ],
1798
+ "dtype": "float16",
1799
+ "format": "f32-to-bf16",
1800
+ "nbytes": 67108864,
1801
+ "byteOffset": 0
1802
+ }
1803
+ ],
1804
+ "md5sum": "d1b94d7fd217eec8644f1b82d0925bb3"
1805
+ },
1806
+ {
1807
+ "dataPath": "params_shard_62.bin",
1808
+ "format": "raw-shard",
1809
+ "nbytes": 33554432,
1810
+ "records": [
1811
+ {
1812
+ "name": "model.layers.20.feed_forward.w2.weight",
1813
+ "shape": [
1814
+ 2048,
1815
+ 8192
1816
+ ],
1817
+ "dtype": "float16",
1818
+ "format": "f32-to-bf16",
1819
+ "nbytes": 33554432,
1820
+ "byteOffset": 0
1821
+ }
1822
+ ],
1823
+ "md5sum": "ef05f26662406aaa151ee96f14560d3b"
1824
+ },
1825
+ {
1826
+ "dataPath": "params_shard_63.bin",
1827
+ "format": "raw-shard",
1828
+ "nbytes": 25174016,
1829
+ "records": [
1830
+ {
1831
+ "name": "model.layers.20.attention.wo.weight",
1832
+ "shape": [
1833
+ 2048,
1834
+ 2048
1835
+ ],
1836
+ "dtype": "float16",
1837
+ "format": "f32-to-bf16",
1838
+ "nbytes": 8388608,
1839
+ "byteOffset": 0
1840
+ },
1841
+ {
1842
+ "name": "model.layers.20.attention.wqkv.weight",
1843
+ "shape": [
1844
+ 4096,
1845
+ 2048
1846
+ ],
1847
+ "dtype": "float16",
1848
+ "format": "f32-to-bf16",
1849
+ "nbytes": 16777216,
1850
+ "byteOffset": 8388608
1851
+ },
1852
+ {
1853
+ "name": "model.layers.20.attention_norm.weight",
1854
+ "shape": [
1855
+ 2048
1856
+ ],
1857
+ "dtype": "float16",
1858
+ "format": "f32-to-bf16",
1859
+ "nbytes": 4096,
1860
+ "byteOffset": 25165824
1861
+ },
1862
+ {
1863
+ "name": "model.layers.20.ffn_norm.weight",
1864
+ "shape": [
1865
+ 2048
1866
+ ],
1867
+ "dtype": "float16",
1868
+ "format": "f32-to-bf16",
1869
+ "nbytes": 4096,
1870
+ "byteOffset": 25169920
1871
+ }
1872
+ ],
1873
+ "md5sum": "1bb51ea71c1ab50a84ab2f95c2c63175"
1874
+ },
1875
+ {
1876
+ "dataPath": "params_shard_64.bin",
1877
+ "format": "raw-shard",
1878
+ "nbytes": 67108864,
1879
+ "records": [
1880
+ {
1881
+ "name": "model.layers.21.feed_forward.gate_up_proj.weight",
1882
+ "shape": [
1883
+ 16384,
1884
+ 2048
1885
+ ],
1886
+ "dtype": "float16",
1887
+ "format": "f32-to-bf16",
1888
+ "nbytes": 67108864,
1889
+ "byteOffset": 0
1890
+ }
1891
+ ],
1892
+ "md5sum": "c30634533c2fc59d575615fa4ac9214b"
1893
+ },
1894
+ {
1895
+ "dataPath": "params_shard_65.bin",
1896
+ "format": "raw-shard",
1897
+ "nbytes": 33554432,
1898
+ "records": [
1899
+ {
1900
+ "name": "model.layers.21.feed_forward.w2.weight",
1901
+ "shape": [
1902
+ 2048,
1903
+ 8192
1904
+ ],
1905
+ "dtype": "float16",
1906
+ "format": "f32-to-bf16",
1907
+ "nbytes": 33554432,
1908
+ "byteOffset": 0
1909
+ }
1910
+ ],
1911
+ "md5sum": "c123a4af918d13adf4a76d66c23f8500"
1912
+ },
1913
+ {
1914
+ "dataPath": "params_shard_66.bin",
1915
+ "format": "raw-shard",
1916
+ "nbytes": 25174016,
1917
+ "records": [
1918
+ {
1919
+ "name": "model.layers.21.attention.wo.weight",
1920
+ "shape": [
1921
+ 2048,
1922
+ 2048
1923
+ ],
1924
+ "dtype": "float16",
1925
+ "format": "f32-to-bf16",
1926
+ "nbytes": 8388608,
1927
+ "byteOffset": 0
1928
+ },
1929
+ {
1930
+ "name": "model.layers.21.attention.wqkv.weight",
1931
+ "shape": [
1932
+ 4096,
1933
+ 2048
1934
+ ],
1935
+ "dtype": "float16",
1936
+ "format": "f32-to-bf16",
1937
+ "nbytes": 16777216,
1938
+ "byteOffset": 8388608
1939
+ },
1940
+ {
1941
+ "name": "model.layers.21.attention_norm.weight",
1942
+ "shape": [
1943
+ 2048
1944
+ ],
1945
+ "dtype": "float16",
1946
+ "format": "f32-to-bf16",
1947
+ "nbytes": 4096,
1948
+ "byteOffset": 25165824
1949
+ },
1950
+ {
1951
+ "name": "model.layers.21.ffn_norm.weight",
1952
+ "shape": [
1953
+ 2048
1954
+ ],
1955
+ "dtype": "float16",
1956
+ "format": "f32-to-bf16",
1957
+ "nbytes": 4096,
1958
+ "byteOffset": 25169920
1959
+ }
1960
+ ],
1961
+ "md5sum": "6554be6c98a0dc256082e9152d1f0e21"
1962
+ },
1963
+ {
1964
+ "dataPath": "params_shard_67.bin",
1965
+ "format": "raw-shard",
1966
+ "nbytes": 67108864,
1967
+ "records": [
1968
+ {
1969
+ "name": "model.layers.22.feed_forward.gate_up_proj.weight",
1970
+ "shape": [
1971
+ 16384,
1972
+ 2048
1973
+ ],
1974
+ "dtype": "float16",
1975
+ "format": "f32-to-bf16",
1976
+ "nbytes": 67108864,
1977
+ "byteOffset": 0
1978
+ }
1979
+ ],
1980
+ "md5sum": "2afc907981b853485b4978d87f9024ca"
1981
+ },
1982
+ {
1983
+ "dataPath": "params_shard_68.bin",
1984
+ "format": "raw-shard",
1985
+ "nbytes": 33554432,
1986
+ "records": [
1987
+ {
1988
+ "name": "model.layers.22.feed_forward.w2.weight",
1989
+ "shape": [
1990
+ 2048,
1991
+ 8192
1992
+ ],
1993
+ "dtype": "float16",
1994
+ "format": "f32-to-bf16",
1995
+ "nbytes": 33554432,
1996
+ "byteOffset": 0
1997
+ }
1998
+ ],
1999
+ "md5sum": "fe186966e79790cc22bbeefbc75af8dd"
2000
+ },
2001
+ {
2002
+ "dataPath": "params_shard_69.bin",
2003
+ "format": "raw-shard",
2004
+ "nbytes": 25174016,
2005
+ "records": [
2006
+ {
2007
+ "name": "model.layers.22.attention.wo.weight",
2008
+ "shape": [
2009
+ 2048,
2010
+ 2048
2011
+ ],
2012
+ "dtype": "float16",
2013
+ "format": "f32-to-bf16",
2014
+ "nbytes": 8388608,
2015
+ "byteOffset": 0
2016
+ },
2017
+ {
2018
+ "name": "model.layers.22.attention.wqkv.weight",
2019
+ "shape": [
2020
+ 4096,
2021
+ 2048
2022
+ ],
2023
+ "dtype": "float16",
2024
+ "format": "f32-to-bf16",
2025
+ "nbytes": 16777216,
2026
+ "byteOffset": 8388608
2027
+ },
2028
+ {
2029
+ "name": "model.layers.22.attention_norm.weight",
2030
+ "shape": [
2031
+ 2048
2032
+ ],
2033
+ "dtype": "float16",
2034
+ "format": "f32-to-bf16",
2035
+ "nbytes": 4096,
2036
+ "byteOffset": 25165824
2037
+ },
2038
+ {
2039
+ "name": "model.layers.22.ffn_norm.weight",
2040
+ "shape": [
2041
+ 2048
2042
+ ],
2043
+ "dtype": "float16",
2044
+ "format": "f32-to-bf16",
2045
+ "nbytes": 4096,
2046
+ "byteOffset": 25169920
2047
+ }
2048
+ ],
2049
+ "md5sum": "4c76fe74edbf769084ab1171ed00266f"
2050
+ },
2051
+ {
2052
+ "dataPath": "params_shard_70.bin",
2053
+ "format": "raw-shard",
2054
+ "nbytes": 67108864,
2055
+ "records": [
2056
+ {
2057
+ "name": "model.layers.23.feed_forward.gate_up_proj.weight",
2058
+ "shape": [
2059
+ 16384,
2060
+ 2048
2061
+ ],
2062
+ "dtype": "float16",
2063
+ "format": "f32-to-bf16",
2064
+ "nbytes": 67108864,
2065
+ "byteOffset": 0
2066
+ }
2067
+ ],
2068
+ "md5sum": "6440fb6e14258b7f4dea6dc77b8262a9"
2069
+ },
2070
+ {
2071
+ "dataPath": "params_shard_71.bin",
2072
+ "format": "raw-shard",
2073
+ "nbytes": 33554432,
2074
+ "records": [
2075
+ {
2076
+ "name": "model.layers.23.feed_forward.w2.weight",
2077
+ "shape": [
2078
+ 2048,
2079
+ 8192
2080
+ ],
2081
+ "dtype": "float16",
2082
+ "format": "f32-to-bf16",
2083
+ "nbytes": 33554432,
2084
+ "byteOffset": 0
2085
+ }
2086
+ ],
2087
+ "md5sum": "cc5c095b532c915b19d5718c3e6bfe4e"
2088
+ },
2089
+ {
2090
+ "dataPath": "params_shard_72.bin",
2091
+ "format": "raw-shard",
2092
+ "nbytes": 379060224,
2093
+ "records": [
2094
+ {
2095
+ "name": "output.weight",
2096
+ "shape": [
2097
+ 92544,
2098
+ 2048
2099
+ ],
2100
+ "dtype": "float16",
2101
+ "format": "f32-to-bf16",
2102
+ "nbytes": 379060224,
2103
+ "byteOffset": 0
2104
+ }
2105
+ ],
2106
+ "md5sum": "a5b778eef7ec6da1957b35e1888c3333"
2107
+ },
2108
+ {
2109
+ "dataPath": "params_shard_73.bin",
2110
+ "format": "raw-shard",
2111
+ "nbytes": 25178112,
2112
+ "records": [
2113
+ {
2114
+ "name": "model.layers.23.attention.wo.weight",
2115
+ "shape": [
2116
+ 2048,
2117
+ 2048
2118
+ ],
2119
+ "dtype": "float16",
2120
+ "format": "f32-to-bf16",
2121
+ "nbytes": 8388608,
2122
+ "byteOffset": 0
2123
+ },
2124
+ {
2125
+ "name": "model.layers.23.attention.wqkv.weight",
2126
+ "shape": [
2127
+ 4096,
2128
+ 2048
2129
+ ],
2130
+ "dtype": "float16",
2131
+ "format": "f32-to-bf16",
2132
+ "nbytes": 16777216,
2133
+ "byteOffset": 8388608
2134
+ },
2135
+ {
2136
+ "name": "model.layers.23.attention_norm.weight",
2137
+ "shape": [
2138
+ 2048
2139
+ ],
2140
+ "dtype": "float16",
2141
+ "format": "f32-to-bf16",
2142
+ "nbytes": 4096,
2143
+ "byteOffset": 25165824
2144
+ },
2145
+ {
2146
+ "name": "model.layers.23.ffn_norm.weight",
2147
+ "shape": [
2148
+ 2048
2149
+ ],
2150
+ "dtype": "float16",
2151
+ "format": "f32-to-bf16",
2152
+ "nbytes": 4096,
2153
+ "byteOffset": 25169920
2154
+ },
2155
+ {
2156
+ "name": "model.norm.weight",
2157
+ "shape": [
2158
+ 2048
2159
+ ],
2160
+ "dtype": "float16",
2161
+ "format": "f32-to-bf16",
2162
+ "nbytes": 4096,
2163
+ "byteOffset": 25174016
2164
+ }
2165
+ ],
2166
+ "md5sum": "5735df772a182e0db0a8ac4b6937351a"
2167
+ }
2168
+ ]
2169
+ }
params_shard_0.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:237ba5c6a242bd9dc24b2199ea3dc5c4c6653957e23dd66839a6ca85ed116666
3
+ size 67108864
params_shard_1.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:618df0e5dd1bed15ccf88ed9edf3d15d99b465267269265ce90357908ecbab69
3
+ size 33554432
params_shard_10.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e79c7845fc2d680fdcd38940688b5dd6a323be9e10bbfc82209e8ea87c4b1386
3
+ size 33554432
params_shard_11.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3e3d637ab413e4af30e33739e1f22d177222e9804f76229aa64bfdd5fd95890c
3
+ size 25174016
params_shard_12.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fc352860a516bee83ba6c774681a4e18f7eeb70690f762e08c9fcaae1a53f28a
3
+ size 67108864
params_shard_13.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4a0b8049d94df6811bd56428c4da7a258780378c8ab4ba578e74282ebda2234a
3
+ size 25165824
params_shard_14.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2a669c057f3e261e6db82f25d249a7234c303fe3ca555a1b65ee10969d64402a
3
+ size 67108864
params_shard_15.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7c62db176c45e2202e342e5622384d7a96bb81758a47a1eee4850a447647c1ad
3
+ size 33554432
params_shard_16.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2bc7366076a82c9cff0a1d7102f31e39063c773eba5f12f05b0cad5955ad3f84
3
+ size 25174016
params_shard_17.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:78a5a6c329d4a92efb3e4643c73536918ae223a375f97ee3fa33f2f7fd9dc4e7
3
+ size 67108864
params_shard_18.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:23a47bfbb559f8646e291555f8379561c94ab221be9d9676b59c00d9b76c1ab3
3
+ size 33554432
params_shard_19.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b70972045379eef92e6c6dbaabac7213784f1f574a244e458f0cdcf111165620
3
+ size 25174016
params_shard_2.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f00afc278dddf634607d6759bc6f4c1ff636e2ceda1bd38c4a605b95a1576f96
3
+ size 25174016
params_shard_20.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d2c88898e3142547f1c75c00b19ffee8429d53261e14021072590ee802ef3097
3
+ size 67108864
params_shard_21.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8bef2c41ebe4cd66263a8e201ee2ca9cfb36948beae394eb17d5d548b10c3c79
3
+ size 33554432
params_shard_22.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:e2ccb6db323e0631b6918fbda6e3a979d28a9d1d9fd1f8caf8991c4709977cb2
3
+ size 25174016
params_shard_23.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d87fba2d35306ff63f5f9ca0f6aaf2926bdfce4918dbd4ec32373bb020989f9a
3
+ size 67108864
params_shard_24.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0299baf5ff3d1b8190585ac70146990eaaac95aaa47ac8d0760267c28973b0b3
3
+ size 33554432
params_shard_25.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ccf336e8a751624dbe36c6dfa8313ad820f40c96285e0e54874aa47a234ab9c5
3
+ size 25174016
params_shard_26.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:68fe632c9b04106b137033230c37dc55a8418f8cb18474f8516637120a7f1386
3
+ size 67108864
params_shard_27.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9b2ba9eebf1ac982c327486fb3f38dbabf2473c2ba05b2aef950cd24c30e30ce
3
+ size 33554432
params_shard_28.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:007c11472b25ad574e88ba1d424a105be818d5385d7e44922ffff595ba036865
3
+ size 25174016
params_shard_29.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:285d25492610f5772d935a0f26f0f20067a3bf50eb790e16f7b3ab321ec2df2a
3
+ size 67108864
params_shard_3.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ba70bf4185d9785038d81311efcdee5042e4f25b98596a660fe52db58744d613
3
+ size 67108864
params_shard_30.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6db1defc1f45e979076de766578d66b236c03bb6ef2ec46527f05edd1e492a09
3
+ size 33554432
params_shard_31.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fcbe6ae02cc2fedee37633556bdd4cd3267abfbf638876955e3be61392016c50
3
+ size 25174016
params_shard_32.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ca732ac7d9f0df46dcd7db57f6f993bf9b563e48cd342993af781ab0c3091846
3
+ size 67108864
params_shard_33.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a065b63c96b1f41ff816cb77328a149d36339e9e08bc46ebb9392b7e1e6d590f
3
+ size 33554432
params_shard_34.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:51747e352ae607496185223b8ccfc6350b06cedce64f9014638d699ee2e2b386
3
+ size 25174016
params_shard_35.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8b3b97d0d58d8c30ece941fa485475344a3e4cf34f6b503637237864327b3f89
3
+ size 67108864
params_shard_36.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:415134e3e19e992a660aadc005a4f0c4abe30b4f996a34cf6ede065210178a38
3
+ size 33554432
params_shard_37.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4ccf8b3c6eb5991144d0fe784338d06e35ddf9d1d5d43541c8cc55cd2851f1fe
3
+ size 379060224
params_shard_38.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:7f81480594598be70e4c49edc858046970d6f62140db5263544d1581cf87c048
3
+ size 33554432
params_shard_39.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c88bb35d4639bf47916541df18f5f36088f09325ec79827631d0e45f735e7d18
3
+ size 25182208
params_shard_4.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:55fd42aad95da890f204f4bb5e1e7faa638aee758319c25d6426a90cc6fdf454
3
+ size 33554432
params_shard_40.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ce3d7f3f1a2bf88e8f699cbaef2b86400339396b9b1fcc4bb9574fc0f75c2429
3
+ size 67108864
params_shard_41.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bd8d7c2bfe56571050aab8da20085c0aa7b7468be6e3ec33b7585b0735548e5e
3
+ size 33554432
params_shard_42.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:de4a9ef14abc91b5f3e893976bd92b3d1ada14924781417a046ce23aa2d5a8de
3
+ size 25174016
params_shard_43.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:dda7ddac0b9c18c86ab2d36b53917790c66a85587cf7a10e7dfa9a52497ea2c9
3
+ size 67108864
params_shard_44.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:83b54403026ebd092191076c7b21dd16a945defacc406406e7acbd5919fbc615
3
+ size 33554432
params_shard_45.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9a802f391ce9a6c23002cd6cc44b5961fd868209736196e04184774f6134dd28
3
+ size 25174016
params_shard_46.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9d27ff405b414c620810dbedb247f21060a41aeebae7710bb188404e5e4a1656
3
+ size 67108864
params_shard_47.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c75f041a6654e61be31ca5b362b1bb20ac0bb0d83504a8a348bb7f26874b0685
3
+ size 33554432
params_shard_48.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:359b3f4338eb2946b2eadb85e6dec2b7f0d7128f30530329a9eaf054596af2c9
3
+ size 25174016
params_shard_49.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6c3f08be9ecf440c81aa88368b3bc4722ed0b225fc0ea77c8a0f059cf356f4b8
3
+ size 67108864
params_shard_5.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c499e4324278a6bd9e09fc2dbbe3ae438d4ea94d2aee221171f7299dde8a574e
3
+ size 25174016
params_shard_50.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:74baf6a52f324b0333d8161b736ed86d120b8e46ce539cf08d4ae306b1b7d4ca
3
+ size 33554432
params_shard_51.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b8e8ea10ef7ecd1ec7202b988599d7700483efe5d0da8d3b5959e0c0a81ffa19
3
+ size 25174016
params_shard_52.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1c1d8866fb74b491eb5e357becf81aab58bfc373cbd2c943880c8916cda0801a
3
+ size 67108864