numen-tech committed 1483955 · 1 parent: 1fec38b

Add weights

This view is limited to 50 files because the commit contains too many changes.
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
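The added line routes `tokenizer.json` through Git LFS. Because the pattern contains no `/`, Git tests it against the file name at any directory depth. Below is a rough Python approximation of that rule for the four patterns visible in this hunk. Lines 1 to 32 of the file are not shown, so they are left out. This is only an illustration, not Git's matcher, which also handles anchored patterns, `**`, and attribute precedence.

```python
from fnmatch import fnmatchcase
from posixpath import basename

# The four patterns visible in this hunk (file lines 33-36). Earlier lines are not
# shown here, so any rules they contain are deliberately not modeled.
LFS_PATTERNS = ("*.zip", "*.zst", "*tfevents*", "tokenizer.json")


def matches_lfs_rule(path: str, patterns: tuple[str, ...] = LFS_PATTERNS) -> bool:
    """Approximate gitattributes matching for patterns that contain no "/".

    Such a pattern is tested against the last path component only, so it
    applies at every directory depth. Patterns with "/" are anchored in Git
    and are simply skipped by this sketch.
    """
    name = basename(path)
    return any(fnmatchcase(name, pat) for pat in patterns if "/" not in pat)


for path in ("tokenizer.json", "nested/dir/tokenizer.json", "ndarray-cache.json"):
    print(path, matches_lfs_rule(path))
```

To get the authoritative answer on a real checkout, run `git check-attr filter -- tokenizer.json`, which prints the `filter` attribute Git resolves for that path.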
README.md CHANGED
@@ -1,3 +1,18 @@
  ---
+ language:
+ - en
+ - de
+ - fr
+ - it
+ - pt
+ - hi
+ - es
+ - th
  license: llama3.2
+ base_model: FuseAI/FuseChat-Llama-3.2-1B-Instruct
+ base_model_relation: quantized
+ library_name: mlc-llm
+ pipeline_tag: text-generation
  ---
+
+ Unquantized (fp16, the parent model is bf16) version of [FuseChat-Llama-3.2-1B-Instruct](https://huggingface.co/FuseAI/FuseChat-Llama-3.2-1B-Instruct) for inference with [Private LLM](http://privatellm.app).
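The README states that these weights are stored as fp16, which means 16 bits (2 bytes) per parameter. As a quick cross-check, the sketch below rebuilds the model size from the tensor shapes listed in this commit's `ndarray-cache.json` manifest: the embedding table plus 16 identical decoder layers (layer indices 0 to 15 appear in the manifest). The 2048-element final norm weight is an assumption, because it does not appear in the part of the manifest shown here.

```python
from math import prod

BYTES_PER_PARAM = 2  # fp16 storage, per the README ("BitsPerParam": 16.0 in the manifest)

# Shapes copied from ndarray-cache.json.
EMBED_SHAPE = (128256, 2048)
LAYER_SHAPES = {
    "mlp.down_proj": (2048, 8192),
    "mlp.gate_up_proj": (16384, 2048),
    "self_attn.qkv_proj": (3072, 2048),
    "self_attn.o_proj": (2048, 2048),
    "input_layernorm": (2048,),
    "post_attention_layernorm": (2048,),
}
NUM_LAYERS = 16  # layers 0-15 appear in the manifest
FINAL_NORM_SHAPE = (2048,)  # assumption: final norm weight, not shown in this excerpt


def param_count() -> int:
    """Total number of weights: embeddings + all decoder layers + final norm."""
    per_layer = sum(prod(shape) for shape in LAYER_SHAPES.values())
    return prod(EMBED_SHAPE) + NUM_LAYERS * per_layer + prod(FINAL_NORM_SHAPE)


n = param_count()
print(f"{n:,} parameters, {n * BYTES_PER_PARAM:,} bytes")
```

It prints `1,235,814,400 parameters, 2,471,628,800 bytes`. The byte count equals `ParamBytes` in the manifest metadata. A bf16 copy would have exactly the same size, because bf16 also uses 16 bits per value. The fp16 conversion therefore changes the number format, not the footprint.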
config.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "quantization_config": {
+     "bits": 4
+   }
+ }
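Each record in the `ndarray-cache.json` manifest below gives a tensor's `shape`, `dtype`, `nbytes`, and `byteOffset` inside its shard file, and each shard carries an `md5sum`. The sketch below checks three properties of that layout. First, every `nbytes` equals the element count times 2 bytes for `float16`. Second, the records in a shard sit back to back starting at offset 0. Third, the records add up to the shard's `nbytes`. The sketch also decodes one record and checks a digest. It rests on two assumptions that are not taken from MLC LLM's loader. One is that the bytes are raw little-endian IEEE float16, as the `dtype` field says; the `format` field is not interpreted. The other is that `md5sum` is the MD5 hex digest of the whole shard file. `SHARD_5` is copied from the manifest. The small demo shard is synthetic.

```python
import hashlib
import struct
from math import prod

# Bytes per element; float16 is the only dtype in this manifest.
ITEMSIZE = {"float16": 2}


def check_shard(shard: dict) -> None:
    """Raise AssertionError unless the shard's records are correctly sized and packed back to back."""
    offset = 0
    for rec in shard["records"]:
        expected = prod(rec["shape"]) * ITEMSIZE[rec["dtype"]]
        assert rec["nbytes"] == expected, f"{rec['name']}: nbytes {rec['nbytes']} != {expected}"
        assert rec["byteOffset"] == offset, f"{rec['name']}: offset {rec['byteOffset']} != {offset}"
        offset += rec["nbytes"]
    assert offset == shard["nbytes"], f"{shard['dataPath']}: records cover {offset} of {shard['nbytes']} bytes"


def read_float16(buf: bytes, rec: dict) -> list[float]:
    """Decode one record as little-endian IEEE half floats (assumed encoding)."""
    return list(struct.unpack_from(f"<{prod(rec['shape'])}e", buf, rec["byteOffset"]))


def md5_ok(buf: bytes, shard: dict) -> bool:
    """Compare shard bytes with the manifest digest (assumed: MD5 of the whole shard file)."""
    return hashlib.md5(buf).hexdigest() == shard["md5sum"]


def f16(name: str, shape: list[int], nbytes: int, offset: int) -> dict:
    """Manifest-style float16 record; all values are passed in verbatim."""
    return {"name": name, "shape": shape, "dtype": "float16", "nbytes": nbytes, "byteOffset": offset}


# params_shard_5.bin as listed in the manifest (only the fields the check reads).
SHARD_5 = {
    "dataPath": "params_shard_5.bin",
    "nbytes": 20987904,
    "records": [
        f16("model.layers.0.input_layernorm.weight", [2048], 4096, 0),
        f16("model.layers.0.post_attention_layernorm.weight", [2048], 4096, 4096),
        f16("model.layers.0.self_attn.qkv_proj.weight", [3072, 2048], 12582912, 8192),
        f16("model.layers.0.self_attn.o_proj.weight", [2048, 2048], 8388608, 12591104),
        f16("model.layers.1.input_layernorm.weight", [2048], 4096, 20979712),
        f16("model.layers.1.post_attention_layernorm.weight", [2048], 4096, 20983808),
    ],
}
check_shard(SHARD_5)  # no error: every offset and size in this shard is consistent

# A tiny synthetic shard (not from this repo) to exercise decoding and the digest check.
data = struct.pack("<5e", 1.0, 2.0, 0.5, -1.5, 3.0)
demo = {
    "dataPath": "demo.bin",
    "nbytes": len(data),
    "md5sum": hashlib.md5(data).hexdigest(),
    "records": [f16("a", [2], 4, 0), f16("b", [3], 6, 4)],
}
check_shard(demo)
print(read_float16(data, demo["records"][1]), md5_ok(data, demo))
```

With real files, `buf` would hold the bytes of a shard such as `params_shard_5.bin`, read in binary mode. `check_shard` needs only the manifest, so it can run before any shard is downloaded.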
ndarray-cache.json ADDED
@@ -0,0 +1,1446 @@
1
+ {
2
+ "metadata": {
3
+ "ParamSize": 98,
4
+ "ParamBytes": 2471628800.0,
5
+ "BitsPerParam": 16.0
6
+ },
7
+ "records": [
8
+ {
9
+ "dataPath": "params_shard_0.bin",
10
+ "format": "raw-shard",
11
+ "nbytes": 525336576,
12
+ "records": [
13
+ {
14
+ "name": "model.embed_tokens.weight",
15
+ "shape": [
16
+ 128256,
17
+ 2048
18
+ ],
19
+ "dtype": "float16",
20
+ "format": "f32-to-bf16",
21
+ "nbytes": 525336576,
22
+ "byteOffset": 0
23
+ }
24
+ ],
25
+ "md5sum": "96ecea76fb745a56ee7bc9a00606f3da"
26
+ },
27
+ {
28
+ "dataPath": "params_shard_1.bin",
29
+ "format": "raw-shard",
30
+ "nbytes": 33554432,
31
+ "records": [
32
+ {
33
+ "name": "model.layers.0.mlp.down_proj.weight",
34
+ "shape": [
35
+ 2048,
36
+ 8192
37
+ ],
38
+ "dtype": "float16",
39
+ "format": "f32-to-bf16",
40
+ "nbytes": 33554432,
41
+ "byteOffset": 0
42
+ }
43
+ ],
44
+ "md5sum": "45ab6a597571f55d11310ad502ecd11c"
45
+ },
46
+ {
47
+ "dataPath": "params_shard_2.bin",
48
+ "format": "raw-shard",
49
+ "nbytes": 67108864,
50
+ "records": [
51
+ {
52
+ "name": "model.layers.0.mlp.gate_up_proj.weight",
53
+ "shape": [
54
+ 16384,
55
+ 2048
56
+ ],
57
+ "dtype": "float16",
58
+ "format": "f32-to-bf16",
59
+ "nbytes": 67108864,
60
+ "byteOffset": 0
61
+ }
62
+ ],
63
+ "md5sum": "72068393e1767a06fd5d059251fefc72"
64
+ },
65
+ {
66
+ "dataPath": "params_shard_3.bin",
67
+ "format": "raw-shard",
68
+ "nbytes": 33554432,
69
+ "records": [
70
+ {
71
+ "name": "model.layers.1.mlp.down_proj.weight",
72
+ "shape": [
73
+ 2048,
74
+ 8192
75
+ ],
76
+ "dtype": "float16",
77
+ "format": "f32-to-bf16",
78
+ "nbytes": 33554432,
79
+ "byteOffset": 0
80
+ }
81
+ ],
82
+ "md5sum": "6fc65384fd5a6fb03c461e1022ea887f"
83
+ },
84
+ {
85
+ "dataPath": "params_shard_4.bin",
86
+ "format": "raw-shard",
87
+ "nbytes": 67108864,
88
+ "records": [
89
+ {
90
+ "name": "model.layers.1.mlp.gate_up_proj.weight",
91
+ "shape": [
92
+ 16384,
93
+ 2048
94
+ ],
95
+ "dtype": "float16",
96
+ "format": "f32-to-bf16",
97
+ "nbytes": 67108864,
98
+ "byteOffset": 0
99
+ }
100
+ ],
101
+ "md5sum": "cee36a8591282bcb9d842d0f32f1f04c"
102
+ },
103
+ {
104
+ "dataPath": "params_shard_5.bin",
105
+ "format": "raw-shard",
106
+ "nbytes": 20987904,
107
+ "records": [
108
+ {
109
+ "name": "model.layers.0.input_layernorm.weight",
110
+ "shape": [
111
+ 2048
112
+ ],
113
+ "dtype": "float16",
114
+ "format": "f32-to-bf16",
115
+ "nbytes": 4096,
116
+ "byteOffset": 0
117
+ },
118
+ {
119
+ "name": "model.layers.0.post_attention_layernorm.weight",
120
+ "shape": [
121
+ 2048
122
+ ],
123
+ "dtype": "float16",
124
+ "format": "f32-to-bf16",
125
+ "nbytes": 4096,
126
+ "byteOffset": 4096
127
+ },
128
+ {
129
+ "name": "model.layers.0.self_attn.qkv_proj.weight",
130
+ "shape": [
131
+ 3072,
132
+ 2048
133
+ ],
134
+ "dtype": "float16",
135
+ "format": "f32-to-bf16",
136
+ "nbytes": 12582912,
137
+ "byteOffset": 8192
138
+ },
139
+ {
140
+ "name": "model.layers.0.self_attn.o_proj.weight",
141
+ "shape": [
142
+ 2048,
143
+ 2048
144
+ ],
145
+ "dtype": "float16",
146
+ "format": "f32-to-bf16",
147
+ "nbytes": 8388608,
148
+ "byteOffset": 12591104
149
+ },
150
+ {
151
+ "name": "model.layers.1.input_layernorm.weight",
152
+ "shape": [
153
+ 2048
154
+ ],
155
+ "dtype": "float16",
156
+ "format": "f32-to-bf16",
157
+ "nbytes": 4096,
158
+ "byteOffset": 20979712
159
+ },
160
+ {
161
+ "name": "model.layers.1.post_attention_layernorm.weight",
162
+ "shape": [
163
+ 2048
164
+ ],
165
+ "dtype": "float16",
166
+ "format": "f32-to-bf16",
167
+ "nbytes": 4096,
168
+ "byteOffset": 20983808
169
+ }
170
+ ],
171
+ "md5sum": "2f7e6171ac0351f78163af8b1a6e4762"
172
+ },
173
+ {
174
+ "dataPath": "params_shard_6.bin",
175
+ "format": "raw-shard",
176
+ "nbytes": 33554432,
177
+ "records": [
178
+ {
179
+ "name": "model.layers.10.mlp.down_proj.weight",
180
+ "shape": [
181
+ 2048,
182
+ 8192
183
+ ],
184
+ "dtype": "float16",
185
+ "format": "f32-to-bf16",
186
+ "nbytes": 33554432,
187
+ "byteOffset": 0
188
+ }
189
+ ],
190
+ "md5sum": "ed36c50971738bdd08ae2ab1150820fd"
191
+ },
192
+ {
193
+ "dataPath": "params_shard_7.bin",
194
+ "format": "raw-shard",
195
+ "nbytes": 67108864,
196
+ "records": [
197
+ {
198
+ "name": "model.layers.10.mlp.gate_up_proj.weight",
199
+ "shape": [
200
+ 16384,
201
+ 2048
202
+ ],
203
+ "dtype": "float16",
204
+ "format": "f32-to-bf16",
205
+ "nbytes": 67108864,
206
+ "byteOffset": 0
207
+ }
208
+ ],
209
+ "md5sum": "47e9ad785e2c976074789c8b5d83ab5a"
210
+ },
211
+ {
212
+ "dataPath": "params_shard_8.bin",
213
+ "format": "raw-shard",
214
+ "nbytes": 20979712,
215
+ "records": [
216
+ {
217
+ "name": "model.layers.1.self_attn.qkv_proj.weight",
218
+ "shape": [
219
+ 3072,
220
+ 2048
221
+ ],
222
+ "dtype": "float16",
223
+ "format": "f32-to-bf16",
224
+ "nbytes": 12582912,
225
+ "byteOffset": 0
226
+ },
227
+ {
228
+ "name": "model.layers.1.self_attn.o_proj.weight",
229
+ "shape": [
230
+ 2048,
231
+ 2048
232
+ ],
233
+ "dtype": "float16",
234
+ "format": "f32-to-bf16",
235
+ "nbytes": 8388608,
236
+ "byteOffset": 12582912
237
+ },
238
+ {
239
+ "name": "model.layers.10.input_layernorm.weight",
240
+ "shape": [
241
+ 2048
242
+ ],
243
+ "dtype": "float16",
244
+ "format": "f32-to-bf16",
245
+ "nbytes": 4096,
246
+ "byteOffset": 20971520
247
+ },
248
+ {
249
+ "name": "model.layers.10.post_attention_layernorm.weight",
250
+ "shape": [
251
+ 2048
252
+ ],
253
+ "dtype": "float16",
254
+ "format": "f32-to-bf16",
255
+ "nbytes": 4096,
256
+ "byteOffset": 20975616
257
+ }
258
+ ],
259
+ "md5sum": "3063f6be2136e6ed98c1d73b1e0ccaab"
260
+ },
261
+ {
262
+ "dataPath": "params_shard_9.bin",
263
+ "format": "raw-shard",
264
+ "nbytes": 33554432,
265
+ "records": [
266
+ {
267
+ "name": "model.layers.11.mlp.down_proj.weight",
268
+ "shape": [
269
+ 2048,
270
+ 8192
271
+ ],
272
+ "dtype": "float16",
273
+ "format": "f32-to-bf16",
274
+ "nbytes": 33554432,
275
+ "byteOffset": 0
276
+ }
277
+ ],
278
+ "md5sum": "1264f1a44eb9ab68745d01ea53da5e0b"
279
+ },
280
+ {
281
+ "dataPath": "params_shard_10.bin",
282
+ "format": "raw-shard",
283
+ "nbytes": 67108864,
284
+ "records": [
285
+ {
286
+ "name": "model.layers.11.mlp.gate_up_proj.weight",
287
+ "shape": [
288
+ 16384,
289
+ 2048
290
+ ],
291
+ "dtype": "float16",
292
+ "format": "f32-to-bf16",
293
+ "nbytes": 67108864,
294
+ "byteOffset": 0
295
+ }
296
+ ],
297
+ "md5sum": "a19047b7973f1066042d0bedaac882b8"
298
+ },
299
+ {
300
+ "dataPath": "params_shard_11.bin",
301
+ "format": "raw-shard",
302
+ "nbytes": 20979712,
303
+ "records": [
304
+ {
305
+ "name": "model.layers.10.self_attn.qkv_proj.weight",
306
+ "shape": [
307
+ 3072,
308
+ 2048
309
+ ],
310
+ "dtype": "float16",
311
+ "format": "f32-to-bf16",
312
+ "nbytes": 12582912,
313
+ "byteOffset": 0
314
+ },
315
+ {
316
+ "name": "model.layers.10.self_attn.o_proj.weight",
317
+ "shape": [
318
+ 2048,
319
+ 2048
320
+ ],
321
+ "dtype": "float16",
322
+ "format": "f32-to-bf16",
323
+ "nbytes": 8388608,
324
+ "byteOffset": 12582912
325
+ },
326
+ {
327
+ "name": "model.layers.11.input_layernorm.weight",
328
+ "shape": [
329
+ 2048
330
+ ],
331
+ "dtype": "float16",
332
+ "format": "f32-to-bf16",
333
+ "nbytes": 4096,
334
+ "byteOffset": 20971520
335
+ },
336
+ {
337
+ "name": "model.layers.11.post_attention_layernorm.weight",
338
+ "shape": [
339
+ 2048
340
+ ],
341
+ "dtype": "float16",
342
+ "format": "f32-to-bf16",
343
+ "nbytes": 4096,
344
+ "byteOffset": 20975616
345
+ }
346
+ ],
347
+ "md5sum": "4e8e76bfd35611401d5396504f54c276"
348
+ },
349
+ {
350
+ "dataPath": "params_shard_12.bin",
351
+ "format": "raw-shard",
352
+ "nbytes": 33554432,
353
+ "records": [
354
+ {
355
+ "name": "model.layers.12.mlp.down_proj.weight",
356
+ "shape": [
357
+ 2048,
358
+ 8192
359
+ ],
360
+ "dtype": "float16",
361
+ "format": "f32-to-bf16",
362
+ "nbytes": 33554432,
363
+ "byteOffset": 0
364
+ }
365
+ ],
366
+ "md5sum": "ae0e5310659ab9716e61d03ec2114693"
367
+ },
368
+ {
369
+ "dataPath": "params_shard_13.bin",
370
+ "format": "raw-shard",
371
+ "nbytes": 67108864,
372
+ "records": [
373
+ {
374
+ "name": "model.layers.12.mlp.gate_up_proj.weight",
375
+ "shape": [
376
+ 16384,
377
+ 2048
378
+ ],
379
+ "dtype": "float16",
380
+ "format": "f32-to-bf16",
381
+ "nbytes": 67108864,
382
+ "byteOffset": 0
383
+ }
384
+ ],
385
+ "md5sum": "f6a2b966db0ff70ba3125c96a5e6f8e0"
386
+ },
387
+ {
388
+ "dataPath": "params_shard_14.bin",
389
+ "format": "raw-shard",
390
+ "nbytes": 20979712,
391
+ "records": [
392
+ {
393
+ "name": "model.layers.11.self_attn.qkv_proj.weight",
394
+ "shape": [
395
+ 3072,
396
+ 2048
397
+ ],
398
+ "dtype": "float16",
399
+ "format": "f32-to-bf16",
400
+ "nbytes": 12582912,
401
+ "byteOffset": 0
402
+ },
403
+ {
404
+ "name": "model.layers.11.self_attn.o_proj.weight",
405
+ "shape": [
406
+ 2048,
407
+ 2048
408
+ ],
409
+ "dtype": "float16",
410
+ "format": "f32-to-bf16",
411
+ "nbytes": 8388608,
412
+ "byteOffset": 12582912
413
+ },
414
+ {
415
+ "name": "model.layers.12.input_layernorm.weight",
416
+ "shape": [
417
+ 2048
418
+ ],
419
+ "dtype": "float16",
420
+ "format": "f32-to-bf16",
421
+ "nbytes": 4096,
422
+ "byteOffset": 20971520
423
+ },
424
+ {
425
+ "name": "model.layers.12.post_attention_layernorm.weight",
426
+ "shape": [
427
+ 2048
428
+ ],
429
+ "dtype": "float16",
430
+ "format": "f32-to-bf16",
431
+ "nbytes": 4096,
432
+ "byteOffset": 20975616
433
+ }
434
+ ],
435
+ "md5sum": "e8004ca46980e8515eab568c169618c4"
436
+ },
437
+ {
438
+ "dataPath": "params_shard_15.bin",
439
+ "format": "raw-shard",
440
+ "nbytes": 33554432,
441
+ "records": [
442
+ {
443
+ "name": "model.layers.13.mlp.down_proj.weight",
444
+ "shape": [
445
+ 2048,
446
+ 8192
447
+ ],
448
+ "dtype": "float16",
449
+ "format": "f32-to-bf16",
450
+ "nbytes": 33554432,
451
+ "byteOffset": 0
452
+ }
453
+ ],
454
+ "md5sum": "b06dbff200487e8963abfe2a18faa10c"
455
+ },
456
+ {
457
+ "dataPath": "params_shard_16.bin",
458
+ "format": "raw-shard",
459
+ "nbytes": 67108864,
460
+ "records": [
461
+ {
462
+ "name": "model.layers.13.mlp.gate_up_proj.weight",
463
+ "shape": [
464
+ 16384,
465
+ 2048
466
+ ],
467
+ "dtype": "float16",
468
+ "format": "f32-to-bf16",
469
+ "nbytes": 67108864,
470
+ "byteOffset": 0
471
+ }
472
+ ],
473
+ "md5sum": "bd87851fed4a03435b0a28d9f42447c9"
474
+ },
475
+ {
476
+ "dataPath": "params_shard_17.bin",
477
+ "format": "raw-shard",
478
+ "nbytes": 20979712,
479
+ "records": [
480
+ {
481
+ "name": "model.layers.12.self_attn.qkv_proj.weight",
482
+ "shape": [
483
+ 3072,
484
+ 2048
485
+ ],
486
+ "dtype": "float16",
487
+ "format": "f32-to-bf16",
488
+ "nbytes": 12582912,
489
+ "byteOffset": 0
490
+ },
491
+ {
492
+ "name": "model.layers.12.self_attn.o_proj.weight",
493
+ "shape": [
494
+ 2048,
495
+ 2048
496
+ ],
497
+ "dtype": "float16",
498
+ "format": "f32-to-bf16",
499
+ "nbytes": 8388608,
500
+ "byteOffset": 12582912
501
+ },
502
+ {
503
+ "name": "model.layers.13.input_layernorm.weight",
504
+ "shape": [
505
+ 2048
506
+ ],
507
+ "dtype": "float16",
508
+ "format": "f32-to-bf16",
509
+ "nbytes": 4096,
510
+ "byteOffset": 20971520
511
+ },
512
+ {
513
+ "name": "model.layers.13.post_attention_layernorm.weight",
514
+ "shape": [
515
+ 2048
516
+ ],
517
+ "dtype": "float16",
518
+ "format": "f32-to-bf16",
519
+ "nbytes": 4096,
520
+ "byteOffset": 20975616
521
+ }
522
+ ],
523
+ "md5sum": "27d40a71fc5d26371d8baa46c8d72b31"
524
+ },
525
+ {
526
+ "dataPath": "params_shard_18.bin",
527
+ "format": "raw-shard",
528
+ "nbytes": 33554432,
529
+ "records": [
530
+ {
531
+ "name": "model.layers.14.mlp.down_proj.weight",
532
+ "shape": [
533
+ 2048,
534
+ 8192
535
+ ],
536
+ "dtype": "float16",
537
+ "format": "f32-to-bf16",
538
+ "nbytes": 33554432,
539
+ "byteOffset": 0
540
+ }
541
+ ],
542
+ "md5sum": "83337b46f585fa676dcd5a8761ad3df0"
543
+ },
544
+ {
545
+ "dataPath": "params_shard_19.bin",
546
+ "format": "raw-shard",
547
+ "nbytes": 67108864,
548
+ "records": [
549
+ {
550
+ "name": "model.layers.14.mlp.gate_up_proj.weight",
551
+ "shape": [
552
+ 16384,
553
+ 2048
554
+ ],
555
+ "dtype": "float16",
556
+ "format": "f32-to-bf16",
557
+ "nbytes": 67108864,
558
+ "byteOffset": 0
559
+ }
560
+ ],
561
+ "md5sum": "5082712a00fe7140ef9c9215846b3d60"
562
+ },
563
+ {
564
+ "dataPath": "params_shard_20.bin",
565
+ "format": "raw-shard",
566
+ "nbytes": 20979712,
567
+ "records": [
568
+ {
569
+ "name": "model.layers.13.self_attn.qkv_proj.weight",
570
+ "shape": [
571
+ 3072,
572
+ 2048
573
+ ],
574
+ "dtype": "float16",
575
+ "format": "f32-to-bf16",
576
+ "nbytes": 12582912,
577
+ "byteOffset": 0
578
+ },
579
+ {
580
+ "name": "model.layers.13.self_attn.o_proj.weight",
581
+ "shape": [
582
+ 2048,
583
+ 2048
584
+ ],
585
+ "dtype": "float16",
586
+ "format": "f32-to-bf16",
587
+ "nbytes": 8388608,
588
+ "byteOffset": 12582912
589
+ },
590
+ {
591
+ "name": "model.layers.14.input_layernorm.weight",
592
+ "shape": [
593
+ 2048
594
+ ],
595
+ "dtype": "float16",
596
+ "format": "f32-to-bf16",
597
+ "nbytes": 4096,
598
+ "byteOffset": 20971520
599
+ },
600
+ {
601
+ "name": "model.layers.14.post_attention_layernorm.weight",
602
+ "shape": [
603
+ 2048
604
+ ],
605
+ "dtype": "float16",
606
+ "format": "f32-to-bf16",
607
+ "nbytes": 4096,
608
+ "byteOffset": 20975616
609
+ }
610
+ ],
611
+ "md5sum": "7f3978ef783e8e42895a2486316f9304"
612
+ },
613
+ {
614
+ "dataPath": "params_shard_21.bin",
615
+ "format": "raw-shard",
616
+ "nbytes": 33554432,
617
+ "records": [
618
+ {
619
+ "name": "model.layers.15.mlp.down_proj.weight",
620
+ "shape": [
621
+ 2048,
622
+ 8192
623
+ ],
624
+ "dtype": "float16",
625
+ "format": "f32-to-bf16",
626
+ "nbytes": 33554432,
627
+ "byteOffset": 0
628
+ }
629
+ ],
630
+ "md5sum": "622b5ce5905b898048f2eaf79caedfbd"
631
+ },
632
+ {
633
+ "dataPath": "params_shard_22.bin",
634
+ "format": "raw-shard",
635
+ "nbytes": 67108864,
636
+ "records": [
637
+ {
638
+ "name": "model.layers.15.mlp.gate_up_proj.weight",
639
+ "shape": [
640
+ 16384,
641
+ 2048
642
+ ],
643
+ "dtype": "float16",
644
+ "format": "f32-to-bf16",
645
+ "nbytes": 67108864,
646
+ "byteOffset": 0
647
+ }
648
+ ],
649
+ "md5sum": "ace31b45f71cf2c0bf9b1056d74e1fe1"
650
+ },
651
+ {
652
+ "dataPath": "params_shard_23.bin",
653
+ "format": "raw-shard",
654
+ "nbytes": 20979712,
655
+ "records": [
656
+ {
657
+ "name": "model.layers.14.self_attn.qkv_proj.weight",
658
+ "shape": [
659
+ 3072,
660
+ 2048
661
+ ],
662
+ "dtype": "float16",
663
+ "format": "f32-to-bf16",
664
+ "nbytes": 12582912,
665
+ "byteOffset": 0
666
+ },
667
+ {
668
+ "name": "model.layers.14.self_attn.o_proj.weight",
669
+ "shape": [
670
+ 2048,
671
+ 2048
672
+ ],
673
+ "dtype": "float16",
674
+ "format": "f32-to-bf16",
675
+ "nbytes": 8388608,
676
+ "byteOffset": 12582912
677
+ },
678
+ {
679
+ "name": "model.layers.15.input_layernorm.weight",
680
+ "shape": [
681
+ 2048
682
+ ],
683
+ "dtype": "float16",
684
+ "format": "f32-to-bf16",
685
+ "nbytes": 4096,
686
+ "byteOffset": 20971520
687
+ },
688
+ {
689
+ "name": "model.layers.15.post_attention_layernorm.weight",
690
+ "shape": [
691
+ 2048
692
+ ],
693
+ "dtype": "float16",
694
+ "format": "f32-to-bf16",
695
+ "nbytes": 4096,
696
+ "byteOffset": 20975616
697
+ }
698
+ ],
699
+ "md5sum": "2f7afc97c52867936a854ad44fe0c191"
700
+ },
701
+ {
702
+ "dataPath": "params_shard_24.bin",
703
+ "format": "raw-shard",
704
+ "nbytes": 33554432,
705
+ "records": [
706
+ {
707
+ "name": "model.layers.2.mlp.down_proj.weight",
708
+ "shape": [
709
+ 2048,
710
+ 8192
711
+ ],
712
+ "dtype": "float16",
713
+ "format": "f32-to-bf16",
714
+ "nbytes": 33554432,
715
+ "byteOffset": 0
716
+ }
717
+ ],
718
+ "md5sum": "494e89ca81bf466c8ce17b5ee5f85cdf"
719
+ },
720
+ {
721
+ "dataPath": "params_shard_25.bin",
722
+ "format": "raw-shard",
723
+ "nbytes": 67108864,
724
+ "records": [
725
+ {
726
+ "name": "model.layers.2.mlp.gate_up_proj.weight",
727
+ "shape": [
728
+ 16384,
729
+ 2048
730
+ ],
731
+ "dtype": "float16",
732
+ "format": "f32-to-bf16",
733
+ "nbytes": 67108864,
734
+ "byteOffset": 0
735
+ }
736
+ ],
737
+ "md5sum": "13c70878199744779a54933d728727f8"
738
+ },
739
+ {
740
+ "dataPath": "params_shard_26.bin",
741
+ "format": "raw-shard",
742
+ "nbytes": 20979712,
743
+ "records": [
744
+ {
745
+ "name": "model.layers.15.self_attn.qkv_proj.weight",
746
+ "shape": [
747
+ 3072,
748
+ 2048
749
+ ],
750
+ "dtype": "float16",
751
+ "format": "f32-to-bf16",
752
+ "nbytes": 12582912,
753
+ "byteOffset": 0
754
+ },
755
+ {
756
+ "name": "model.layers.15.self_attn.o_proj.weight",
757
+ "shape": [
758
+ 2048,
759
+ 2048
760
+ ],
761
+ "dtype": "float16",
762
+ "format": "f32-to-bf16",
763
+ "nbytes": 8388608,
764
+ "byteOffset": 12582912
765
+ },
766
+ {
767
+ "name": "model.layers.2.input_layernorm.weight",
768
+ "shape": [
769
+ 2048
770
+ ],
771
+ "dtype": "float16",
772
+ "format": "f32-to-bf16",
773
+ "nbytes": 4096,
774
+ "byteOffset": 20971520
775
+ },
776
+ {
777
+ "name": "model.layers.2.post_attention_layernorm.weight",
778
+ "shape": [
779
+ 2048
780
+ ],
781
+ "dtype": "float16",
782
+ "format": "f32-to-bf16",
783
+ "nbytes": 4096,
784
+ "byteOffset": 20975616
785
+ }
786
+ ],
787
+ "md5sum": "f6e5b110dc063026eb64f03b3fa20bee"
788
+ },
789
+ {
790
+ "dataPath": "params_shard_27.bin",
791
+ "format": "raw-shard",
792
+ "nbytes": 33554432,
793
+ "records": [
794
+ {
795
+ "name": "model.layers.3.mlp.down_proj.weight",
796
+ "shape": [
797
+ 2048,
798
+ 8192
799
+ ],
800
+ "dtype": "float16",
801
+ "format": "f32-to-bf16",
802
+ "nbytes": 33554432,
803
+ "byteOffset": 0
804
+ }
805
+ ],
806
+ "md5sum": "910d39d3f068b37b2df084f9762571b4"
807
+ },
808
+ {
809
+ "dataPath": "params_shard_28.bin",
810
+ "format": "raw-shard",
811
+ "nbytes": 67108864,
812
+ "records": [
813
+ {
814
+ "name": "model.layers.3.mlp.gate_up_proj.weight",
815
+ "shape": [
816
+ 16384,
817
+ 2048
818
+ ],
819
+ "dtype": "float16",
820
+ "format": "f32-to-bf16",
821
+ "nbytes": 67108864,
822
+ "byteOffset": 0
823
+ }
824
+ ],
825
+ "md5sum": "49f61ec31d545e7569fb2a44a2b7b07f"
826
+ },
827
+ {
828
+ "dataPath": "params_shard_29.bin",
829
+ "format": "raw-shard",
830
+ "nbytes": 20979712,
831
+ "records": [
832
+ {
833
+ "name": "model.layers.2.self_attn.qkv_proj.weight",
834
+ "shape": [
835
+ 3072,
836
+ 2048
837
+ ],
838
+ "dtype": "float16",
839
+ "format": "f32-to-bf16",
840
+ "nbytes": 12582912,
841
+ "byteOffset": 0
842
+ },
843
+ {
844
+ "name": "model.layers.2.self_attn.o_proj.weight",
845
+ "shape": [
846
+ 2048,
847
+ 2048
848
+ ],
849
+ "dtype": "float16",
850
+ "format": "f32-to-bf16",
851
+ "nbytes": 8388608,
852
+ "byteOffset": 12582912
853
+ },
854
+ {
855
+ "name": "model.layers.3.input_layernorm.weight",
856
+ "shape": [
857
+ 2048
858
+ ],
859
+ "dtype": "float16",
860
+ "format": "f32-to-bf16",
861
+ "nbytes": 4096,
862
+ "byteOffset": 20971520
863
+ },
864
+ {
865
+ "name": "model.layers.3.post_attention_layernorm.weight",
866
+ "shape": [
867
+ 2048
868
+ ],
869
+ "dtype": "float16",
870
+ "format": "f32-to-bf16",
871
+ "nbytes": 4096,
872
+ "byteOffset": 20975616
873
+ }
874
+ ],
875
+ "md5sum": "1eb0bdc450a0fca3053b47dc24368565"
876
+ },
877
+ {
878
+ "dataPath": "params_shard_30.bin",
879
+ "format": "raw-shard",
880
+ "nbytes": 33554432,
881
+ "records": [
882
+ {
883
+ "name": "model.layers.4.mlp.down_proj.weight",
884
+ "shape": [
885
+ 2048,
886
+ 8192
887
+ ],
888
+ "dtype": "float16",
889
+ "format": "f32-to-bf16",
890
+ "nbytes": 33554432,
891
+ "byteOffset": 0
892
+ }
893
+ ],
894
+ "md5sum": "f5b3c59b6ab43fe2f4892ad53d246905"
895
+ },
896
+ {
897
+ "dataPath": "params_shard_31.bin",
898
+ "format": "raw-shard",
899
+ "nbytes": 67108864,
900
+ "records": [
901
+ {
902
+ "name": "model.layers.4.mlp.gate_up_proj.weight",
903
+ "shape": [
904
+ 16384,
905
+ 2048
906
+ ],
907
+ "dtype": "float16",
908
+ "format": "f32-to-bf16",
909
+ "nbytes": 67108864,
910
+ "byteOffset": 0
911
+ }
912
+ ],
913
+ "md5sum": "38d72936eaf8dc4bacf812d8fdb2d551"
914
+ },
915
+ {
916
+ "dataPath": "params_shard_32.bin",
917
+ "format": "raw-shard",
918
+ "nbytes": 20979712,
919
+ "records": [
920
+ {
921
+ "name": "model.layers.3.self_attn.qkv_proj.weight",
922
+ "shape": [
923
+ 3072,
924
+ 2048
925
+ ],
926
+ "dtype": "float16",
927
+ "format": "f32-to-bf16",
928
+ "nbytes": 12582912,
929
+ "byteOffset": 0
930
+ },
931
+ {
932
+ "name": "model.layers.3.self_attn.o_proj.weight",
933
+ "shape": [
934
+ 2048,
935
+ 2048
936
+ ],
937
+ "dtype": "float16",
938
+ "format": "f32-to-bf16",
939
+ "nbytes": 8388608,
940
+ "byteOffset": 12582912
941
+ },
942
+ {
943
+ "name": "model.layers.4.input_layernorm.weight",
944
+ "shape": [
945
+ 2048
946
+ ],
947
+ "dtype": "float16",
948
+ "format": "f32-to-bf16",
949
+ "nbytes": 4096,
950
+ "byteOffset": 20971520
951
+ },
952
+ {
953
+ "name": "model.layers.4.post_attention_layernorm.weight",
954
+ "shape": [
955
+ 2048
956
+ ],
957
+ "dtype": "float16",
958
+ "format": "f32-to-bf16",
959
+ "nbytes": 4096,
960
+ "byteOffset": 20975616
961
+ }
962
+ ],
963
+ "md5sum": "773e500fd6e2ad9b4b9306b1e94af630"
964
+ },
965
+ {
966
+ "dataPath": "params_shard_33.bin",
967
+ "format": "raw-shard",
968
+ "nbytes": 33554432,
969
+ "records": [
970
+ {
971
+ "name": "model.layers.5.mlp.down_proj.weight",
972
+ "shape": [
973
+ 2048,
974
+ 8192
975
+ ],
976
+ "dtype": "float16",
977
+ "format": "f32-to-bf16",
978
+ "nbytes": 33554432,
979
+ "byteOffset": 0
980
+ }
981
+ ],
982
+ "md5sum": "90826d2d305b088531c014dd0ce66ec6"
983
+ },
984
+ {
985
+ "dataPath": "params_shard_34.bin",
986
+ "format": "raw-shard",
987
+ "nbytes": 67108864,
988
+ "records": [
989
+ {
990
+ "name": "model.layers.5.mlp.gate_up_proj.weight",
991
+ "shape": [
992
+ 16384,
993
+ 2048
994
+ ],
995
+ "dtype": "float16",
996
+ "format": "f32-to-bf16",
997
+ "nbytes": 67108864,
998
+ "byteOffset": 0
999
+ }
1000
+ ],
1001
+ "md5sum": "4a44e23743fed303acca7cd7d910a8ae"
1002
+ },
1003
+ {
1004
+ "dataPath": "params_shard_35.bin",
1005
+ "format": "raw-shard",
1006
+ "nbytes": 20979712,
1007
+ "records": [
1008
+ {
1009
+ "name": "model.layers.4.self_attn.qkv_proj.weight",
1010
+ "shape": [
1011
+ 3072,
1012
+ 2048
1013
+ ],
1014
+ "dtype": "float16",
1015
+ "format": "f32-to-bf16",
1016
+ "nbytes": 12582912,
1017
+ "byteOffset": 0
1018
+ },
1019
+ {
1020
+ "name": "model.layers.4.self_attn.o_proj.weight",
1021
+ "shape": [
1022
+ 2048,
1023
+ 2048
1024
+ ],
1025
+ "dtype": "float16",
1026
+ "format": "f32-to-bf16",
1027
+ "nbytes": 8388608,
1028
+ "byteOffset": 12582912
1029
+ },
1030
+ {
1031
+ "name": "model.layers.5.input_layernorm.weight",
1032
+ "shape": [
1033
+ 2048
1034
+ ],
1035
+ "dtype": "float16",
1036
+ "format": "f32-to-bf16",
1037
+ "nbytes": 4096,
1038
+ "byteOffset": 20971520
1039
+ },
1040
+ {
1041
+ "name": "model.layers.5.post_attention_layernorm.weight",
1042
+ "shape": [
1043
+ 2048
1044
+ ],
1045
+ "dtype": "float16",
1046
+ "format": "f32-to-bf16",
1047
+ "nbytes": 4096,
1048
+ "byteOffset": 20975616
1049
+ }
1050
+ ],
1051
+ "md5sum": "06a369e56d5da237caaa2b9cd972be08"
1052
+ },
1053
+ {
1054
+ "dataPath": "params_shard_36.bin",
1055
+ "format": "raw-shard",
1056
+ "nbytes": 33554432,
1057
+ "records": [
1058
+ {
1059
+ "name": "model.layers.6.mlp.down_proj.weight",
1060
+ "shape": [
1061
+ 2048,
1062
+ 8192
1063
+ ],
1064
+ "dtype": "float16",
1065
+ "format": "f32-to-bf16",
1066
+ "nbytes": 33554432,
1067
+ "byteOffset": 0
1068
+ }
1069
+ ],
1070
+ "md5sum": "8ba19ff095dc05dfcea429ff62adc5d7"
1071
+ },
1072
+ {
1073
+ "dataPath": "params_shard_37.bin",
1074
+ "format": "raw-shard",
1075
+ "nbytes": 67108864,
1076
+ "records": [
1077
+ {
1078
+ "name": "model.layers.6.mlp.gate_up_proj.weight",
1079
+ "shape": [
1080
+ 16384,
1081
+ 2048
1082
+ ],
1083
+ "dtype": "float16",
1084
+ "format": "f32-to-bf16",
1085
+ "nbytes": 67108864,
1086
+ "byteOffset": 0
1087
+ }
1088
+ ],
1089
+ "md5sum": "3fbdebbc72ec4e5529e3e90a96d9ddec"
1090
+ },
1091
+ {
1092
+ "dataPath": "params_shard_38.bin",
1093
+ "format": "raw-shard",
1094
+ "nbytes": 20979712,
1095
+ "records": [
1096
+ {
1097
+ "name": "model.layers.5.self_attn.qkv_proj.weight",
1098
+ "shape": [
1099
+ 3072,
1100
+ 2048
1101
+ ],
1102
+ "dtype": "float16",
1103
+ "format": "f32-to-bf16",
1104
+ "nbytes": 12582912,
1105
+ "byteOffset": 0
1106
+ },
1107
+ {
1108
+ "name": "model.layers.5.self_attn.o_proj.weight",
1109
+ "shape": [
1110
+ 2048,
1111
+ 2048
1112
+ ],
1113
+ "dtype": "float16",
1114
+ "format": "f32-to-bf16",
1115
+ "nbytes": 8388608,
1116
+ "byteOffset": 12582912
1117
+ },
1118
+ {
1119
+ "name": "model.layers.6.input_layernorm.weight",
1120
+ "shape": [
1121
+ 2048
1122
+ ],
1123
+ "dtype": "float16",
1124
+ "format": "f32-to-bf16",
1125
+ "nbytes": 4096,
1126
+ "byteOffset": 20971520
1127
+ },
1128
+ {
1129
+ "name": "model.layers.6.post_attention_layernorm.weight",
1130
+ "shape": [
1131
+ 2048
1132
+ ],
1133
+ "dtype": "float16",
1134
+ "format": "f32-to-bf16",
1135
+ "nbytes": 4096,
1136
+ "byteOffset": 20975616
1137
+ }
1138
+ ],
1139
+ "md5sum": "e0c825ac82f14adb91b255cec234cd72"
1140
+ },
1141
+ {
1142
+ "dataPath": "params_shard_39.bin",
1143
+ "format": "raw-shard",
1144
+ "nbytes": 33554432,
1145
+ "records": [
1146
+ {
1147
+ "name": "model.layers.7.mlp.down_proj.weight",
1148
+ "shape": [
1149
+ 2048,
1150
+ 8192
1151
+ ],
1152
+ "dtype": "float16",
1153
+ "format": "f32-to-bf16",
1154
+ "nbytes": 33554432,
1155
+ "byteOffset": 0
1156
+ }
1157
+ ],
1158
+ "md5sum": "fef5beab293cc482e63947c7aa133778"
1159
+ },
1160
+ {
1161
+ "dataPath": "params_shard_40.bin",
1162
+ "format": "raw-shard",
1163
+ "nbytes": 67108864,
1164
+ "records": [
1165
+ {
1166
+ "name": "model.layers.7.mlp.gate_up_proj.weight",
1167
+ "shape": [
1168
+ 16384,
1169
+ 2048
1170
+ ],
1171
+ "dtype": "float16",
1172
+ "format": "f32-to-bf16",
1173
+ "nbytes": 67108864,
1174
+ "byteOffset": 0
1175
+ }
1176
+ ],
1177
+ "md5sum": "f219b3aeb641094c9506d63d34f61594"
1178
+ },
1179
+ {
1180
+ "dataPath": "params_shard_41.bin",
1181
+ "format": "raw-shard",
1182
+ "nbytes": 20979712,
1183
+ "records": [
1184
+ {
1185
+ "name": "model.layers.6.self_attn.qkv_proj.weight",
1186
+ "shape": [
1187
+ 3072,
1188
+ 2048
1189
+ ],
1190
+ "dtype": "float16",
1191
+ "format": "f32-to-bf16",
1192
+ "nbytes": 12582912,
1193
+ "byteOffset": 0
1194
+ },
1195
+ {
1196
+ "name": "model.layers.6.self_attn.o_proj.weight",
1197
+ "shape": [
1198
+ 2048,
1199
+ 2048
1200
+ ],
1201
+ "dtype": "float16",
1202
+ "format": "f32-to-bf16",
1203
+ "nbytes": 8388608,
1204
+ "byteOffset": 12582912
1205
+ },
1206
+ {
1207
+ "name": "model.layers.7.input_layernorm.weight",
1208
+ "shape": [
1209
+ 2048
1210
+ ],
1211
+ "dtype": "float16",
1212
+ "format": "f32-to-bf16",
1213
+ "nbytes": 4096,
1214
+ "byteOffset": 20971520
1215
+ },
1216
+ {
1217
+ "name": "model.layers.7.post_attention_layernorm.weight",
1218
+ "shape": [
1219
+ 2048
1220
+ ],
1221
+ "dtype": "float16",
1222
+ "format": "f32-to-bf16",
1223
+ "nbytes": 4096,
1224
+ "byteOffset": 20975616
1225
+ }
1226
+ ],
1227
+ "md5sum": "afe26b0283ed633feaa3784374d866b3"
1228
+ },
1229
+ {
1230
+ "dataPath": "params_shard_42.bin",
1231
+ "format": "raw-shard",
1232
+ "nbytes": 33554432,
1233
+ "records": [
1234
+ {
1235
+ "name": "model.layers.8.mlp.down_proj.weight",
1236
+ "shape": [
1237
+ 2048,
1238
+ 8192
1239
+ ],
1240
+ "dtype": "float16",
1241
+ "format": "f32-to-bf16",
1242
+ "nbytes": 33554432,
1243
+ "byteOffset": 0
1244
+ }
1245
+ ],
1246
+ "md5sum": "4f0cd14ab865c2f8553f79466e85a6ab"
1247
+ },
1248
+ {
+ "dataPath": "params_shard_43.bin",
+ "format": "raw-shard",
+ "nbytes": 67108864,
+ "records": [
+ {
+ "name": "model.layers.8.mlp.gate_up_proj.weight",
+ "shape": [
+ 16384,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 67108864,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "e47e8214362118313253663b2c3ce43e"
+ },
+ {
+ "dataPath": "params_shard_44.bin",
+ "format": "raw-shard",
+ "nbytes": 20979712,
+ "records": [
+ {
+ "name": "model.layers.7.self_attn.qkv_proj.weight",
+ "shape": [
+ 3072,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 12582912,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.7.self_attn.o_proj.weight",
+ "shape": [
+ 2048,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 8388608,
+ "byteOffset": 12582912
+ },
+ {
+ "name": "model.layers.8.input_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 20971520
+ },
+ {
+ "name": "model.layers.8.post_attention_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 20975616
+ }
+ ],
+ "md5sum": "8bdcc2a37898c5b61dc2e9d30f7ffcc2"
+ },
+ {
+ "dataPath": "params_shard_45.bin",
+ "format": "raw-shard",
+ "nbytes": 33554432,
+ "records": [
+ {
+ "name": "model.layers.9.mlp.down_proj.weight",
+ "shape": [
+ 2048,
+ 8192
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 33554432,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "32e3d2650ce2fcb5ba7e8ebbcf7f8b2c"
+ },
+ {
+ "dataPath": "params_shard_46.bin",
+ "format": "raw-shard",
+ "nbytes": 67108864,
+ "records": [
+ {
+ "name": "model.layers.9.mlp.gate_up_proj.weight",
+ "shape": [
+ 16384,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 67108864,
+ "byteOffset": 0
+ }
+ ],
+ "md5sum": "fe1013105c419278c23b765e981940fb"
+ },
+ {
+ "dataPath": "params_shard_47.bin",
+ "format": "raw-shard",
+ "nbytes": 20979712,
+ "records": [
+ {
+ "name": "model.layers.8.self_attn.qkv_proj.weight",
+ "shape": [
+ 3072,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 12582912,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.8.self_attn.o_proj.weight",
+ "shape": [
+ 2048,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 8388608,
+ "byteOffset": 12582912
+ },
+ {
+ "name": "model.layers.9.input_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 20971520
+ },
+ {
+ "name": "model.layers.9.post_attention_layernorm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 20975616
+ }
+ ],
+ "md5sum": "9a3e24a308a77520ec39767e540973be"
+ },
+ {
+ "dataPath": "params_shard_48.bin",
+ "format": "raw-shard",
+ "nbytes": 20975616,
+ "records": [
+ {
+ "name": "model.layers.9.self_attn.qkv_proj.weight",
+ "shape": [
+ 3072,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 12582912,
+ "byteOffset": 0
+ },
+ {
+ "name": "model.layers.9.self_attn.o_proj.weight",
+ "shape": [
+ 2048,
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 8388608,
+ "byteOffset": 12582912
+ },
+ {
+ "name": "model.norm.weight",
+ "shape": [
+ 2048
+ ],
+ "dtype": "float16",
+ "format": "f32-to-bf16",
+ "nbytes": 4096,
+ "byteOffset": 20971520
+ }
+ ],
+ "md5sum": "4d899f886c5eb1df29d31ae4897fae61"
+ }
+ ]
+ }
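Each `raw-shard` entry in the manifest above lists the tensors packed into one `params_shard_*.bin` file, with a `shape`, a `dtype`, an `nbytes` and a `byteOffset`. In every shard shown here, `nbytes` equals the element count times 2 (float16), and records sit back-to-back with no padding, ending exactly at the shard's own `nbytes`. The Python sketch below checks those two properties for one shard entry. It is an illustration under stated assumptions: `check_shard` and `DTYPE_SIZE` are hypothetical names, only `float16` is covered, and the no-padding rule is inferred from this fragment rather than taken from an MLC LLM specification.

```python
import math

# Bytes per element. Only float16 appears in this manifest fragment;
# any other dtype would need an entry added (assumption).
DTYPE_SIZE = {"float16": 2}


def check_shard(shard: dict) -> list[str]:
    """Return a list of layout problems in one raw-shard entry (empty if consistent).

    Assumes records are packed contiguously with no padding, as observed
    in the shards above.
    """
    problems = []
    expected_offset = 0
    for rec in shard["records"]:
        # nbytes should equal (product of shape) * bytes-per-element.
        expected_nbytes = math.prod(rec["shape"]) * DTYPE_SIZE[rec["dtype"]]
        if rec["nbytes"] != expected_nbytes:
            problems.append(f"{rec['name']}: nbytes {rec['nbytes']} != {expected_nbytes}")
        # Each record should start where the previous one ended.
        if rec["byteOffset"] != expected_offset:
            problems.append(f"{rec['name']}: byteOffset {rec['byteOffset']} != {expected_offset}")
        expected_offset = rec["byteOffset"] + rec["nbytes"]
    # The last record should end exactly at the shard size.
    if expected_offset != shard["nbytes"]:
        problems.append(
            f"{shard['dataPath']}: records end at {expected_offset}, "
            f"shard nbytes is {shard['nbytes']}"
        )
    return problems


# Values copied from the params_shard_48.bin entry above.
shard_48 = {
    "dataPath": "params_shard_48.bin",
    "nbytes": 20975616,
    "records": [
        {"name": "model.layers.9.self_attn.qkv_proj.weight", "shape": [3072, 2048],
         "dtype": "float16", "nbytes": 12582912, "byteOffset": 0},
        {"name": "model.layers.9.self_attn.o_proj.weight", "shape": [2048, 2048],
         "dtype": "float16", "nbytes": 8388608, "byteOffset": 12582912},
        {"name": "model.norm.weight", "shape": [2048],
         "dtype": "float16", "nbytes": 4096, "byteOffset": 20971520},
    ],
}

print(check_shard(shard_48))  # → []
```

For example, the qkv projection is 3072 × 2048 × 2 = 12582912 bytes, the output projection starts right after it at 12582912, and the final norm ends at 20971520 + 4096 = 20975616, which is the shard's `nbytes`.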
params_shard_0.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0f3113fe3a8fbef23e4826a385ea2999bda2030b3cc28c35fd2c306ac9cc2449
+ size 525336576
params_shard_1.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9f56005f381e8cef2662d82dff751251075b99c4ac46c86863146af87c576422
+ size 33554432
params_shard_10.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c3ac19137d514f4a732adb00fb43008bbbbfa441b63d7ff2766da0b6cbe65a2f
+ size 67108864
params_shard_11.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:435482b808a886c8f3f955afd757759c37f0a4c2d0fdde3a1ab43c34c8ce5489
+ size 20979712
params_shard_12.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:18f838de8bbbe7ad00ec6f2c22c9364673a073ca469fa3b4971a49b3d8c7d163
+ size 33554432
params_shard_13.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c0eca514480039a369bdf506b8b632a38617e92352480d89d0f2f62c63a306b4
+ size 67108864
params_shard_14.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b04d29897cf2fa644fdc8eb283d32106e1516c443dd716bd5d9d492cd3e11f1d
+ size 20979712
params_shard_15.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f99d33369538ec4d80c80d12b6071c6a764b16648ca9b8b1f78326db3803e335
+ size 33554432
params_shard_16.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6a5d9c86ec72a6ca86bde1384b65c628f887c6d9f43c1238cf9836d8bfa8026c
+ size 67108864
params_shard_17.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:921ddf7c4296596db813d416309a31e09e2db5f16c415ae3ab2f75b22ec7b398
+ size 20979712
params_shard_18.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f69d7d52de1fda71b005573fa9aca62e5aa508a7fc252e16e007ed4a6a7afd6c
+ size 33554432
params_shard_19.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c5df0ce90af1a64574308d53e762216aa96d81f7b6a65c523259c5f809bc5e3b
+ size 67108864
params_shard_2.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5eeabe8be66a0045f6464320d1e03abfad4c50a1eda42fc3d3f11213cfddb355
+ size 67108864
params_shard_20.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:93b0a8e67237749c434d85ba8645a87be15260f04979aa5a0dea478b77b5e362
+ size 20979712
params_shard_21.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5f24bc4463dce1dde44210b2051b2bece9725249a24561fec255eab384df1e4c
+ size 33554432
params_shard_22.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e508ee1d4aacac7b6775082f2f3d400bb7605f0f63c0b7ccdab140f1852e6d37
+ size 67108864
params_shard_23.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:65f033520f834fa0c806792bf3912bb14b1c14371a3a206333bf541e856f24e9
+ size 20979712
params_shard_24.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:179f277b6abaf8e3dfe27af0f364fba256ebb213d80b703fc6a3ddfd7bd27e17
+ size 33554432
params_shard_25.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:69b6a702a511d418db85398656e489573b93e59fd87524acd3de8afcdb0ae579
+ size 67108864
params_shard_26.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e65cfd062c7218461f47cd033acaa1808a19106d70edab44c6d06a77e00ae912
+ size 20979712
params_shard_27.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d2b1c9dfc2641d129c31b70c3e52d230e1742cd4a6f1656b0e63208aa7b73efd
+ size 33554432
params_shard_28.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c188885902752da5a1d4293dbd3d4159318e444f69e99108b93cc4fe160be906
+ size 67108864
params_shard_29.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ce74a46a16620307eab87be44d9494a08219a903022e8a10309e4522edc1b932
+ size 20979712
params_shard_3.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8a4c8130c447c2b28130c07b71287eeb890565adfa412f2098c65c4f6fbdc148
+ size 33554432
params_shard_30.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dc7b92ee5b216f65643bc364a5a7ac0ff0ed367ea1d753d364e21fe593fc9478
+ size 33554432
params_shard_31.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3790f89594d0fa6e52055d031171107e40b84a9f059a6750eea73d781011cdaf
+ size 67108864
params_shard_32.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:048f0a1ca7faeb233c3bcaffd816e0767f15ff7019205090c63e5cdbd69b8d4a
+ size 20979712
params_shard_33.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f8cbe8047f97a16dca29d579949b756ba4ae8c2b25dc77e2d099fa2e530294c0
+ size 33554432
params_shard_34.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:69168891dce13f765c87f92d8a15b68c7d81ae22208b4e2676921f087921f2b9
+ size 67108864
params_shard_35.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:27163ec0b7fe0693bc8f2c92ad81b5e44773d351f8e960149a3de9ef8b0c41e2
+ size 20979712
params_shard_36.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9f1ab1bcaab1dc0b27ef23f558b63f3174eef383316315432cd664ac465582e7
+ size 33554432
params_shard_37.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b44cb323beac914f35110187fac5872d51d8a52530347dd92aef10610ab7c0b6
+ size 67108864
params_shard_38.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1ba25e79a55832e650ac34629c395cab21795862ff539bcbe129be0ebf2a269c
+ size 20979712
params_shard_39.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e3b61f8327067b72e90c5e4a3c3197404e1289b7fac1c6ee560c830cbfefc04a
+ size 33554432
params_shard_4.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:73e335eb68f67d56cac0415bc2af496cc0380b8ac345ead638496feda766a945
+ size 67108864
params_shard_40.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:70edb094aa5be38ef6e1b6f9d54f88613f5317137b7dcb4d4c244a30e02bacfc
+ size 67108864
params_shard_41.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d4b1d992553d4645eeeb85427d3e590b64394b304c44520e82ac8b442f012d48
+ size 20979712
params_shard_42.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7585419c0a26267b9f0b1b2448a332eace9591d97f30d7a9d86dc04b6c029909
+ size 33554432
params_shard_43.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:aa1c997bf3c4b0ad64e869f18fa67516b63523ba35fdfed10b09621382cf9333
+ size 67108864
params_shard_44.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ae25780d2ff035df814979efb395797aa70adceb702c0c828cc6a760b29fd50c
+ size 20979712
params_shard_45.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6bf50f3068d1d761d2e9846528377d582a20a1b4ed2ddae40cd211bd6fc61c1f
+ size 33554432
params_shard_46.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:97d894d666c36538c4dee92716f06bfd7e811f2849facc769a9eb4dee3d9c475
+ size 67108864
params_shard_47.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4498b980d02ab0cd60dbaa4abe94c2408fe0ef4ab880b75e6305f4b31255af44
+ size 20979712
params_shard_48.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1921951ae23775a0ce2c182325aa1a843742111267ed80c0cb07e8fcfa5977b4
+ size 20975616
params_shard_5.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f4358114b38faa979a97b4046cfbfd23af716f0726acd5958e16fdc51db87676
+ size 20987904
params_shard_6.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:86df0e7f469c1090b4bb0da57a8e052b3e15e2e5308ce07f86692ef44e211fdb
+ size 33554432
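Each `params_shard_*.bin` above is committed as a Git LFS pointer rather than as the binary itself: three `key value` lines giving the pointer spec version, the SHA-256 object ID, and the blob size in bytes. The Python sketch below parses such a pointer and checks a downloaded blob against it by streaming it through SHA-256 in 1 MiB chunks, so large shards (params_shard_0.bin is 525336576 bytes) are never fully loaded into memory. `parse_lfs_pointer` and `file_matches` are hypothetical helper names, and the parser only handles the three keys that appear in this commit, not every rule in the Git LFS pointer specification.

```python
import hashlib
import io


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS v1 pointer into {'version', 'oid', 'size'}.

    Hypothetical helper: only the version/oid/size keys seen in this commit
    are handled, and only sha256 object IDs are accepted.
    """
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if fields.get("version") != "https://git-lfs.github.com/spec/v1":
        raise ValueError("not a Git LFS v1 pointer")
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "oid": digest, "size": int(fields["size"])}


def file_matches(pointer: dict, f) -> bool:
    """Stream a binary file object and compare its size and SHA-256 with the pointer."""
    h = hashlib.sha256()
    n = 0
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)
        n += len(chunk)
    return n == pointer["size"] and h.hexdigest() == pointer["oid"]


# Pointer text copied from params_shard_48.bin above.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:1921951ae23775a0ce2c182325aa1a843742111267ed80c0cb07e8fcfa5977b4\n"
    "size 20975616\n"
)
print(pointer["size"])  # → 20975616

# Self-check on a tiny in-memory blob (a real check would use open(path, "rb")).
demo = b"hello"
demo_pointer = {"version": pointer["version"],
                "oid": hashlib.sha256(demo).hexdigest(), "size": len(demo)}
print(file_matches(demo_pointer, io.BytesIO(demo)))  # → True
```

The size in the params_shard_48.bin pointer (20975616) equals that shard's `nbytes` in ndarray-cache.json, so comparing the two is a cheap consistency check before hashing a full download.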