nm-research committed on
Commit
43504ea
1 Parent(s): fb0c9da

Upload folder using huggingface_hub

Browse files
This view is limited to 50 files because it contains too many changes. See raw diff
Files changed (50)
  1. README.md +477 -0
  2. config.json +79 -0
  3. generation_config.json +12 -0
  4. model-00001-of-00086.safetensors +3 -0
  5. model-00002-of-00086.safetensors +3 -0
  6. model-00003-of-00086.safetensors +3 -0
  7. model-00004-of-00086.safetensors +3 -0
  8. model-00005-of-00086.safetensors +3 -0
  9. model-00006-of-00086.safetensors +3 -0
  10. model-00007-of-00086.safetensors +3 -0
  11. model-00008-of-00086.safetensors +3 -0
  12. model-00009-of-00086.safetensors +3 -0
  13. model-00010-of-00086.safetensors +3 -0
  14. model-00011-of-00086.safetensors +3 -0
  15. model-00012-of-00086.safetensors +3 -0
  16. model-00013-of-00086.safetensors +3 -0
  17. model-00014-of-00086.safetensors +3 -0
  18. model-00015-of-00086.safetensors +3 -0
  19. model-00016-of-00086.safetensors +3 -0
  20. model-00017-of-00086.safetensors +3 -0
  21. model-00018-of-00086.safetensors +3 -0
  22. model-00019-of-00086.safetensors +3 -0
  23. model-00020-of-00086.safetensors +3 -0
  24. model-00021-of-00086.safetensors +3 -0
  25. model-00022-of-00086.safetensors +3 -0
  26. model-00023-of-00086.safetensors +3 -0
  27. model-00024-of-00086.safetensors +3 -0
  28. model-00025-of-00086.safetensors +3 -0
  29. model-00026-of-00086.safetensors +3 -0
  30. model-00027-of-00086.safetensors +3 -0
  31. model-00028-of-00086.safetensors +3 -0
  32. model-00029-of-00086.safetensors +3 -0
  33. model-00030-of-00086.safetensors +3 -0
  34. model-00031-of-00086.safetensors +3 -0
  35. model-00032-of-00086.safetensors +3 -0
  36. model-00033-of-00086.safetensors +3 -0
  37. model-00034-of-00086.safetensors +3 -0
  38. model-00035-of-00086.safetensors +3 -0
  39. model-00036-of-00086.safetensors +3 -0
  40. model-00037-of-00086.safetensors +3 -0
  41. model-00038-of-00086.safetensors +3 -0
  42. model-00039-of-00086.safetensors +3 -0
  43. model-00040-of-00086.safetensors +3 -0
  44. model-00041-of-00086.safetensors +3 -0
  45. model-00042-of-00086.safetensors +3 -0
  46. model-00043-of-00086.safetensors +3 -0
  47. model-00044-of-00086.safetensors +3 -0
  48. model-00045-of-00086.safetensors +3 -0
  49. model-00046-of-00086.safetensors +3 -0
  50. model-00047-of-00086.safetensors +3 -0
README.md ADDED
@@ -0,0 +1,477 @@
+ ---
+ tags:
+ - fp8
+ - vllm
+ language:
+ - en
+ - de
+ - fr
+ - it
+ - pt
+ - hi
+ - es
+ - th
+ pipeline_tag: text-generation
+ license: llama3.1
+ base_model: meta-llama/Meta-Llama-3.1-405B-Instruct
+ ---
+
+ # Meta-Llama-3.1-405B-Instruct-FP8-dynamic
+
+ ## Model Overview
+ - **Model Architecture:** Meta-Llama-3.1
+ - **Input:** Text
+ - **Output:** Text
+ - **Model Optimizations:**
+   - **Weight quantization:** FP8
+   - **Activation quantization:** FP8
+ - **Intended Use Cases:** Intended for commercial and research use in multiple languages. Similarly to [Meta-Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct), this model is intended for assistant-like chat.
+ - **Out-of-scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English.
+ - **Release Date:** 8/22/2024
+ - **Version:** 1.1
+ - **License(s):** [llama3.1](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B/blob/main/LICENSE)
+ - **Model Developers:** Neural Magic
+
+ This model is a quantized version of [Meta-Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct).
+ It was evaluated on several tasks to assess its quality in comparison to the unquantized model, including multiple-choice question answering, math reasoning, and open-ended text generation.
+ Meta-Llama-3.1-405B-Instruct-FP8-dynamic achieves 99.0% recovery for the Arena-Hard evaluation, 100.0% for OpenLLM v1 (using Meta's prompting when available), 99.9% for OpenLLM v2, 100.2% for HumanEval pass@1, and 101.1% for HumanEval+ pass@1.
+
+ ### Model Optimizations
+
+ This model was obtained by quantizing the weights and activations of [Meta-Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct) to the FP8 data type, ready for inference with vLLM built from source.
+ This optimization reduces the number of bits per parameter from 16 to 8, reducing the disk size and GPU memory requirements by approximately 50%. In particular, this model can now be loaded and evaluated with a single node of 8xH100 GPUs, as opposed to multiple nodes.
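As a rough sanity check on the ~50% claim above, the arithmetic is simple (illustrative numbers only: 405B parameters, 80 GB per H100, ignoring KV cache and activation memory):

```python
params = 405e9  # parameter count of the 405B model

bf16_gb = params * 2 / 1e9  # 16-bit weights: 2 bytes per parameter -> 810 GB
fp8_gb = params * 1 / 1e9   # FP8 weights: 1 byte per parameter -> 405 GB
node_gb = 8 * 80            # one node of 8xH100 GPUs with 80 GB each -> 640 GB

print(f"BF16: {bf16_gb:.0f} GB, FP8: {fp8_gb:.0f} GB, 8xH100 node: {node_gb} GB")
# The FP8 weights fit on a single node; the BF16 weights alone exceed it.
```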
+
+ Only the weights and activations of the linear operators within transformer blocks are quantized. Weights use symmetric per-channel quantization, in which a linear scale per output channel maps the FP8 representation back to the original values. Activations are quantized symmetrically on a per-token, dynamic basis, with scales computed at runtime.
+ [LLM Compressor](https://github.com/vllm-project/llm-compressor) is used for quantization.
+
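As an illustration only (this is not the LLM Compressor implementation), the scheme can be sketched in NumPy. FP8 E4M3 is approximated here by scaling values into its ±448 range and rounding, which mimics but does not exactly reproduce FP8 rounding:

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest magnitude representable in FP8 E4M3

def quantize_symmetric(x, axis):
    """Symmetric quantization with one linear scale per slice along `axis`."""
    scale = np.max(np.abs(x), axis=axis, keepdims=True) / FP8_E4M3_MAX
    scale = np.where(scale == 0.0, 1.0, scale)  # avoid division by zero
    q = np.clip(np.round(x / scale), -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale

rng = np.random.default_rng(0)

# Weights: static, symmetric, one scale per output channel (per row).
w = rng.standard_normal((16, 32)).astype(np.float32)
w_q, w_scale = quantize_symmetric(w, axis=1)

# Activations: dynamic, symmetric, one scale per token (per row), computed at runtime.
x = rng.standard_normal((4, 32)).astype(np.float32)
x_q, x_scale = quantize_symmetric(x, axis=1)

# The dequantized linear layer closely matches the full-precision one.
y_ref = x @ w.T
y_quant = (x_q * x_scale) @ (w_q * w_scale).T
print(np.max(np.abs(y_ref - y_quant)))  # small quantization error
```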
+ ## Deployment
+
+ ### Use with vLLM
+
+ This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend, as shown in the example below.
+
+ ```python
+ from vllm import LLM, SamplingParams
+ from transformers import AutoTokenizer
+
+ model_id = "neuralmagic/Meta-Llama-3.1-405B-Instruct-FP8-dynamic"
+ number_gpus = 8
+
+ sampling_params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=256)
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+ messages = [
+     {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
+     {"role": "user", "content": "Who are you?"},
+ ]
+
+ prompts = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
+
+ llm = LLM(model=model_id, tensor_parallel_size=number_gpus, max_model_len=4096)
+
+ outputs = llm.generate(prompts, sampling_params)
+
+ generated_text = outputs[0].outputs[0].text
+ print(generated_text)
+ ```
+
+ vLLM also supports OpenAI-compatible serving. See the [documentation](https://docs.vllm.ai/en/latest/) for more details.
+
+ ## Creation
+
+ This model was created by applying [LLM Compressor with calibration samples from UltraChat](https://github.com/vllm-project/llm-compressor/blob/sa/big_model_support/examples/big_model_offloading/big_model_w8a8_calibrate.py), as presented in the code snippet below.
+
+ ```python
+ import torch
+
+ from transformers import AutoTokenizer
+
+ from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot
+ from llmcompressor.transformers.compression.helpers import (  # noqa
+     calculate_offload_device_map,
+     custom_offload_device_map,
+ )
+
+ recipe = """
+ quant_stage:
+     quant_modifiers:
+         QuantizationModifier:
+             ignore: ["lm_head"]
+             config_groups:
+                 group_0:
+                     weights:
+                         num_bits: 8
+                         type: float
+                         strategy: channel
+                         dynamic: false
+                         symmetric: true
+                     input_activations:
+                         num_bits: 8
+                         type: float
+                         strategy: token
+                         dynamic: true
+                         symmetric: true
+                     targets: ["Linear"]
+ """
+
+ model_stub = "meta-llama/Meta-Llama-3.1-405B-Instruct"
+ model_name = model_stub.split("/")[-1]
+
+ device_map = calculate_offload_device_map(
+     model_stub, reserve_for_hessians=False, num_gpus=8, torch_dtype="auto"
+ )
+
+ model = SparseAutoModelForCausalLM.from_pretrained(
+     model_stub, torch_dtype="auto", device_map=device_map
+ )
+
+ output_dir = f"./{model_name}-FP8-dynamic"
+
+ oneshot(
+     model=model,
+     recipe=recipe,
+     output_dir=output_dir,
+     save_compressed=True,
+     tokenizer=AutoTokenizer.from_pretrained(model_stub),
+ )
+ ```
+
+ ## Evaluation
+
+ This model was evaluated on the well-known Arena-Hard, OpenLLM v1, OpenLLM v2, HumanEval, and HumanEval+ benchmarks.
+ In all cases, model outputs were generated with the [vLLM](https://docs.vllm.ai/en/stable/) engine.
+
+ Arena-Hard evaluations were conducted using the [Arena-Hard-Auto](https://github.com/lmarena/arena-hard-auto) repository.
+ The model generated a single answer for each prompt from Arena-Hard, and each answer was judged twice by GPT-4.
+ We report below the scores obtained in each judgement and the average.
+
+ OpenLLM v1 and v2 evaluations were conducted using Neural Magic's fork of [lm-evaluation-harness](https://github.com/neuralmagic/lm-evaluation-harness/tree/llama_3.1_instruct) (branch llama_3.1_instruct).
+ This version of the lm-evaluation-harness includes versions of MMLU, ARC-Challenge, and GSM-8K that match the prompting style of [Meta-Llama-3.1-Instruct-evals](https://huggingface.co/datasets/meta-llama/Meta-Llama-3.1-405B-Instruct-evals), as well as a few fixes to OpenLLM v2 tasks.
+
+ HumanEval and HumanEval+ evaluations were conducted using Neural Magic's fork of the [EvalPlus](https://github.com/neuralmagic/evalplus) repository.
+
+ Detailed model outputs are available as HuggingFace datasets for [Arena-Hard](https://huggingface.co/datasets/neuralmagic/quantized-llama-3.1-arena-hard-evals), [OpenLLM v2](https://huggingface.co/datasets/neuralmagic/quantized-llama-3.1-leaderboard-v2-evals), and [HumanEval](https://huggingface.co/datasets/neuralmagic/quantized-llama-3.1-humaneval-evals).
+
+ ### Accuracy
+
+ | Benchmark | Meta-Llama-3.1-405B-Instruct | Meta-Llama-3.1-405B-Instruct-FP8-dynamic (this model) | Recovery |
+ | :--- | :---: | :---: | :---: |
+ | Arena Hard | 67.4 (67.3 / 67.5) | 66.7 (66.7 / 66.6) | 99.0% |
+ | **OpenLLM v1** | | | |
+ | MMLU (5-shot) | 87.4 | 87.5 | 100.0% |
+ | MMLU-cot (0-shot) | 88.1 | 88.1 | 100.0% |
+ | ARC Challenge (0-shot) | 95.0 | 95.0 | 100.0% |
+ | GSM-8K-cot (8-shot, strict-match) | 96.0 | 95.8 | 99.8% |
+ | Hellaswag (10-shot) | 88.5 | 88.5 | 99.9% |
+ | Winogrande (5-shot) | 87.2 | 88.0 | 100.9% |
+ | TruthfulQA (0-shot, mc2) | 65.3 | 65.3 | 99.9% |
+ | **Average** | **86.8** | **86.9** | **100.0%** |
+ | **OpenLLM v2** | | | |
+ | MMLU-Pro (5-shot) | 59.7 | 59.4 | 99.4% |
+ | IFEval (0-shot) | 87.7 | 86.8 | 99.0% |
+ | BBH (3-shot) | 67.0 | 67.1 | 100.1% |
+ | Math-\|v\|-5 (4-shot) | 39.0 | 38.8 | 99.7% |
+ | GPQA (0-shot) | 19.5 | 19.0 | 97.4% |
+ | MuSR (0-shot) | 19.5 | 20.8 | 106.9% |
+ | **Average** | **48.7** | **48.7** | **99.9%** |
+ | **Coding** | | | |
+ | HumanEval pass@1 | 86.8 | 87.0 | 100.2% |
+ | HumanEval+ pass@1 | 80.1 | 81.0 | 101.1% |
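Recovery in the table above is simply the quantized model's score expressed as a percentage of the unquantized baseline's score:

```python
def recovery(quantized_score: float, baseline_score: float) -> float:
    """Score of the quantized model relative to the unquantized baseline, in percent."""
    return 100.0 * quantized_score / baseline_score

# Arena-Hard average scores from the table above.
print(round(recovery(66.7, 67.4), 1))  # 99.0
```

(Recovery figures for individual benchmarks may differ slightly from this arithmetic because the reported scores are themselves rounded.)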
+
+ ### Reproduction
+
+ The results were obtained using the following commands:
+
+ #### MMLU
+ ```
+ lm_eval \
+   --model vllm \
+   --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-FP8-dynamic",dtype=auto,add_bos_token=True,max_model_len=4096,max_gen_toks=10,tensor_parallel_size=8 \
+   --tasks mmlu_llama_3.1_instruct \
+   --apply_chat_template \
+   --fewshot_as_multiturn \
+   --num_fewshot 5 \
+   --batch_size auto
+ ```
+
+ #### MMLU-cot
+ ```
+ lm_eval \
+   --model vllm \
+   --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-FP8-dynamic",dtype=auto,add_bos_token=True,max_model_len=4096,max_gen_toks=1024,tensor_parallel_size=8 \
+   --tasks mmlu_cot_0shot_llama_3.1_instruct \
+   --apply_chat_template \
+   --num_fewshot 0 \
+   --batch_size auto
+ ```
+
+ #### ARC-Challenge
+ ```
+ lm_eval \
+   --model vllm \
+   --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-FP8-dynamic",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=8 \
+   --tasks arc_challenge_llama_3.1_instruct \
+   --apply_chat_template \
+   --num_fewshot 0 \
+   --batch_size auto
+ ```
+
+ #### GSM-8K
+ ```
+ lm_eval \
+   --model vllm \
+   --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-FP8-dynamic",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=8 \
+   --tasks gsm8k_cot_llama_3.1_instruct \
+   --apply_chat_template \
+   --fewshot_as_multiturn \
+   --num_fewshot 8 \
+   --batch_size auto
+ ```
+
+ #### Hellaswag
+ ```
+ lm_eval \
+   --model vllm \
+   --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-FP8-dynamic",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=8 \
+   --tasks hellaswag \
+   --num_fewshot 10 \
+   --batch_size auto
+ ```
+
+ #### Winogrande
+ ```
+ lm_eval \
+   --model vllm \
+   --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-FP8-dynamic",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=8 \
+   --tasks winogrande \
+   --num_fewshot 5 \
+   --batch_size auto
+ ```
+
+ #### TruthfulQA
+ ```
+ lm_eval \
+   --model vllm \
+   --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-FP8-dynamic",dtype=auto,add_bos_token=True,max_model_len=4096,tensor_parallel_size=8 \
+   --tasks truthfulqa \
+   --num_fewshot 0 \
+   --batch_size auto
+ ```
+
+ #### OpenLLM v2
+ ```
+ lm_eval \
+   --model vllm \
+   --model_args pretrained="neuralmagic/Meta-Llama-3.1-405B-Instruct-FP8-dynamic",dtype=auto,max_model_len=4096,tensor_parallel_size=8,enable_chunked_prefill=True \
+   --apply_chat_template \
+   --fewshot_as_multiturn \
+   --tasks leaderboard \
+   --batch_size auto
+ ```
+
+ #### HumanEval and HumanEval+
+ ##### Generation
+ ```
+ python3 codegen/generate.py \
+   --model neuralmagic/Meta-Llama-3.1-405B-Instruct-FP8-dynamic \
+   --bs 16 \
+   --temperature 0.2 \
+   --n_samples 50 \
+   --root "." \
+   --dataset humaneval \
+   --tp 8
+ ```
+ ##### Sanitization
+ ```
+ python3 evalplus/sanitize.py \
+   humaneval/neuralmagic--Meta-Llama-3.1-405B-Instruct-FP8-dynamic_vllm_temp_0.2
+ ```
+ ##### Evaluation
+ ```
+ evalplus.evaluate \
+   --dataset humaneval \
+   --samples humaneval/neuralmagic--Meta-Llama-3.1-405B-Instruct-FP8-dynamic_vllm_temp_0.2-sanitized
+ ```
config.json ADDED
@@ -0,0 +1,79 @@
+ {
+   "_name_or_path": "/mnt/nvme1/meta-llama-8-kv-heads/Meta-Llama-3.1-405B-Instruct",
+   "architectures": [
+     "LlamaForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 128000,
+   "eos_token_id": [
+     128001,
+     128008,
+     128009
+   ],
+   "hidden_act": "silu",
+   "hidden_size": 16384,
+   "initializer_range": 0.02,
+   "intermediate_size": 53248,
+   "max_position_embeddings": 131072,
+   "mlp_bias": false,
+   "model_type": "llama",
+   "num_attention_heads": 128,
+   "num_hidden_layers": 126,
+   "num_key_value_heads": 8,
+   "pretraining_tp": 1,
+   "rms_norm_eps": 1e-05,
+   "rope_scaling": {
+     "factor": 8.0,
+     "high_freq_factor": 4.0,
+     "low_freq_factor": 1.0,
+     "original_max_position_embeddings": 8192,
+     "rope_type": "llama3"
+   },
+   "rope_theta": 500000.0,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.44.0",
+   "use_cache": true,
+   "vocab_size": 128256,
+   "quantization_config": {
+     "config_groups": {
+       "group_0": {
+         "input_activations": {
+           "block_structure": null,
+           "dynamic": true,
+           "group_size": null,
+           "num_bits": 8,
+           "observer": "memoryless",
+           "observer_kwargs": {},
+           "strategy": "token",
+           "symmetric": true,
+           "type": "float"
+         },
+         "output_activations": null,
+         "targets": [
+           "Linear"
+         ],
+         "weights": {
+           "block_structure": null,
+           "dynamic": false,
+           "group_size": null,
+           "num_bits": 8,
+           "observer": "minmax",
+           "observer_kwargs": {},
+           "strategy": "channel",
+           "symmetric": true,
+           "type": "float"
+         }
+       }
+     },
+     "format": "naive-quantized",
+     "global_compression_ratio": 1.240844678218891,
+     "ignore": [
+       "lm_head"
+     ],
+     "kv_cache_scheme": null,
+     "quant_method": "compressed-tensors",
+     "quantization_status": "frozen"
+   }
+ }
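The `quantization_config` block above encodes the scheme described in the model card: static per-channel FP8 weights and dynamic per-token FP8 activations. A minimal sketch of reading those fields (the dictionary literal is a subset copied from the config above):

```python
# Subset of the quantization_config from config.json above.
qc = {
    "quant_method": "compressed-tensors",
    "ignore": ["lm_head"],
    "config_groups": {
        "group_0": {
            "targets": ["Linear"],
            "weights": {"num_bits": 8, "type": "float", "strategy": "channel", "dynamic": False, "symmetric": True},
            "input_activations": {"num_bits": 8, "type": "float", "strategy": "token", "dynamic": True, "symmetric": True},
        }
    },
}

g = qc["config_groups"]["group_0"]
print(g["weights"]["strategy"], g["weights"]["dynamic"])                        # channel False
print(g["input_activations"]["strategy"], g["input_activations"]["dynamic"])    # token True
```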
generation_config.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "bos_token_id": 128000,
+   "do_sample": true,
+   "eos_token_id": [
+     128001,
+     128008,
+     128009
+   ],
+   "temperature": 0.6,
+   "top_p": 0.9,
+   "transformers_version": "4.44.0"
+ }
model-00001-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4348d16ae9df91a32354452173aa684c47d38fb3eb0c828c9a2aeeefd2404cf3
+ size 4773188696
model-00002-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:97758aaea4d3186ac8d39cb4c3f46c0beddef57f63952d966d63538c0be6f5aa
+ size 4933097744
model-00003-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4f4d65575b39ad8bdaa2dc49787ed6e13b1f1cb8f7175e9004dc88327364406f
+ size 4631063728
model-00004-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:056866c6d81ba0b340fe9d883ed2699e2eef62184583c4ba3936d172a61e6144
+ size 4933097744
model-00005-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cb7d70cb0fb8993d957f7327e4c239a1aab5587d3fbf570f6d147cd5637c71b5
+ size 4631063728
model-00006-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2e4080c6b89cb2ca05e0dc65100c9a69bb9946bf75ddd4dbf80f848ff309872c
+ size 4933097744
model-00007-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bb8478700043eb821c2a386870a097d5303935b337da5595e70396c8e66f9b4a
+ size 4631063728
model-00008-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9eb1a5b66923a1c466011a6c3e7f6ab23a447900b1e606549a8ac64e59e47cd5
+ size 4933097760
model-00009-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b8e4f7f3d6bcae94a1f440769af88f3ab39b38902e4af19d452baf4067118a35
+ size 4631063752
model-00010-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:72f45fff8f48e6c3c44c15522bc790caf2ba3259999112a56e4a5bdb34098dda
+ size 4933097760
model-00011-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fde2e9041a23bfcf5b2fe61ea0ab761a719e332c6ceec1a9357523d17b4fd139
+ size 4631063752
model-00012-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4cf61262b8a7eb86f2639f7e0bc006b590e70ce2f6ffa474943f670837eef20b
+ size 4933097760
model-00013-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:16963a53bf5b5f34dfdc954164f0cd3974da49b969913221153b1f13721fe9e6
+ size 4631063752
model-00014-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c1744110f5934aa55d23b39179ab7f39f1059fb2e100fe30a99fd269550b81da
+ size 4933097760
model-00015-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7fd9f2ee8b68719664397af25ed6cff522237fdd781f8bb7c30c50cf8e5545f9
+ size 4631063752
model-00016-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:38b73868a9d451f673e89e6f4b52bda1b32851c23afc6d6d5b5322caeec312eb
+ size 4933097760
model-00017-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f616cccffc610561019350e2af8b15975b0c8b408fc49f5e157d8435fa34c578
+ size 4631063752
model-00018-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e316a1e967e413b27b4ebaa8d3892eec00b9c899c566d584fa8a800eeac1d43b
+ size 4933097760
model-00019-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1a982203d9cecdaf7f7e111881d788b0a1b4a78d260de33bd15d60e04716c7ba
+ size 4631063752
model-00020-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:12ae44a6f3a72ab6dac83c84cc407e5cdc501e3f1e3e3f60d34644e389f5ac83
+ size 4933097760
model-00021-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:001836cb75107ab700beb7b357034f2c4b488c0380e49c10ccaea882494c7d34
+ size 4631063752
model-00022-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4d9d060cbf66e7d0d925f648e239210879f312e09e640ff8d5ce0f53045ca516
+ size 4933097760
model-00023-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a73d60b78b14c3ca4dbb91680a7b63bb93cbf591b74ac66f77b5ada606f79c55
+ size 4631063752
model-00024-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b128fe3feffdd365aebaa098c0175a79b466cf25ac43d52bdcb9e05635cbbce3
+ size 4933097760
model-00025-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0e1b227aa8f3469fd634d33c6dd56e38b2d6531a4eeefd7d65d0405b8055a9fd
+ size 4631063752
model-00026-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:39280d9b4a9d8f0579929883e2d52acf67ef44616060e06878fcac17100719c1
+ size 4933097760
model-00027-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d3231ee99030495fadfd2b5a4bec05fd5dfb79df0a5774dae946856b8b1d54d2
+ size 4631063752
model-00028-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:127dd75728a5ca862a927385b342dac266aee81bc3d73d0a4a5232258ca24c33
+ size 4933097760
model-00029-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ce76397d654f0cfc86bf41909b7c48cabe779dfbf6a583d74fc10a8c862b357e
+ size 4631063752
model-00030-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b61c3270ea7ddf4db3855d7acb1884c1e57cd12d6fee0445c908f86ce8e16170
+ size 4933097760
model-00031-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:eb864d68a7fe712df011844be8413c5c5de0dc09941a8df4ec9dc637cf27318d
+ size 4631063752
model-00032-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1e2a614ee2e6289179f1f0bcdd60744f04a97a6ac053c3292ecc489dd1e02c6e
+ size 4933097760
model-00033-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9fdbdbd58e49be94498807a880f06598fe5328296352ecbd153a49cf06d458ac
+ size 4631063752
model-00034-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ab4245d34461470c2e647cd7c6b72e89569e45c7171fe0c236d2293f86a0e648
+ size 4933097760
model-00035-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0622f8f0b2d5da4f46c0bc1c64e6318fc1cdebf5bd9a6cdc5d596430c0fa7ec5
+ size 4631063752
model-00036-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:10ac2d9b50f53954d7af02c355a3a46fd3d6f7a3e0764fd7bba6823c060c69fc
+ size 4933097760
model-00037-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1575aa47b277f1bf987d36e972e681d3f5fdb94eb35f3b5e4d89ec5c34089cc2
+ size 4631063752
model-00038-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0275dda813566a99efc1d2ed4467b589eb0b06a99cc3d387af82d23e0dc4d5e6
+ size 4933097760
model-00039-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:38c0890183aae338aa320b0fb89ddc2d65b02d5f35b1b2c6f32d6402fb5d256f
+ size 4631063752
model-00040-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1fe5d1fa84331883bf0b4c0567e6fc46a306500ed0c7f6ebadb8c164ee469edd
+ size 4933097760
model-00041-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e9e6c992cb1184f394bf07b20eb57fb3c9601a7a4bb7898ba673046f4af30ac2
+ size 4631063752
model-00042-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f3090b4191ef9f188c8bf7bdf33bb7e95edb5dd6c661a661eb32094ed08fe309
+ size 4933097760
model-00043-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:88067ececbb254f4e586ace3711afc7f7f9c3c664e4a11e74273650310fdc3ee
+ size 4631063752
model-00044-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bf93ec4fceded4f8984a21f333c6c58f447366ab5fca5ef4f1a8726130caa2f3
+ size 4933097760
model-00045-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c1653a664070e037becde300c34b8cea10cbaa402ef13b79b528e57317f74153
+ size 4631063752
model-00046-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c6533a49788b4779932dec28572d7629035d2cd47aa563d073ffe0972eb2a582
+ size 4933097760
model-00047-of-00086.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:88983387d93204333940a7554290d5cc72ae99fb5ec000048fde4e2f32275741
+ size 4631063752