sys-lpot-val committed 915d5b2 (parent: 14dbc89)

upload auto_round format

Signed-off-by: sys-lpot-val <sys_lpot_val@intel.com>

- .gitattributes +1 -0
- README.md +100 -58
- config.json +2 -2
- model.safetensors +2 -2
- quantization_config.json +3 -0
.gitattributes
CHANGED

@@ -41,3 +41,4 @@ generation_config.json filter=lfs diff=lfs merge=lfs -text
 quantize_config.json filter=lfs diff=lfs merge=lfs -text
 special_tokens_map.json filter=lfs diff=lfs merge=lfs -text
 tokenizer_config.json filter=lfs diff=lfs merge=lfs -text
+quantization_config.json filter=lfs diff=lfs merge=lfs -text
README.md
CHANGED
Lines removed or replaced in this commit (unchanged context omitted):

@@ -3,22 +3,18 @@ license: apache-2.0
-This model is an int4 model with group_size 128
-### INT4 Inference
-##
-##cd auto-round && pip install -vvv --no-build-isolation -e .
-from auto_round import AutoHfQuantizer ##must import
-import torch

@@ -48,7 +45,7 @@ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
-max_new_tokens=

@@ -58,78 +55,126 @@ output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_
-##
-##
-##2. Compare their digits from left to
-##once upon a time, there was a young girl named Lily who lived in a small village nestled among the rolling hills of England. She had always been fascinated by nature and the beauty of the world around her.One day, while exploring the woods near\
-##1. 电子商务:阿里巴巴集团是全球
-| mmlu | 0.6010 | 0.5876 | 0.5924 |
-| cmmlu | 0.6497 | 0.6146 | 0.6259 |
-| ceval-valid | 0.6597 | 0.6382 | 0.6404 |
-| lambada_openai | 0.6095 | 0.5886 | 0.6082 |
-| hellaswag | 0.5082 | 0.4985 | 0.5012 |
-| winogrande | 0.6298 | 0.6204 | 0.6409 |
-| piqa | 0.7633 | 0.7519 | 0.7650 |
-| truthfulqa_mc1 | 0.3109 | 0.3158 | 0.3060 |
-| openbookqa | 0.3160 | 0.2940 | 0.3020 |
-| boolq | 0.7789 | 0.7703 | 0.7681 |
-| arc_easy | 0.7677 | 0.7660 | 0.7681 |
-| arc_challenge | 0.4343 | 0.4454 | 0.4360 |
-| gsm8k 5 shots | 0.3101 | 0.4776 | 0.4519 |
-###
-Here is the sample command to
-python -m auto_round \
---model_name Qwen/Qwen2.5-1.5B-Instruct \
---model_dtype "
---format 'auto_round' \

@@ -142,15 +187,12 @@ Users (both direct and downstream) should be made aware of the risks, biases and
-* Intel Extension for Transformers [link](https://github.com/intel/intel-extension-for-transformers)
README.md after this commit (unchanged regions are not shown in the diff):

---
license: apache-2.0
datasets:
- NeelNanda/pile-10k
---

## Model Details

This model is an int4 model with group_size 128 and symmetric quantization of [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct), generated by [intel/auto-round](https://github.com/intel/auto-round). Load the model with `revision="14dbc8"` to use the AutoGPTQ format.

## How To Use

### INT4 Inference (CPU/HPU/CUDA)

CPU inference requires auto-round version > 0.3.1.
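If you are unsure which auto-round build is installed, a minimal check such as the sketch below can confirm the version before running CPU inference (it assumes auto-round was installed from PyPI, e.g. with `pip install -U auto-round`).

```python
# Minimal version check, assuming auto-round was installed from PyPI
# (e.g. `pip install -U auto-round`); CPU inference needs a version newer than 0.3.1.
from importlib.metadata import version

installed = version("auto-round")
print(f"auto-round {installed} is installed")  # expect something newer than 0.3.1
```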

```python
from auto_round import AutoRoundConfig  ## must import for auto-round format
from transformers import AutoModelForCausalLM, AutoTokenizer

quantized_model_dir = "OPEA/Qwen2.5-1.5B-Instruct-int4-inc"
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)
model = AutoModelForCausalLM.from_pretrained(
    quantized_model_dir,
    torch_dtype='auto',
    device_map="auto",
    ##revision="14dbc8"  ## AutoGPTQ format
)

##import habana_frameworks.torch.core as htcore  ## uncomment it for HPU

## ... (prompt and chat-template construction is unchanged and not shown in this diff) ...
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=200,  ##change this to align with the official usage
    do_sample=False  ##change this to align with the official usage
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

prompt = "There is a girl who likes adventure,"
##INT4:
"""That's great! What kind of adventures do you like to go on? Do you prefer outdoor activities or indoor ones? Maybe we could come up with some ideas together!
"""

##BF16:
"""That's great! Adventure can be an exciting and fulfilling experience for many people. What kind of adventures do you like to go on? Do you enjoy hiking, camping, or exploring new places? Or maybe you prefer more extreme activities like skydiving or bungee jumping? Whatever your interests may be, there are plenty of opportunities out there for someone who loves adventure.
"""

prompt = "9.11和9.8哪个数字大"
##INT4:
"""
9.11 和 9.8 都是小数,它们的大小比较如下:

- 9.11 大于 9.8

具体来说:
- 9.11 的十位和个位都是 9,十分位是 1。
- 9.8 的十位和个位都是 9,十分位也是 8。

由于 1 > 8,在相同的小数部分相同时,较大的数字在十位上。因此,9.11 比 9.8 更大。
"""

##BF16:
"""9.11 和 9.8 都是小数,比较它们的大小需要从左到右逐位进行比较。

首先看整数部分:
- 9.11 的整数部分是 9。
- 9.8 的整数部分也是 9。

因为两者的整数部分相同,所以继续比较小数部分:

- 9.11 的小数部分是 0.11。
- 9.8 的小数部分是 0.8。

现在我们来比较这两个小数点后的数字:
- 0.11 和 0.8

显然,0.11 小于 0.8。因此,9.11 比 9.8 大。

所以,答案是:9.11 > 9.8。
"""

prompt = "Once upon a time,"
##INT4:
"""I'm sorry, but I don't understand what you're asking. Could you please provide more context or clarify your question?"""

##BF16:
"""I'm ready to listen! What's the story you'd like me to hear?"""

prompt = "请简短介绍一下阿里巴巴公司"
##INT4:
"""阿里巴巴集团是一家中国领先的电子商务和数字技术公司,成立于1999年。它最初是淘宝网的运营者,后来发展成为全球最大的电商平台之一,并且在云计算、金融科技等领域也取得了显著成就。

阿里巴巴旗下的主要业务包括:

1. **淘宝**:一个在线购物平台,提供各种商品和服务。
2. **天猫**:另一个大型电商平台,专注于销售品牌商品。
3. **阿里云**:提供云计算服务,帮助企业和个人实现数字化转型。
4. **蚂蚁金服**:提供金融服务,如支付宝支付系统、余额宝等。
5. **菜鸟网络**:负责物流配送,支持电商交易和快递服务。
6. **Lazada** 和 **Global Markets**:分别是中国和东南亚地区的电商平台。

阿里巴巴通过这些业务为消费者提供了便捷的购物体验,同时也为企业和个人提供了多种商业解决方案和技术支持。此外,阿里巴巴还积极参与社会公益事业,致力于推动数字经济的发展和社会的可持续性。"""

##BF16:
"""阿里巴巴集团是一家中国领先的电子商务和数字技术公司,成立于1999年。它以B2C(企业对消费者)和B2B(企业对企业)为主要业务模式,提供包括淘宝、天猫、速卖通等在内的在线购物平台,并通过支付宝为用户提供支付服务。阿里巴巴还涉足云计算、人工智能等领域,致力于推动数字经济的发展。
"""
```
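The card notes that revision `"14dbc8"` holds the AutoGPTQ-format weights. A minimal sketch of loading that revision explicitly is shown below; the `revision` argument is standard `transformers` behaviour, and a GPTQ-capable backend is assumed to be installed.

```python
# Minimal sketch: load the AutoGPTQ-format weights from the "14dbc8" revision
# mentioned in the model card. Assumes a GPTQ backend is available; otherwise
# use the default auto-round revision as in the example above.
from transformers import AutoModelForCausalLM, AutoTokenizer

quantized_model_dir = "OPEA/Qwen2.5-1.5B-Instruct-int4-inc"
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)
model = AutoModelForCausalLM.from_pretrained(
    quantized_model_dir,
    torch_dtype="auto",
    device_map="auto",
    revision="14dbc8",  # AutoGPTQ-format revision, as noted in the model card
)
```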

### Evaluate the model

`pip3 install lm-eval==0.4.5`

```bash
auto-round --model "OPEA/Qwen2.5-1.5B-Instruct-int4-inc" --eval --eval_bs 16 --tasks leaderboard_ifeval,leaderboard_mmlu_pro,gsm8k,lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,cmmlu,ceval-valid
```

| Metric | BF16 | INT4 |
| :----------------------------------------- | :----: | :----: |
| Avg | 0.5203 | 0.5133 |
| leaderboard_mmlu_pro 5 shots | 0.2930 | 0.2771 |
| leaderboard_ifeval inst_level_strict_acc | 0.4173 | 0.3765 |
| leaderboard_ifeval prompt_level_strict_acc | 0.2847 | 0.2440 |
| mmlu | 0.6016 | 0.5903 |
| cmmlu | 0.6482 | 0.6092 |
| ceval-valid | 0.6568 | 0.6181 |
| gsm8k 5 shots | 0.3086 | 0.4306 |
| lambada_openai | 0.6033 | 0.5882 |
| hellaswag | 0.5086 | 0.4979 |
| winogrande | 0.6259 | 0.6361 |
| piqa | 0.7650 | 0.7557 |
| truthfulqa_mc1 | 0.3133 | 0.3195 |
| openbookqa | 0.3180 | 0.3120 |
| boolq | 0.7804 | 0.7526 |
| arc_easy | 0.7647 | 0.7622 |
| arc_challenge | 0.4352 | 0.4420 |
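For a quick programmatic check of a subset of these tasks, lm-eval's `simple_evaluate` entry point can be used. The snippet below is only a sketch under assumptions (task subset, batch size, and that lm-eval==0.4.5 is installed); it is not the exact setup that produced the table above.

```python
# Sketch: score a couple of the tasks above through lm-eval's Python API.
# Results may differ slightly from the auto-round CLI run used for the table.
from auto_round import AutoRoundConfig  # as above, needed for the auto-round format
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=OPEA/Qwen2.5-1.5B-Instruct-int4-inc",
    tasks=["piqa", "arc_easy"],
    batch_size=16,
)
for task, metrics in results["results"].items():
    print(task, metrics)
```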

### Generate the model

Here is the sample command to generate the model. We observed a larger accuracy drop on Chinese tasks and recommend using a high-quality Chinese calibration dataset or a smaller group_size, such as 32.

```bash
auto-round \
    --model Qwen/Qwen2.5-1.5B-Instruct \
    --device 0 \
    --group_size 128 \
    --nsamples 512 \
    --bits 4 \
    --iter 1000 \
    --disable_eval \
    --model_dtype "fp16" \
    --format 'auto_gptq,auto_round' \
    --output_dir "./tmp_autoround"
```
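The same recipe can also be driven from Python with auto-round's `AutoRound` class. The sketch below mirrors the CLI flags above; argument names such as `nsamples` and `iters` follow the upstream auto-round examples and may differ between releases, so treat it as an illustration rather than the exact recipe.

```python
# Sketch of the CLI recipe above via auto-round's Python API.
# Argument names (nsamples, iters, sym, ...) follow upstream auto-round examples
# and may vary between releases; verify against your installed version.
from auto_round import AutoRound
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-1.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="float16")
tokenizer = AutoTokenizer.from_pretrained(model_name)

autoround = AutoRound(
    model,
    tokenizer,
    bits=4,          # --bits 4
    group_size=128,  # --group_size 128
    sym=True,        # symmetric quantization, as stated in Model Details
    nsamples=512,    # --nsamples 512
    iters=1000,      # --iter 1000
)
autoround.quantize()
autoround.save_quantized("./tmp_autoround", format="auto_round", inplace=True)
```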

## Ethical Considerations and Limitations

The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.

Here is a useful link to learn more about Intel's AI software:

- Intel Neural Compressor [link](https://github.com/intel/neural-compressor)

## Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

## Cite

@article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }
config.json
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:9cb8ae2c7a0fc018dd2209ce1c0d82a240b9036a4ad7e65a2173ad6c91277e0f
+size 1382
model.safetensors
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:0b0e1c607f09e1b208fd7f6eb15ea3e43031062dbb1560cbb7691b44a2be0dda
+size 1149862960
quantization_config.json
ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f92209e21368ef298866e57e5f3838e7590119ba042ef4c15bf642f7f60e4f40
+size 575