OPEA / Safetensors

sys-lpot-val committed
Commit 915d5b2
1 Parent(s): 14dbc89

upload auto_round format

Signed-off-by: sys-lpot-val <sys_lpot_val@intel.com>

Files changed (5):
  1. .gitattributes +1 -0
  2. README.md +100 -58
  3. config.json +2 -2
  4. model.safetensors +2 -2
  5. quantization_config.json +3 -0
.gitattributes CHANGED
@@ -41,3 +41,4 @@ generation_config.json filter=lfs diff=lfs merge=lfs -text
 quantize_config.json filter=lfs diff=lfs merge=lfs -text
 special_tokens_map.json filter=lfs diff=lfs merge=lfs -text
 tokenizer_config.json filter=lfs diff=lfs merge=lfs -text
+quantization_config.json filter=lfs diff=lfs merge=lfs -text
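Each of these `.gitattributes` lines binds a path to the Git LFS filter and unsets the `text` attribute (so git does no newline conversion on the pointer file). As a rough sketch of how one such line decomposes (plain Python for illustration; git does this parsing itself):

```python
# Split one .gitattributes line into its path pattern, set attributes,
# and unset attributes (the leading '-' form).
line = "quantization_config.json filter=lfs diff=lfs merge=lfs -text"
pattern, *attrs = line.split()

set_attrs = dict(a.split("=", 1) for a in attrs if "=" in a)
unset_attrs = [a[1:] for a in attrs if a.startswith("-")]

print(pattern)              # quantization_config.json
print(set_attrs["filter"])  # lfs  -> content is stored via Git LFS
print(unset_attrs)          # ['text'] -> no newline conversion
```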
README.md CHANGED
@@ -3,22 +3,18 @@ license: apache-2.0
 datasets:
 - NeelNanda/pile-10k
 ---
-
 ## Model Details
 
-This model is an int4 model with group_size 128 with quantized lm-head of [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) generated by [intel/auto-round](https://github.com/intel/auto-round), auto-round is needed to run this model
+This model is an int4 model with group_size 128 and symmetric quantization of [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) generated by [intel/auto-round](https://github.com/intel/auto-round). Load the model with `revision="14dbc8"` to use the AutoGPTQ format.
 
 ## How To Use
 
-### INT4 Inference
-
+### INT4 Inference (CPU/HPU/CUDA)
+
+CPU requires auto-round version > 0.3.1
 
 ```python
-##git clone https://github.com/intel/auto-round.git
-##cd auto-round && pip install -vvv --no-build-isolation -e .
-from auto_round import AutoHfQuantizer ##must import
-import torch
+from auto_round import AutoRoundConfig ##must import for auto-round format
 from transformers import AutoModelForCausalLM, AutoTokenizer
 quantized_model_dir = "OPEA/Qwen2.5-1.5B-Instruct-int4-inc"
 tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)
@@ -27,6 +23,7 @@ model = AutoModelForCausalLM.from_pretrained(
 quantized_model_dir,
 torch_dtype='auto',
 device_map="auto",
+##revision="14dbc8" ## AutoGPTQ format
 )
 
 ##import habana_frameworks.torch.core as htcore ## uncomment it for HPU
@@ -48,7 +45,7 @@ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
 
 generated_ids = model.generate(
 model_inputs.input_ids,
-max_new_tokens=50, ##change this to align with the official usage
+max_new_tokens=200, ##change this to align with the official usage
 do_sample=False ##change this to align with the official usage
 )
 generated_ids = [
@@ -58,78 +55,126 @@ output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_
 response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 print(response)
 
-##prompt = "There is a girl who likes adventure,"
-##That's great! What kind of adventure does she like?
-
-##prompt = "Which one is bigger, 9.11 or 9.8"
-##To determine which number is larger between 9.11 and 9.8, you can compare them directly:
-##1. Start with the numbers: 9.11 and 9.8.
-##2. Compare their digits from left to
-
-##prompt = "Once upon a time,"
-##once upon a time, there was a young girl named Lily who lived in a small village nestled among the rolling hills of England. She had always been fascinated by nature and the beauty of the world around her. One day, while exploring the woods near\
-
-##prompt = "请介绍一下阿里巴巴公司"
-##阿里巴巴集团是一家全球领先的电子商务和科技企业,成立于1999年。阿里巴巴集团总部位于中国杭州,并在全球范围内拥有超过20个运营中心。
-##阿里巴巴集团的业务范围包括:
-##1. 电子商务:阿里巴巴集团是全球
-
-```
-
-### Evaluate the model
-
-pip3 install lm-eval==0.4.2
-
-```bash
-git clone https://github.com/intel/auto-round
-cd auto-round
-python -m auto_round --model "OPEA/Qwen2.5-1.5B-Instruct-int4-inc" --eval --eval_bs 16 --tasks lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,mmlu,gsm8k,cmmlu,ceval-valid
+prompt = "There is a girl who likes adventure,"
+##INT4:
+"""That's great! What kind of adventures do you like to go on? Do you prefer outdoor activities or indoor ones? Maybe we could come up with some ideas together!
+"""
+
+##BF16:
+"""That's great! Adventure can be an exciting and fulfilling experience for many people. What kind of adventures do you like to go on? Do you enjoy hiking, camping, or exploring new places? Or maybe you prefer more extreme activities like skydiving or bungee jumping? Whatever your interests may be, there are plenty of opportunities out there for someone who loves adventure.
+"""
+
+prompt = "9.11和9.8哪个数字大"
+##INT4:
+"""
+9.11 和 9.8 都是小数,它们的大小比较如下:
+
+- 9.11 大于 9.8
+
+具体来说:
+- 9.11 的十位和个位都是 9,十分位是 1。
+- 9.8 的十位和个位都是 9,十分位也是 8。
+
+由于 1 > 8,在相同的小数部分相同时,较大的数字在十位上。因此,9.11 比 9.8 更大。
+"""
+
+##BF16:
+"""9.11 和 9.8 都是小数,比较它们的大小需要从左到右逐位进行比较。
+
+首先看整数部分:
+- 9.11 的整数部分是 9。
+- 9.8 的整数部分也是 9。
+
+因为两者的整数部分相同,所以继续比较小数部分:
+
+- 9.11 的小数部分是 0.11。
+- 9.8 的小数部分是 0.8。
+
+现在我们来比较这两个小数点后的数字:
+- 0.11 和 0.8
+
+显然,0.11 小于 0.8。因此,9.11 比 9.8 大。
+
+所以,答案是:9.11 > 9.8。
+"""
+
+
+prompt = "Once upon a time,"
+##INT4:
+"""I'm sorry, but I don't understand what you're asking. Could you please provide more context or clarify your question?"""
+
+##BF16:
+"""I'm ready to listen! What's the story you'd like me to hear?"""
+
+
+prompt = "请简短介绍一下阿里巴巴公司"
+##INT4:
+"""阿里巴巴集团是一家中国领先的电子商务和数字技术公司,成立于1999年。它最初是淘宝网的运营者,后来发展成为全球最大的电商平台之一,并且在云计算、金融科技等领域也取得了显著成就。
+
+阿里巴巴旗下的主要业务包括:
+
+1. **淘宝**:一个在线购物平台,提供各种商品和服务。
+2. **天猫**:另一个大型电商平台,专注于销售品牌商品。
+3. **阿里云**:提供云计算服务,帮助企业和个人实现数字化转型。
+4. **蚂蚁金服**:提供金融服务,如支付宝支付系统、余额宝等。
+5. **菜鸟网络**:负责物流配送,支持电商交易和快递服务。
+6. **Lazada** 和 **Global Markets**:分别是中国和东南亚地区的电商平台。
+
+阿里巴巴通过这些业务为消费者提供了便捷的购物体验,同时也为企业和个人提供了多种商业解决方案和技术支持。此外,阿里巴巴还积极参与社会公益事业,致力于推动数字经济的发展和社会的可持续性。"""
+
+##BF16:
+"""阿里巴巴集团是一家中国领先的电子商务和数字技术公司,成立于1999年。它以B2C(企业对消费者)和B2B(企业对企业)为主要业务模式,提供包括淘宝、天猫、速卖通等在内的在线购物平台,并通过支付宝为用户提供支付服务。阿里巴巴还涉足云计算、人工智能等领域,致力于推动数字经济的发展。
+"""
 ```
 
-| Metric | BF16 | INT4(group_size 128) | INT4(group_size 32) |
-|:-------------- | :----: | :----: |:------:|
-| Avg | 0.5646 | 0.5668 | 0.5699 |
-| mmlu | 0.6010 | 0.5876 | 0.5924 |
-| cmmlu | 0.6497 | 0.6146 | 0.6259 |
-| ceval-valid | 0.6597 | 0.6382 | 0.6404 |
-| lambada_openai | 0.6095 | 0.5886 | 0.6082 |
-| hellaswag | 0.5082 | 0.4985 | 0.5012 |
-| winogrande | 0.6298 | 0.6204 | 0.6409 |
-| piqa | 0.7633 | 0.7519 | 0.7650 |
-| truthfulqa_mc1 | 0.3109 | 0.3158 | 0.3060 |
-| openbookqa | 0.3160 | 0.2940 | 0.3020 |
-| boolq | 0.7789 | 0.7703 | 0.7681 |
-| arc_easy | 0.7677 | 0.7660 | 0.7681 |
-| arc_challenge | 0.4343 | 0.4454 | 0.4360 |
-| gsm8k 5 shots | 0.3101 | 0.4776 | 0.4519 |
+### Evaluate the model
+
+pip3 install lm-eval==0.4.5
+
+```bash
+auto-round --model "OPEA/Qwen2.5-1.5B-Instruct-int4-inc" --eval --eval_bs 16 --tasks leaderboard_ifeval,leaderboard_mmlu_pro,gsm8k,lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,cmmlu,ceval-valid
+```
+
+| Metric | BF16 | INT4 |
+| :----------------------------------------- | :----: | :----: |
+| Avg | 0.5203 | 0.5133 |
+| leaderboard_mmlu_pro 5 shots | 0.2930 | 0.2771 |
+| leaderboard_ifeval inst_level_strict_acc | 0.4173 | 0.3765 |
+| leaderboard_ifeval prompt_level_strict_acc | 0.2847 | 0.2440 |
+| mmlu | 0.6016 | 0.5903 |
+| cmmlu | 0.6482 | 0.6092 |
+| ceval-valid | 0.6568 | 0.6181 |
+| gsm8k 5 shots | 0.3086 | 0.4306 |
+| lambada_openai | 0.6033 | 0.5882 |
+| hellaswag | 0.5086 | 0.4979 |
+| winogrande | 0.6259 | 0.6361 |
+| piqa | 0.7650 | 0.7557 |
+| truthfulqa_mc1 | 0.3133 | 0.3195 |
+| openbookqa | 0.3180 | 0.3120 |
+| boolq | 0.7804 | 0.7526 |
+| arc_easy | 0.7647 | 0.7622 |
+| arc_challenge | 0.4352 | 0.4420 |
 
-### Reproduce the model
+### Generate the model
 
-Here is the sample command to reproduce the model. We observed a larger accuracy drop in Chinese tasks and recommend using a high-quality Chinese dataset for calibration. However, we did not achieve better accuracy with some public datasets.
+Here is the sample command to generate the model. We observed a larger accuracy drop in Chinese tasks and recommend using a high-quality Chinese dataset for calibration or a smaller group_size like 32.
 
 ```bash
-git clone https://github.com/intel/auto-round
-cd auto-round
-python -m auto_round \
---model_name Qwen/Qwen2.5-1.5B-Instruct \
+auto-round \
+--model Qwen/Qwen2.5-1.5B-Instruct \
 --device 0 \
 --group_size 128 \
 --nsamples 512 \
 --bits 4 \
 --iter 1000 \
 --disable_eval \
---model_dtype "float16" \
---format 'auto_round' \
+--model_dtype "fp16" \
+--format 'auto_gptq,auto_round' \
 --output_dir "./tmp_autoround"
 ```
 
-
-
 ## Ethical Considerations and Limitations
 
 The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.
@@ -142,15 +187,12 @@ Users (both direct and downstream) should be made aware of the risks, biases and
 
 Here are a couple of useful links to learn more about Intel's AI software:
 
-* Intel Neural Compressor [link](https://github.com/intel/neural-compressor)
-* Intel Extension for Transformers [link](https://github.com/intel/intel-extension-for-transformers)
+- Intel Neural Compressor [link](https://github.com/intel/neural-compressor)
 
 ## Disclaimer
 
 The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.
 
-
-
 ## Cite
 
 @article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }
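The updated model card describes symmetric int4 quantization with group_size 128. As a rough sketch of what per-group symmetric quantization means, here is plain round-to-nearest quantization of one small group; this is an illustration only, not auto-round's learned signed-gradient rounding, and the 4-element group stands in for the real 128-weight groups:

```python
def quantize_group(weights, bits=4):
    """Symmetric round-to-nearest quantization of one weight group.

    Illustration only: auto-round additionally *learns* the rounding,
    which is why it needs calibration data and --iter steps.
    """
    qmax = 2 ** (bits - 1) - 1          # largest positive int4 code: 7
    scale = max(abs(w) for w in weights) / qmax
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    dequant = [qi * scale for qi in q]  # values the model sees at inference
    return q, dequant

# One toy group; the real model uses 128 weights per group (--group_size 128).
q, dq = quantize_group([0.12, -0.5, 0.33, 0.07])
print(q)  # [2, -7, 5, 1]
```

The per-group scale is what keeps the accuracy loss small: a smaller group_size (e.g. 32, as the card suggests for Chinese tasks) means more scales and a tighter fit, at the cost of extra metadata.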
config.json CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a818cfe3a54f21beb9769e3cfc4332d71f31a205411fc8260470669899ce9f66
-size 1368
+oid sha256:9cb8ae2c7a0fc018dd2209ce1c0d82a240b9036a4ad7e65a2173ad6c91277e0f
+size 1382
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3aea79517e9780575f7fd4c1969b48d99f8762c225280b799ddd6188f6c6c924
-size 1151050488
+oid sha256:0b0e1c607f09e1b208fd7f6eb15ea3e43031062dbb1560cbb7691b44a2be0dda
+size 1149862960
quantization_config.json ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f92209e21368ef298866e57e5f3838e7590119ba042ef4c15bf642f7f60e4f40
+size 575