sys-lpot-val committed 915d5b2 (parent: 14dbc89)

upload auto_round format

Signed-off-by: sys-lpot-val <sys_lpot_val@intel.com>

- .gitattributes +1 -0
- README.md +100 -58
- config.json +2 -2
- model.safetensors +2 -2
- quantization_config.json +3 -0
.gitattributes
CHANGED

@@ -41,3 +41,4 @@ generation_config.json filter=lfs diff=lfs merge=lfs -text
 quantize_config.json filter=lfs diff=lfs merge=lfs -text
 special_tokens_map.json filter=lfs diff=lfs merge=lfs -text
 tokenizer_config.json filter=lfs diff=lfs merge=lfs -text
+quantization_config.json filter=lfs diff=lfs merge=lfs -text
README.md
CHANGED
Lines removed or replaced in this commit (unchanged context omitted):

@@ -3,22 +3,18 @@ license: apache-2.0
-This model is an int4 model with group_size 128
-### INT4 Inference
-##
-##cd auto-round && pip install -vvv --no-build-isolation -e .
-from auto_round import AutoHfQuantizer ##must import
-import torch

@@ -48,7 +45,7 @@ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
-max_new_tokens=

@@ -58,78 +55,126 @@ output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_
-##
-##
-##2. Compare their digits from left to
-##once upon a time, there was a young girl named Lily who lived in a small village nestled among the rolling hills of England. She had always been fascinated by nature and the beauty of the world around her.One day, while exploring the woods near\
-##1. 电子商务:阿里巴巴集团是全球
-| mmlu | 0.6010 | 0.5876 | 0.5924 |
-| cmmlu | 0.6497 | 0.6146 | 0.6259 |
-| ceval-valid | 0.6597 | 0.6382 | 0.6404 |
-| lambada_openai | 0.6095 | 0.5886 | 0.6082 |
-| hellaswag | 0.5082 | 0.4985 | 0.5012 |
-| winogrande | 0.6298 | 0.6204 | 0.6409 |
-| piqa | 0.7633 | 0.7519 | 0.7650 |
-| truthfulqa_mc1 | 0.3109 | 0.3158 | 0.3060 |
-| openbookqa | 0.3160 | 0.2940 | 0.3020 |
-| boolq | 0.7789 | 0.7703 | 0.7681 |
-| arc_easy | 0.7677 | 0.7660 | 0.7681 |
-| arc_challenge | 0.4343 | 0.4454 | 0.4360 |
-| gsm8k 5 shots | 0.3101 | 0.4776 | 0.4519 |
-###
-Here is the sample command to
-python -m auto_round \
---model_name Qwen/Qwen2.5-1.5B-Instruct \
---model_dtype "
---format 'auto_round' \

@@ -142,15 +187,12 @@ Users (both direct and downstream) should be made aware of the risks, biases and
-* Intel Extension for Transformers [link](https://github.com/intel/intel-extension-for-transformers)
README.md after this commit (unchanged regions are not shown in the diff):

---
license: apache-2.0
datasets:
- NeelNanda/pile-10k
---

## Model Details

This model is an int4 model with group_size 128 and symmetric quantization of [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct), generated by [intel/auto-round](https://github.com/intel/auto-round). Load the model with `revision="14dbc8"` to use the AutoGPTQ format.

## How To Use

### INT4 Inference (CPU/HPU/CUDA)

CPU inference requires auto-round version > 0.3.1.
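If you are unsure which auto-round build is installed, a minimal check such as the sketch below can confirm the version before running CPU inference (it assumes auto-round was installed from PyPI, e.g. with `pip install -U auto-round`).

```python
# Minimal version check, assuming auto-round was installed from PyPI
# (e.g. `pip install -U auto-round`); CPU inference needs a version newer than 0.3.1.
from importlib.metadata import version

installed = version("auto-round")
print(f"auto-round {installed} is installed")  # expect something newer than 0.3.1
```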

```python
from auto_round import AutoRoundConfig  ## must import for auto-round format
from transformers import AutoModelForCausalLM, AutoTokenizer

quantized_model_dir = "OPEA/Qwen2.5-1.5B-Instruct-int4-inc"
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)
model = AutoModelForCausalLM.from_pretrained(
    quantized_model_dir,
    torch_dtype='auto',
    device_map="auto",
    ##revision="14dbc8"  ## AutoGPTQ format
)

##import habana_frameworks.torch.core as htcore  ## uncomment it for HPU

## ... (prompt and chat-template construction is unchanged and not shown in this diff) ...
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=200,  ##change this to align with the official usage
    do_sample=False  ##change this to align with the official usage
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

prompt = "There is a girl who likes adventure,"
##INT4:
"""That's great! What kind of adventures do you like to go on? Do you prefer outdoor activities or indoor ones? Maybe we could come up with some ideas together!
"""

##BF16:
"""That's great! Adventure can be an exciting and fulfilling experience for many people. What kind of adventures do you like to go on? Do you enjoy hiking, camping, or exploring new places? Or maybe you prefer more extreme activities like skydiving or bungee jumping? Whatever your interests may be, there are plenty of opportunities out there for someone who loves adventure.
"""

prompt = "9.11和9.8哪个数字大"
##INT4:
"""
9.11 和 9.8 都是小数,它们的大小比较如下:

- 9.11 大于 9.8

具体来说:
- 9.11 的十位和个位都是 9,十分位是 1。
- 9.8 的十位和个位都是 9,十分位也是 8。

由于 1 > 8,在相同的小数部分相同时,较大的数字在十位上。因此,9.11 比 9.8 更大。
"""

##BF16:
"""9.11 和 9.8 都是小数,比较它们的大小需要从左到右逐位进行比较。

首先看整数部分:
- 9.11 的整数部分是 9。
- 9.8 的整数部分也是 9。

因为两者的整数部分相同,所以继续比较小数部分:

- 9.11 的小数部分是 0.11。
- 9.8 的小数部分是 0.8。

现在我们来比较这两个小数点后的数字:
- 0.11 和 0.8

显然,0.11 小于 0.8。因此,9.11 比 9.8 大。

所以,答案是:9.11 > 9.8。
"""

prompt = "Once upon a time,"
##INT4:
"""I'm sorry, but I don't understand what you're asking. Could you please provide more context or clarify your question?"""

##BF16:
"""I'm ready to listen! What's the story you'd like me to hear?"""

prompt = "请简短介绍一下阿里巴巴公司"
##INT4:
"""阿里巴巴集团是一家中国领先的电子商务和数字技术公司,成立于1999年。它最初是淘宝网的运营者,后来发展成为全球最大的电商平台之一,并且在云计算、金融科技等领域也取得了显著成就。

阿里巴巴旗下的主要业务包括:

1. **淘宝**:一个在线购物平台,提供各种商品和服务。
2. **天猫**:另一个大型电商平台,专注于销售品牌商品。
3. **阿里云**:提供云计算服务,帮助企业和个人实现数字化转型。
4. **蚂蚁金服**:提供金融服务,如支付宝支付系统、余额宝等。
5. **菜鸟网络**:负责物流配送,支持电商交易和快递服务。
6. **Lazada** 和 **Global Markets**:分别是中国和东南亚地区的电商平台。

阿里巴巴通过这些业务为消费者提供了便捷的购物体验,同时也为企业和个人提供了多种商业解决方案和技术支持。此外,阿里巴巴还积极参与社会公益事业,致力于推动数字经济的发展和社会的可持续性。"""

##BF16:
"""阿里巴巴集团是一家中国领先的电子商务和数字技术公司,成立于1999年。它以B2C(企业对消费者)和B2B(企业对企业)为主要业务模式,提供包括淘宝、天猫、速卖通等在内的在线购物平台,并通过支付宝为用户提供支付服务。阿里巴巴还涉足云计算、人工智能等领域,致力于推动数字经济的发展。
"""
```
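The card notes that revision `"14dbc8"` holds the AutoGPTQ-format weights. A minimal sketch of loading that revision explicitly is shown below; the `revision` argument is standard `transformers` behaviour, and a GPTQ-capable backend is assumed to be installed.

```python
# Minimal sketch: load the AutoGPTQ-format weights from the "14dbc8" revision
# mentioned in the model card. Assumes a GPTQ backend is available; otherwise
# use the default auto-round revision as in the example above.
from transformers import AutoModelForCausalLM, AutoTokenizer

quantized_model_dir = "OPEA/Qwen2.5-1.5B-Instruct-int4-inc"
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)
model = AutoModelForCausalLM.from_pretrained(
    quantized_model_dir,
    torch_dtype="auto",
    device_map="auto",
    revision="14dbc8",  # AutoGPTQ-format revision, as noted in the model card
)
```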

### Evaluate the model

`pip3 install lm-eval==0.4.5`

```bash
auto-round --model "OPEA/Qwen2.5-1.5B-Instruct-int4-inc" --eval --eval_bs 16 --tasks leaderboard_ifeval,leaderboard_mmlu_pro,gsm8k,lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,cmmlu,ceval-valid
```

| Metric | BF16 | INT4 |
| :----------------------------------------- | :----: | :----: |
| Avg | 0.5203 | 0.5133 |
| leaderboard_mmlu_pro 5 shots | 0.2930 | 0.2771 |
| leaderboard_ifeval inst_level_strict_acc | 0.4173 | 0.3765 |
| leaderboard_ifeval prompt_level_strict_acc | 0.2847 | 0.2440 |
| mmlu | 0.6016 | 0.5903 |
| cmmlu | 0.6482 | 0.6092 |
| ceval-valid | 0.6568 | 0.6181 |
| gsm8k 5 shots | 0.3086 | 0.4306 |
| lambada_openai | 0.6033 | 0.5882 |
| hellaswag | 0.5086 | 0.4979 |
| winogrande | 0.6259 | 0.6361 |
| piqa | 0.7650 | 0.7557 |
| truthfulqa_mc1 | 0.3133 | 0.3195 |
| openbookqa | 0.3180 | 0.3120 |
| boolq | 0.7804 | 0.7526 |
| arc_easy | 0.7647 | 0.7622 |
| arc_challenge | 0.4352 | 0.4420 |
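For a quick programmatic check of a subset of these tasks, lm-eval's `simple_evaluate` entry point can be used. The snippet below is only a sketch under assumptions (task subset, batch size, and that lm-eval==0.4.5 is installed); it is not the exact setup that produced the table above.

```python
# Sketch: score a couple of the tasks above through lm-eval's Python API.
# Results may differ slightly from the auto-round CLI run used for the table.
from auto_round import AutoRoundConfig  # as above, needed for the auto-round format
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=OPEA/Qwen2.5-1.5B-Instruct-int4-inc",
    tasks=["piqa", "arc_easy"],
    batch_size=16,
)
for task, metrics in results["results"].items():
    print(task, metrics)
```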

### Generate the model

Here is the sample command to generate the model. We observed a larger accuracy drop on Chinese tasks and recommend using a high-quality Chinese calibration dataset or a smaller group_size, such as 32.

```bash
auto-round \
    --model Qwen/Qwen2.5-1.5B-Instruct \
    --device 0 \
    --group_size 128 \
    --nsamples 512 \
    --bits 4 \
    --iter 1000 \
    --disable_eval \
    --model_dtype "fp16" \
    --format 'auto_gptq,auto_round' \
    --output_dir "./tmp_autoround"
```
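The same recipe can also be driven from Python with auto-round's `AutoRound` class. The sketch below mirrors the CLI flags above; argument names such as `nsamples` and `iters` follow the upstream auto-round examples and may differ between releases, so treat it as an illustration rather than the exact recipe.

```python
# Sketch of the CLI recipe above via auto-round's Python API.
# Argument names (nsamples, iters, sym, ...) follow upstream auto-round examples
# and may vary between releases; verify against your installed version.
from auto_round import AutoRound
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-1.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="float16")
tokenizer = AutoTokenizer.from_pretrained(model_name)

autoround = AutoRound(
    model,
    tokenizer,
    bits=4,          # --bits 4
    group_size=128,  # --group_size 128
    sym=True,        # symmetric quantization, as stated in Model Details
    nsamples=512,    # --nsamples 512
    iters=1000,      # --iter 1000
)
autoround.quantize()
autoround.save_quantized("./tmp_autoround", format="auto_round", inplace=True)
```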

## Ethical Considerations and Limitations

The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.

Here is a useful link to learn more about Intel's AI software:

- Intel Neural Compressor [link](https://github.com/intel/neural-compressor)

## Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

## Cite

@article{cheng2023optimize, title={Optimize weight rounding via signed gradient descent for the quantization of llms}, author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi}, journal={arXiv preprint arXiv:2309.05516}, year={2023} }
config.json
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:9cb8ae2c7a0fc018dd2209ce1c0d82a240b9036a4ad7e65a2173ad6c91277e0f
+size 1382
model.safetensors
CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:0b0e1c607f09e1b208fd7f6eb15ea3e43031062dbb1560cbb7691b44a2be0dda
+size 1149862960
quantization_config.json
ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f92209e21368ef298866e57e5f3838e7590119ba042ef4c15bf642f7f60e4f40
+size 575