Model Details

This is an int2 quantization of Qwen/Qwen2.5-72B-Instruct with group_size 64 and symmetric quantization, generated by intel/auto-round. Load the model with revision e25bed0 to use the AutoGPTQ format.
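For intuition, symmetric group-wise int2 quantization stores one shared scale per group of 64 weights and rounds each weight to one of four signed levels. Below is a minimal round-to-nearest sketch; it is not the AutoRound algorithm itself (AutoRound additionally tunes the rounding decisions with signed gradient descent), and the function name and range convention are illustrative assumptions.

import torch

def fake_quantize_int2_sym(w: torch.Tensor, group_size: int = 64) -> torch.Tensor:
    ## Illustrative round-to-nearest int2 symmetric fake-quantization.
    ## One scale is shared per group of `group_size` weights; the signed
    ## int2 levels are {-2, -1, 0, 1}, so positive values clip at +1 * scale.
    orig_shape = w.shape
    groups = w.reshape(-1, group_size)
    ## Full-range convention: map the largest magnitude per group to |q| = 2.
    scale = groups.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 2.0
    q = torch.clamp(torch.round(groups / scale), min=-2, max=1)
    return (q * scale).reshape(orig_shape)  ## dequantized ("fake-quant") weights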

How To Use

INT2 Inference (CPU/HPU/CUDA)

from auto_round import AutoRoundConfig  ## must import for auto-round format
from transformers import AutoModelForCausalLM, AutoTokenizer
quantized_model_dir = "OPEA/Qwen2.5-72B-Instruct-int2-sym-inc"
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)

model = AutoModelForCausalLM.from_pretrained(
    quantized_model_dir,
    torch_dtype='auto',
    device_map="auto",
    ## revision="e25bed0",  ## uncomment to load the AutoGPTQ format
)

##import torch ## uncomment it for HPU
##import habana_frameworks.torch.core as htcore ## uncomment it for HPU
##import habana_frameworks.torch.hpu as hthpu ## uncomment it for HPU
##model = model.to(torch.bfloat16).to("hpu") ## uncomment it for HPU

prompt = "There is a girl who likes adventure,"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=200,  ##change this to align with the official usage
    do_sample=False  ##change this to align with the official usage
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
##INT2:
## That's great! Girls who enjoy adventure are often bold, curious, and open to new experiences. What kind of adventures does she like? Does she enjoy outdoor activities like hiking, camping, or rock climbing? Or does she prefer more travel-oriented adventures, such as exploring new cities or countries? Whatever her interests, it's wonderful that she has a sense of adventure and is willing to step out of her comfort zone to try new things. If you have any specific questions or ideas for adventures, I'd be happy to help!

##BF16:
## That sounds like a great quality to have! A love for adventure can lead to exciting experiences and personal growth. What kind of adventures does she enjoy? Does she like exploring new places, trying new activities, or seeking out thrilling experiences? Knowing more about her interests can help suggest specific ideas or activities that might appeal to her adventurous spirit.

prompt = "9.11和9.8哪个数字大"
##INT4:
## 9.11比9.8大。因为9.11 = 9 + 0.11,而9.8 = 9 + 0.8,0.11比0.8小,所以9.11 > 9.8。但是,实际上9.11和9.8的大小比较是基于它们的小数部分,9.11的小数部分0.11大于9.8的小数部分0.8,因此9.11大于9.8。

##BF16:
## To compare the numbers 9.11 and 9.8, proceed as follows:
## 1. **Integer parts**: both numbers have the integer part 9, so the fractional parts must be compared.
## 2. **Fractional parts**:
##    - the fractional part of 9.11 is 0.11
##    - the fractional part of 9.8 is 0.8
## 3. **Comparing the fractional parts**:
##    - comparing 0.11 with 0.8, 0.8 is clearly larger than 0.11.
## Therefore, 9.8 is larger than 9.11.

prompt = "Once upon a time,"
##INT2:
## Once upon a time, in a land far, far away, there was a magical kingdom. This kingdom was known for its vast forests, rolling hills, and a grand castle that stood at the center. The people of this kingdom lived in peace and prosperity, but one day, a dark cloud began to loom over their land.
## A powerful sorcerer, who had once been a friend to the king, had turned against the kingdom. He sought to take control and rule with an iron fist. The king, along with his loyal advisors and brave knights, set out to stop the sorcerer and protect their people.
## The story of their journey, filled with adventure, courage, and friendship, is one that has been told and retold for generations. It is a tale of heroes and villains, of love and sacrifice, and of the unyielding spirit of those who stand up for what is right.
## What would you like to know more about this story? Would you like me to continue with

##BF16:
## Once upon a time, in a land far, far away, there was a kingdom known for its beauty and prosperity. The kingdom was ruled by a wise and just king who loved his people dearly. In the heart of the kingdom stood a magnificent castle, surrounded by lush gardens and sparkling fountains.
## The king had a young daughter named Princess Elara, who was as kind and gentle as she was beautiful. She spent her days helping the poor and spreading joy throughout the kingdom. The people adored her, and she was beloved by all.
## One day, a great challenge arose. A dark forest on the outskirts of the kingdom began to grow wild and dangerous, threatening the safety of the villagers. The king called for a hero to tame the forest and protect his people. Many brave knights and warriors came forward, but none could succeed.
## Princess Elara, determined to help, decided to venture into the forest herself. Her father was hesitant, but he saw the determination in her eyes and knew

prompt = "请简短介绍一下阿里巴巴公司"
##INT4:
## 阿里巴巴(Alibaba)是一家中国的 multinational technology company headquartered in Hangzhou, Zhejiang, China. It was founded in 1999 by former English teacher Jack Ma and his partners. Alibaba is best known for its e-commerce platforms such as Taobao, Tmall, and AliExpress, but it also has extensive business interests in areas like cloud computing, artificial intelligence, and financial services. The company is one of the largest and most valuable in the world, and it has played a significant role in the growth of China's internet economy.

##BF16:
## Alibaba Group is a leading global e-commerce and technology company headquartered in Hangzhou, China. Founded in 1999, Alibaba began as a B2B online marketplace connecting Chinese manufacturers with buyers worldwide. Over more than two decades it has grown into a diversified group spanning e-commerce, finance, logistics, cloud computing, and other fields.
## Alibaba owns well-known brands such as Taobao, Tmall, Cainiao Network, and Alibaba Cloud, offering consumers services ranging from shopping and payments to entertainment, while providing businesses with end-to-end solutions in marketing, sales, logistics, and technology. In addition, Alibaba actively invests in and incubates innovative projects to advance the digital economy.
## Alibaba has always upheld its mission of "making it easy to do business anywhere" and is committed to promoting sustainable global economic development through technological innovation.

Evaluate the model

pip3 install lm-eval==0.4.5

auto-round --model "OPEA/Qwen2.5-72B-Instruct-int2-sym-inc" --eval --eval_bs 16 --tasks leaderboard_ifeval,leaderboard_mmlu_pro,gsm8k,lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,cmmlu,ceval-valid
Metric          BF16    INT2
Avg             0.7171  0.6843
mmlu            0.8334  0.7634
lambada_openai  0.7518  0.7215
hellaswag       0.7031  0.6464
winogrande      0.7601  0.7553
piqa            0.8313  0.8003
truthfulqa_mc1  0.5239  0.4896
openbookqa      0.3860  0.3820
boolq           0.9049  0.8881
arc_easy        0.8632  0.8354
arc_challenge   0.6135  0.5614
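
As an alternative to the auto-round CLI, the lm-eval harness can also be driven from Python. The snippet below is a sketch against lm-eval 0.4.x's simple_evaluate entry point; the task subset and batch size are illustrative, not the exact configuration used to produce the table above.

import lm_eval

## Sketch: evaluate a few of the tasks above through the lm-eval Python API.
## See the CLI command above for the full task list.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=OPEA/Qwen2.5-72B-Instruct-int2-sym-inc",
    tasks=["lambada_openai", "piqa", "arc_easy"],
    batch_size=16,
)
print(results["results"])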

Generate the model

Here is the sample command to generate the model.

auto-round \
--model  Qwen/Qwen2.5-72B-Instruct \
--device 0 \
--group_size 64 \
--nsamples 1024 \
--bits 2 \
--iter 2000 \
--disable_eval \
--format 'auto_gptq,auto_round' \
--output_dir "./tmp_autoround" 
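
The same quantization can also be run from Python with auto-round's AutoRound class. This is a sketch mirroring the CLI flags above; argument names may need adjusting to the installed auto-round version.

from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "Qwen/Qwen2.5-72B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

## Mirrors the CLI above: 2 bits, group_size 64, symmetric, 1024 calibration samples.
autoround = AutoRound(model, tokenizer, bits=2, group_size=64, sym=True,
                      nsamples=1024, iters=2000)
autoround.quantize()
autoround.save_quantized("./tmp_autoround", format="auto_round")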

Ethical Considerations and Limitations

The model can produce factually incorrect output and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, this model may generate lewd, biased, or otherwise offensive outputs.

Therefore, before deploying any applications of the model, developers should perform safety testing.

Caveats and Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

Here is a useful link to learn more about Intel's AI software:

  • Intel Neural Compressor: https://github.com/intel/neural-compressor

Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

Cite

@article{cheng2023optimize,
  title={Optimize weight rounding via signed gradient descent for the quantization of LLMs},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}

arXiv: https://arxiv.org/abs/2309.05516 · GitHub: https://github.com/intel/auto-round
