OPEA/DeepSeek-V3-int4-sym-gptq-inc

Safetensors · deepseek_v3 · custom_code · 4-bit precision · gptq


cicdatopea committed 22 days ago

Commit eef00c1 (verified) · 1 parent: fc1640b

Update README.md

Files changed (1): README.md +62 -6

README.md CHANGED

@@ -3,7 +3,9 @@ datasets:
 - NeelNanda/pile-10k
 base_model:
 - deepseek-ai/DeepSeek-V3
 ---
 ## Model Details
 This model is an int4 model with group_size 128 and symmetric quantization of [deepseek-ai/DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3) generated by the [intel/auto-round](https://github.com/intel/auto-round) algorithm.
@@ -16,18 +18,32 @@ Please follow the license of the original model.
 ## How To Use
-### INT4 Inference
-````python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 import torch
-quantized_model_dir = "OPEA/DeepSeek-V3-int4-sym-gptq-inc-preview"
 model = AutoModelForCausalLM.from_pretrained(
     quantized_model_dir,
     torch_dtype=torch.float16,
     trust_remote_code=True,
-    device_map="auto"
 )
@@ -56,8 +72,6 @@ output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_
 response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 print(response)
-## The following result is inferred on CPU with qbits backend
 prompt = "9.11和9.8哪个数字大"  ## "Which number is larger, 9.11 or 9.8?"
 ##INT4
@@ -146,8 +160,50 @@ prompt = "There is a girl who likes adventure,"
 prompt = "Please give a brief introduction of DeepSeek company."
 ##INT4:
 """DeepSeek Artificial Intelligence Co., Ltd. (referred to as "DeepSeek" or "深度求索") , founded in 2023, is a Chinese company dedicated to making AGI a reality"""
 ````
 ### Evaluate the model

 - NeelNanda/pile-10k
 base_model:
 - deepseek-ai/DeepSeek-V3
 ---
 ## Model Details
 This model is an int4 model with group_size 128 and symmetric quantization of [deepseek-ai/DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3) generated by the [intel/auto-round](https://github.com/intel/auto-round) algorithm.
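"int4 with group_size 128 and symmetric quantization" means that each run of 128 consecutive weights shares one scale and has no zero point, so a weight w is stored as a small integer q with w ≈ q · scale. The sketch below illustrates that storage format using plain round-to-nearest. It is not AutoRound's method (AutoRound additionally tunes how each weight is rounded), and the choices of the signed range [-8, 7] and scale = max|w| / 7 are assumptions; real implementations differ in these details. The helper names are hypothetical.

```python
from typing import List, Tuple

GROUP_SIZE = 128   # group_size from the model description
QMIN, QMAX = -8, 7  # assumed signed 4-bit integer range


def quantize_group(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric round-to-nearest: one scale per group, no zero point."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / QMAX if max_abs > 0 else 1.0
    # Clamp to the int4 range; with scale = max|w| / 7 no value exceeds it.
    q = [max(QMIN, min(QMAX, round(w / scale))) for w in weights]
    return q, scale


def dequantize_group(q: List[int], scale: float) -> List[float]:
    """Recover approximate weights as q * scale."""
    return [v * scale for v in q]


def quantize_row(row: List[float], group_size: int = GROUP_SIZE):
    """Split one weight row into groups and quantize each group independently."""
    return [quantize_group(row[i:i + group_size]) for i in range(0, len(row), group_size)]


row = [((i % 17) - 8) / 10 for i in range(256)]  # toy weights in [-0.8, 0.8]
groups = quantize_row(row)
print(len(groups), "groups, scales:", [round(s, 4) for _, s in groups])
```

Because every group has its own scale, one outlier weight only coarsens the 127 weights it shares a group with, not the whole row.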
 ## How To Use
+### INT4 Inference on CPU with Qbits
+Run `pip3 install auto-round`; **it installs both intel-extension-for-pytorch and intel-extension-for-transformers**. On Intel CPUs, intel-extension-for-pytorch is used; on other CPUs, intel-extension-for-transformers is used.
+**To make sure qbits from intel-extension-for-transformers is used, uninstall intel-extension-for-pytorch, which has not yet been tested with this model.**
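Since the backend choice above depends on which Intel extension is installed, it can help to check the environment before loading the model. This is a minimal sketch that assumes the import names `intel_extension_for_pytorch` and `intel_extension_for_transformers`. The helper names are hypothetical, and the code only inspects what is installed. It does not select a backend for auto-round.

```python
import importlib.util

# Assumed import names of the two extensions pulled in by auto-round.
IPEX = "intel_extension_for_pytorch"
ITREX = "intel_extension_for_transformers"


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def qbits_path_expected() -> bool:
    """Hypothetical check mirroring the note above: qbits (ITREX) is only
    guaranteed when ITREX is present and IPEX has been uninstalled."""
    return is_installed(ITREX) and not is_installed(IPEX)


if __name__ == "__main__":
    print(f"{IPEX} installed: {is_installed(IPEX)}")
    print(f"{ITREX} installed: {is_installed(ITREX)}")
    if not qbits_path_expected():
        print("qbits is not guaranteed: make sure auto-round is installed "
              "and intel-extension-for-pytorch is uninstalled.")
```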
+~~~python
+from auto_round import AutoRoundConfig  ## must be imported to load the AutoRound format
 from transformers import AutoModelForCausalLM, AutoTokenizer
 import torch
+quantized_model_dir = "OPEA/DeepSeek-V3-int4-sym-gptq-inc-preview"
+quantization_config = AutoRoundConfig(
+    backend="cpu"
+)
 model = AutoModelForCausalLM.from_pretrained(
     quantized_model_dir,
     torch_dtype=torch.float16,
     trust_remote_code=True,
+    device_map="cpu",
+    revision="8fe0735",  ## AutoRound-format revision; only config.json differs
+    quantization_config=quantization_config,  ## may be omitted on a CPU-only machine
 )
 response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
 print(response)
 prompt = "9.11和9.8哪个数字大"  ## "Which number is larger, 9.11 or 9.8?"
 ##INT4
 prompt = "Please give a brief introduction of DeepSeek company."
 ##INT4:
 """DeepSeek Artificial Intelligence Co., Ltd. (referred to as "DeepSeek" or "深度求索") , founded in 2023, is a Chinese company dedicated to making AGI a reality"""
+~~~
+### INT4 Inference on CUDA (not tested; may require 8×80 GB GPUs)
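The GPU estimate in the heading can be sanity-checked with back-of-the-envelope arithmetic. This sketch assumes about 671B total parameters (the published total for DeepSeek-V3), that every weight is stored as int4, and that each group of 128 weights carries one fp16 scale (symmetric, so there is no zero point). It ignores layers kept in higher precision, activations, and the KV cache, so actual memory use will be higher.

```python
# Rough weight-memory estimate for group-wise symmetric int4 (assumptions above).
TOTAL_PARAMS = 671e9  # assumed total parameter count of DeepSeek-V3
WEIGHT_BITS = 4       # int4 weights
SCALE_BITS = 16       # assumed fp16 scale per group
GROUP_SIZE = 128      # group_size from the model description


def int4_weight_gb(params: float, group_size: int = GROUP_SIZE) -> float:
    """Packed int4 weights plus one scale per group, in GB (1e9 bytes)."""
    bits_per_weight = WEIGHT_BITS + SCALE_BITS / group_size
    return params * bits_per_weight / 8 / 1e9


weights_gb = int4_weight_gb(TOTAL_PARAMS)
budget_gb = 8 * 80  # 8 GPUs with 80 GB each
print(f"int4 weights: ~{weights_gb:.0f} GB of {budget_gb} GB total GPU memory")
```

Under these assumptions, the weights alone come to roughly 346 GB. That is consistent with needing several 80 GB GPUs, and the remaining headroom would go to runtime costs that this sketch does not model.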
+````python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+quantized_model_dir = "OPEA/DeepSeek-V3-int4-sym-gptq-inc-preview"
+model = AutoModelForCausalLM.from_pretrained(
+    quantized_model_dir,
+    torch_dtype=torch.float16,
+    trust_remote_code=True,
+    device_map="auto"
+)
+tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, trust_remote_code=True)
+prompt = "There is a girl who likes adventure,"
+messages = [
+    {"role": "system", "content": "You are a helpful assistant."},
+    {"role": "user", "content": prompt}
+]
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True
+)
+model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+generated_ids = model.generate(
+    model_inputs.input_ids,
+    max_new_tokens=200,  ##change this to align with the official usage
+    do_sample=False  ##change this to align with the official usage
+)
+generated_ids = [
+    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+]
+response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+print(response)
 ````
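In the example above, the `generated_ids` list comprehension is needed because, for decoder-only models, `model.generate` returns the prompt tokens followed by the newly generated tokens. Slicing off the first `len(input_ids)` positions leaves only the response. The same step on plain lists, using a hypothetical helper name and made-up token IDs:

```python
from typing import List, Sequence


def strip_prompt_tokens(prompts: Sequence[Sequence[int]],
                        outputs: Sequence[Sequence[int]]) -> List[List[int]]:
    """Drop the echoed prompt from each generate() output, keeping only new tokens."""
    return [list(out[len(inp):]) for inp, out in zip(prompts, outputs)]


prompt_ids = [[101, 7, 8, 9]]
full_output = [[101, 7, 8, 9, 42, 43, 2]]
print(strip_prompt_tokens(prompt_ids, full_output))  # [[42, 43, 2]]
```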
 ### Evaluate the model