cicdatopea committed
Update README.md

README.md CHANGED
@@ -12,7 +12,7 @@ base_model:

This model is an int4 model with group_size 128 and symmetric quantization of [deepseek-ai/DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3), generated by the [intel/auto-round](https://github.com/intel/auto-round) algorithm.

-**Loading the model in Transformers can be quite slow, especially on CUDA devices (30 minutes to 1 hour). Consider using an alternative serving framework.** However, we have not tested it on other frameworks due to limited CUDA resources.
+**Loading the model in Transformers can be quite slow, especially on CUDA devices (30 minutes to 1 hour). Consider using an alternative serving framework (some frameworks have overflow issues).** However, we have not tested it on other frameworks due to limited CUDA resources.

Please follow the license of the original model.
@@ -20,7 +20,9 @@ Please follow the license of the original model.

**INT4 Inference on CUDA** (**at least 7*80G**)

-On CUDA devices, the computation dtype is typically FP16 for int4, which may lead to overflow for this model.
+On CUDA devices, the computation dtype is typically FP16 for int4, which may lead to overflow for this model.
+While we have added a workaround to address this issue, we cannot guarantee reliable performance for all prompts.
+**For better stability, using the CPU version is recommended. Please refer to the following section for details.**

~~~python
from transformers import AutoModelForCausalLM, AutoTokenizer
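# Hedged illustration (an editor's sketch, not the commit's actual workaround):
# the overflow note above stems from FP16's narrow 5-bit exponent (max ~65504),
# whereas BF16 keeps FP32's 8-bit exponent range, so forcing BF16 compute is
# one plausible mitigation. The repo id, device_map, and generation settings
# below are assumptions for illustration.
import torch

model_id = "..."  # placeholder: the quantized model's Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16 compute instead of FP16 to curb overflow
    device_map="auto",           # shard across available GPUs (at least 7*80G)
    trust_remote_code=True,
)

inputs = tokenizer("There is a girl who likes adventure,", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
~~~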
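On the suggestion to use an alternative serving framework: the commit itself says other frameworks are untested for this checkpoint, so the following is only a hypothetical sketch of serving through vLLM with tensor parallelism; the repo id and sampling settings are assumptions, and whether vLLM handles this int4 format has not been verified.

~~~python
# Hypothetical sketch: the README states other frameworks are untested with
# this checkpoint, so treat this as a starting point, not a verified recipe.
from vllm import LLM, SamplingParams

llm = LLM(
    model="...",              # placeholder: the quantized model's repo id
    tensor_parallel_size=8,   # split the model across 8 GPUs
    trust_remote_code=True,
)

params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["There is a girl who likes adventure,"], params)
print(outputs[0].outputs[0].text)
~~~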