cicdatopea committed · Commit 7fa30e4 · verified · Parent(s): 71be988

Update README.md

Files changed (1)
  1. README.md +4 -2
README.md CHANGED

@@ -12,7 +12,7 @@ base_model:
 
 This model is an int4 model with group_size 128 and symmetric quantization of [deepseek-ai/DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3), generated by the [intel/auto-round](https://github.com/intel/auto-round) algorithm.
 
-**Loading the model in Transformers can be quite slow, especially on CUDA devices (30 minutes to 1 hour). Consider using an alternative serving framework.** However, we have not tested it on other frameworks due to limited CUDA resources.
+**Loading the model in Transformers can be quite slow, especially on CUDA devices (30 minutes to 1 hour). Consider using an alternative serving framework (some frameworks have overflow issues).** However, we have not tested it on other frameworks due to limited CUDA resources.
 
 Please follow the license of the original model.
 
@@ -20,7 +20,9 @@ Please follow the license of the original model.
 
 **INT4 Inference on CUDA** (**at least 7*80G**)
 
-On CUDA devices, the computation dtype is typically FP16 for INT4, which may lead to overflow for this model. While we have added a workaround to address this issue, we cannot guarantee reliable performance for all prompts. **For better stability, using the CPU version is recommended. Please refer to the following section for details.**
+On CUDA devices, the computation dtype is typically FP16 for INT4, which may lead to overflow for this model.
+While we have added a workaround to address this issue, we cannot guarantee reliable performance for all prompts.
+**For better stability, using the CPU version is recommended. Please refer to the following section for details.**
 
 ~~~python
 from transformers import AutoModelForCausalLM, AutoTokenizer
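
The diff cuts off right after the README's import line, so for orientation here is a rough sketch of what loading this kind of AutoRound INT4 checkpoint through Transformers typically looks like. The repo id is a placeholder and the exact arguments are assumptions, not taken from this commit; the README's actual example continues past the truncated diff above.

~~~python
# Rough sketch only: the repo id is a placeholder and the arguments are assumptions,
# not the README's actual example (which is truncated in the diff above).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<int4-DeepSeek-V3-repo-id>"  # placeholder: substitute the actual quantized repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",      # shard across available GPUs; the README calls for at least 7x80G
    torch_dtype="auto",     # let the checkpoint's quantization config pick the compute dtype
    trust_remote_code=True,
)

prompt = "Explain symmetric int4 quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
~~~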