cicdatopea committed
Update README.md

README.md CHANGED
@@ -12,7 +12,7 @@ base_model:

This model is an int4 model with group_size 128 and symmetric quantization of [deepseek-ai/DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3), generated by the [intel/auto-round](https://github.com/intel/auto-round) algorithm.

-**Loading the model in Transformers can be quite slow, especially on CUDA devices (30 minutes to 1 hour). Consider using an alternative serving framework.** However, we have not tested it on other frameworks due to limited CUDA resources.
+**Loading the model in Transformers can be quite slow, especially on CUDA devices (30 minutes to 1 hour). Consider using an alternative serving framework (some frameworks have overflow issues).** However, we have not tested it on other frameworks due to limited CUDA resources.

Please follow the license of the original model.
@@ -20,7 +20,9 @@ Please follow the license of the original model.

**INT4 Inference on CUDA** (**at least 7*80G**)

-On CUDA devices, the computation dtype is typically FP16 for int4, which may lead to overflow for this model.
+On CUDA devices, the computation dtype is typically FP16 for int4, which may lead to overflow for this model.
+While we have added a workaround to address this issue, we cannot guarantee reliable performance for all prompts.
+**For better stability, using the CPU version is recommended. Please refer to the following section for details.**

~~~python
from transformers import AutoModelForCausalLM, AutoTokenizer
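# Hedged illustration (an editor's sketch, not the commit's actual workaround):
# the overflow note above stems from FP16's narrow 5-bit exponent (max ~65504),
# whereas BF16 keeps FP32's 8-bit exponent range, so forcing BF16 compute is
# one plausible mitigation. The repo id, device_map, and generation settings
# below are assumptions for illustration.
import torch

model_id = "..."  # placeholder: the quantized model's Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # BF16 compute instead of FP16 to curb overflow
    device_map="auto",           # shard across available GPUs (at least 7*80G)
    trust_remote_code=True,
)

inputs = tokenizer("There is a girl who likes adventure,", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
~~~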
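On the suggestion to use an alternative serving framework: the commit itself says other frameworks are untested for this checkpoint, so the following is only a hypothetical sketch of serving through vLLM with tensor parallelism; the repo id and sampling settings are assumptions, and whether vLLM handles this int4 format has not been verified.

~~~python
# Hypothetical sketch: the README states other frameworks are untested with
# this checkpoint, so treat this as a starting point, not a verified recipe.
from vllm import LLM, SamplingParams

llm = LLM(
    model="...",              # placeholder: the quantized model's repo id
    tensor_parallel_size=8,   # split the model across 8 GPUs
    trust_remote_code=True,
)

params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(["There is a girl who likes adventure,"], params)
print(outputs[0].outputs[0].text)
~~~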