nm-research committed • Commit 9e0c527 • Parent(s): 2eefe8c
Update README.md
README.md CHANGED
@@ -31,7 +31,7 @@ It achieves an average score of 43.93 on the [OpenLLM](https://huggingface.co/sp
 
 ### Model Optimizations
 
-This model was obtained by quantizing the weights of [Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) to INT8 data type.
+This model was obtained by quantizing the weights and activations of [Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) to INT8 data type.
 This optimization reduces the number of bits used to represent weights and activations from 16 to 8, reducing GPU memory requirements (by approximately 50%) and increasing matrix-multiply compute throughput (by approximately 2x).
 Weight quantization also reduces disk size requirements by approximately 50%.
 
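As a rough sanity check on the approximately 50% memory reduction described in the diff above, here is a minimal Python sketch estimating weight storage at 16-bit versus 8-bit precision. The 0.5e9 parameter count is an assumption for illustration, not the model's exact size, and runtime overheads (activations, KV cache) are ignored.

```python
# Back-of-the-envelope estimate of weight storage before and after INT8 quantization.
# Assumes a hypothetical 0.5e9 parameter count for Qwen2.5-0.5B; the real count differs slightly.

def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes for a given precision."""
    return num_params * bits_per_param / 8 / 1e9

params = 0.5e9
fp16_gb = weight_memory_gb(params, 16)  # ~1.0 GB at 16-bit
int8_gb = weight_memory_gb(params, 8)   # ~0.5 GB at 8-bit

print(f"16-bit: {fp16_gb:.2f} GB, INT8: {int8_gb:.2f} GB, "
      f"reduction: {100 * (1 - int8_gb / fp16_gb):.0f}%")
```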