mgoin committed
Commit dc49055 · 1 parent: 169e798

Update README.md

Files changed (1): README.md (+26 −24)
README.md CHANGED
@@ -1,24 +1,26 @@
- ---
- license: apache-2.0
- ---
- # Mistral-7B-Instruct-v0.3 quantized to 4bits
-
- - weight-only quantization via GPTQ to 4bits
- - GPTQ optimized for 99.75% accuracy recovery relative to the unquantized model
-
- # Open LLM Leaderboard evaluation scores
- | | Mistral-7B-Instruct-v0.3 | Mistral-7B-Instruct-v0.3-GPTQ-4bit<br>(this model) |
- | :------------------: | :----------------------: | :------------------------------------------------: |
- | arc-c<br>25-shot | 63.48 | 63.40 |
- | mmlu<br>5-shot | 61.13 | 60.89 |
- | hellaswag<br>10-shot | 84.49 | 84.04 |
- | winogrande<br>5-shot | 79.16 | 79.08 |
- | gsm8k<br>5-shot | 43.37 | 45.41 |
- | truthfulqa<br>0-shot | 59.65 | 57.48 |
- | **Average<br>Accuracy** | **65.21** | **65.05** |
- | **Recovery** | **100%** | **99.75%** |
-
- # nm-vllm inference performance
- - https://github.com/neuralmagic/nm-vllm
-
- ![](nm-vllm_Llama_vs_Mistral.png)
+ ---
+ license: apache-2.0
+ base_model: mistralai/Mistral-7B-Instruct-v0.3
+ ---
+ # [Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) quantized to 4 bits
+
+ - weight-only quantization to 4 bits via GPTQ with group_size=128 (see the sketch after this list)
+ - GPTQ optimized for 99.75% accuracy recovery relative to the unquantized model
+
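For reference, here is a minimal sketch of how a 4-bit, group_size=128 GPTQ checkpoint like this one can be produced with the AutoGPTQ library. The calibration example, the `desc_act` setting, and the output directory are illustrative assumptions, not necessarily the exact recipe used for this model:

```python
# Sketch: weight-only 4-bit GPTQ quantization with AutoGPTQ.
# Calibration data, desc_act, and output path are assumptions for illustration.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.3"

quantize_config = BaseQuantizeConfig(
    bits=4,          # weight-only 4-bit quantization
    group_size=128,  # matches the group_size=128 noted above
    desc_act=False,  # assumption; the card does not state this setting
)

tokenizer = AutoTokenizer.from_pretrained(model_id)

# A real run would calibrate on a few hundred representative samples.
examples = [tokenizer("GPTQ calibrates each layer against sample activations.")]

model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)
model.quantize(examples)
model.save_quantized("Mistral-7B-Instruct-v0.3-GPTQ-4bit")
```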
+ # Open LLM Leaderboard evaluation scores
+ | | Mistral-7B-Instruct-v0.3 | Mistral-7B-Instruct-v0.3-GPTQ-4bit<br>(this model) |
+ | :------------------: | :----------------------: | :------------------------------------------------: |
+ | arc-c<br>25-shot | 63.48 | 63.40 |
+ | mmlu<br>5-shot | 61.13 | 60.89 |
+ | hellaswag<br>10-shot | 84.49 | 84.04 |
+ | winogrande<br>5-shot | 79.16 | 79.08 |
+ | gsm8k<br>5-shot | 43.37 | 45.41 |
+ | truthfulqa<br>0-shot | 59.65 | 57.48 |
+ | **Average<br>Accuracy** | **65.21** | **65.05** |
+ | **Recovery** | **100%** | **99.75%** |
+
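The recovery row is simply the ratio of the two average accuracies: 65.05 / 65.21 ≈ 0.9975, i.e. 99.75%. As a sketch, a single table entry could be re-checked with the lm-evaluation-harness Python API (task name and few-shot count taken from the arc-c row; the batch size is an arbitrary choice):

```python
# Sketch: reproduce one leaderboard-style score with lm-evaluation-harness.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=mistralai/Mistral-7B-Instruct-v0.3",
    tasks=["arc_challenge"],  # the arc-c row above
    num_fewshot=25,           # 25-shot, as in the table
    batch_size=8,             # arbitrary
)
print(results["results"]["arc_challenge"])

# Recovery = quantized average accuracy / baseline average accuracy.
print(f"{65.05 / 65.21:.2%}")  # ≈ 99.75%
```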
+ # vLLM Inference Performance
+
+ This model is ready for optimized inference using the Marlin mixed-precision kernels in vLLM: https://github.com/vllm-project/vllm
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/60466e4b4f40b01b66151416/3bX2Hqj4LaJxFhPHRucAn.png)
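As a usage illustration, here is a minimal vLLM inference sketch. The repo id below is a placeholder for this model's actual Hugging Face id, and vLLM selects the Marlin kernel automatically when the hardware and quantization scheme support it:

```python
# Sketch: run this GPTQ model with vLLM.
# The repo id is a placeholder; substitute this model's actual HF id.
from vllm import LLM, SamplingParams

llm = LLM(model="neuralmagic/Mistral-7B-Instruct-v0.3-GPTQ-4bit")
params = SamplingParams(temperature=0.7, max_tokens=128)

# Mistral-Instruct prompt format.
outputs = llm.generate(["[INST] Summarize GPTQ in one sentence. [/INST]"], params)
print(outputs[0].outputs[0].text)
```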