lrl-modelcloud committed
Commit dd26e3d
1 Parent(s): 1090b5b

Update README.md

Files changed (1)
  README.md +17 -17
README.md CHANGED
@@ -1,22 +1,22 @@
- This model has been quantized using [GPTQModel](https://github.com/ModelCloud/GPTQModel).
-
- - **bits**: 4
- - **group_size**: 128
- - **desc_act**: true
- - **static_groups**: false
- - **sym**: true
- - **lm_head**: false
- - **damp_percent**: 0.01
- - **true_sequential**: true
- - **model_name_or_path**: ""
- - **model_file_base_name**: "model"
- - **quant_method**: "gptq"
- - **checkpoint_format**: "gptq"
- - **meta**:
-   - **quantizer**: "gptqmodel:0.9.9-dev0"
+ **This model has been quantized using [GPTQModel](https://github.com/ModelCloud/GPTQModel).**
+
+ - bits: 4
+ - group_size: 128
+ - desc_act: true
+ - static_groups: false
+ - sym: true
+ - lm_head: false
+ - damp_percent: 0.01
+ - true_sequential: true
+ - model_name_or_path: ""
+ - model_file_base_name: "model"
+ - quant_method: "gptq"
+ - checkpoint_format: "gptq"
+ - meta
+   - quantizer: "gptqmodel:0.9.9-dev0"
 
 
- Currently, only vllm can load the quantized gemma2-27b for proper inference. Here is an example:
+ **Currently, only vllm can load the quantized gemma2-27b for proper inference. Here is an example:**
  ```python
  import os
  # Gemma-2 use Flashinfer backend for models with logits_soft_cap. Otherwise, the output might be wrong.
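
The hunk ends at README line 22, so the example above is cut off after the first comment. For reference, a minimal self-contained sketch of how such a vLLM snippet plausibly continues, assuming vLLM's standard `LLM`/`SamplingParams` API and its `VLLM_ATTENTION_BACKEND` environment variable; the checkpoint path and prompt are placeholders, not taken from the commit:

```python
import os

# Select the backend before the engine is constructed: Gemma-2 uses
# logits_soft_cap, which backends other than FlashInfer may not implement
# correctly, producing wrong output.
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASHINFER"

from vllm import LLM, SamplingParams

# Placeholder path: point this at the quantized GPTQ checkpoint.
llm = LLM(model="./gemma-2-27b-gptq-4bit")

sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
outputs = llm.generate(["The capital of France is"], sampling_params)
print(outputs[0].outputs[0].text)
```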
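Separately, the parameter list in the diff maps one-to-one onto GPTQModel's quantize config. Below is a minimal sketch of how a checkpoint with these settings could be produced, assuming the AutoGPTQ-style API GPTQModel exposed around v0.9.x (`QuantizeConfig`, `GPTQModel.from_pretrained`, `quantize`, `save_quantized`; verify against the pinned version). The base-model id and calibration texts are placeholders:

```python
from transformers import AutoTokenizer
from gptqmodel import GPTQModel, QuantizeConfig

base_id = "google/gemma-2-27b"  # placeholder: base model being quantized

# Field names assumed to mirror the keys recorded in the README above.
quant_config = QuantizeConfig(
    bits=4,                # 4-bit weights
    group_size=128,        # one scale/zero pair per 128 weights
    desc_act=True,         # quantize columns in order of decreasing activation
    static_groups=False,
    sym=True,              # symmetric quantization
    lm_head=False,         # keep the output head unquantized
    damp_percent=0.01,     # Hessian dampening factor
    true_sequential=True,  # quantize layers one at a time, in order
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
# Placeholder calibration set; real runs use a few hundred representative texts.
calibration_dataset = [tokenizer("GPTQ calibrates on sample activations.")]

model = GPTQModel.from_pretrained(base_id, quant_config)
model.quantize(calibration_dataset)
model.save_quantized("gemma-2-27b-gptq-4bit")
```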