ModelCloud/gemma-2-27b-it-gptq-4bit

Text Generation

text-generation-inference

Inference Endpoints

4-bit precision

lrl-modelcloud committed on Jul 22

Commit dd26e3d • 1 Parent(s): 1090b5b

Update README.md

Files changed (1)

README.md +17 -17


@@ -1,22 +1,22 @@
-This model has been quantized using [GPTQModel](https://github.com/ModelCloud/GPTQModel).
-- **bits**: 4
-- **group_size**: 128
-- **desc_act**: true
-- **static_groups**: false
-- **sym**: true
-- **lm_head**: false
-- **damp_percent**: 0.01
-- **true_sequential**: true
-- **model_name_or_path**: ""
-- **model_file_base_name**: "model"
-- **quant_method**: "gptq"
-- **checkpoint_format**: "gptq"
-- **meta**：
-  - **quantizer**: "gptqmodel:0.9.9-dev0"
-Currently, only vllm can load the quantized gemma2-27b for proper inference. Here is an example:
 ```python
 import os
 # Gemma-2 use Flashinfer backend for models with logits_soft_cap. Otherwise, the output might be wrong.

+**This model has been quantized using [GPTQModel](https://github.com/ModelCloud/GPTQModel).**
+- bits: 4
+- group_size: 128
+- desc_act: true
+- static_groups: false
+- sym: true
+- lm_head: false
+- damp_percent: 0.01
+- true_sequential: true
+- model_name_or_path: ""
+- model_file_base_name: "model"
+- quant_method: "gptq"
+- checkpoint_format: "gptq"
+- meta：
+  - quantizer: "gptqmodel:0.9.9-dev0"
+**Currently, only vllm can load the quantized gemma2-27b for proper inference. Here is an example:**
 ```python
 import os
 # Gemma-2 use Flashinfer backend for models with logits_soft_cap. Otherwise, the output might be wrong.