Update README.md
README.md CHANGED
@@ -32,7 +32,7 @@ pipeline_tag: text-generation

## Description

-This repo contains GPTQ model files for [Stability AI's …
+This repo contains GPTQ model files for [Stability AI's StableBeluga 2](https://huggingface.co/stabilityai/StableBeluga2).

Multiple GPTQ parameter permutations are provided; see Provided Files below for details of the options provided, their parameters, and the software used to create them.
@@ -76,10 +76,10 @@ Each separate quant is in a different branch. See below for instructions on fet…

## How to download from branches

-- In text-generation-webui, you can add `:branch` to the end of the download name, eg `TheBloke/…`
+- In text-generation-webui, you can add `:branch` to the end of the download name, eg `TheBloke/StableBeluga2-GPTQ:gptq-4bit-32g-actorder_True`
- With Git, you can clone a branch with:
```
-git clone --branch gptq-4bit-32g-actorder_True https://huggingface.co/TheBloke/…
+git clone --branch gptq-4bit-32g-actorder_True https://huggingface.co/TheBloke/StableBeluga2-GPTQ
```
- In Python Transformers code, the branch is the `revision` parameter; see below.
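Besides text-generation-webui and Git, a branch can also be fetched from Python. A minimal sketch, assuming `huggingface_hub` is installed (`pip install huggingface_hub`); the repo and branch names reuse the examples above:

```python
# Minimal sketch: download one quant branch with huggingface_hub.
# The branch name is passed as the revision.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="TheBloke/StableBeluga2-GPTQ",
    revision="gptq-4bit-32g-actorder_True",
)
print(local_path)  # local cache directory holding the model files
```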
@@ -90,13 +90,13 @@ Please make sure you're using the latest version of [text-generation-webui](http…

It is strongly recommended to use the text-generation-webui one-click-installers unless you know how to make a manual install.

1. Click the **Model tab**.
-2. Under **Download custom model or LoRA**, enter `TheBloke/…`.
+2. Under **Download custom model or LoRA**, enter `TheBloke/StableBeluga2-GPTQ`.
-  - To download from a specific branch, enter for example `TheBloke/…`
+  - To download from a specific branch, enter for example `TheBloke/StableBeluga2-GPTQ:gptq-4bit-32g-actorder_True`
  - see Provided Files above for the list of branches for each option.
3. Click **Download**.
4. The model will start downloading. Once it's finished it will say "Done".
5. In the top left, click the refresh icon next to **Model**.
-6. In the **Model** dropdown, choose the model you just downloaded: `…`
+6. In the **Model** dropdown, choose the model you just downloaded: `StableBeluga2-GPTQ`
7. The model will automatically load, and is now ready for use!
8. If you want any custom settings, set them and then click **Save settings for this model** followed by **Reload the Model** in the top right.
* Note that you do not need to set GPTQ parameters any more. These are set automatically from the file `quantize_config.json`.
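Since the GPTQ parameters come from `quantize_config.json`, you can peek at what a given branch will apply before loading it. A small sketch, assuming `huggingface_hub` is installed; the branch name here is illustrative:

```python
# Sketch: inspect the quantization parameters shipped in a branch.
import json
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="TheBloke/StableBeluga2-GPTQ",
    filename="quantize_config.json",
    revision="gptq-4bit-32g-actorder_True",  # illustrative branch
)
with open(path) as f:
    print(json.load(f))  # typically bits, group_size, desc_act, etc.
```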
@@ -104,7 +104,7 @@ It is strongly recommended to use the text-generation-webui one-click-installers

## How to use this GPTQ model from Python code

-First make sure you have [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) installed:
+First make sure you have [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) 0.3.2 or later installed:

`GITHUB_ACTIONS=true pip install auto-gptq`
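To confirm the 0.3.2-or-later requirement is met, the installed version can be read with the standard library (a sketch; this checks the installed distribution rather than importing the package):

```python
# Sketch: verify the installed auto-gptq version meets the 0.3.2 requirement.
from importlib.metadata import version

print(version("auto-gptq"))  # expect 0.3.2 or later
```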
@@ -114,7 +114,7 @@ Then try the following example code:

from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

-model_name_or_path = "TheBloke/…"
+model_name_or_path = "TheBloke/StableBeluga2-GPTQ"
model_basename = "gptq_model-4bit--1g"

use_triton = False
@@ -123,7 +123,7 @@ tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
-        inject_fused_attention=False, # Required for …
+        inject_fused_attention=False,  # Required for Llama 2 70B models at this time.
        use_safetensors=True,
        trust_remote_code=False,
        device="cuda:0",
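The hunks above show only the lines the commit touches. Assembled into one runnable whole, the example might read as below; the prompt template, sampling settings, and the trailing `use_triton`/`quantize_config` arguments are assumptions in AutoGPTQ's usual style, not text from this diff:

```python
from transformers import AutoTokenizer, pipeline, logging
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_name_or_path = "TheBloke/StableBeluga2-GPTQ"
model_basename = "gptq_model-4bit--1g"

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        inject_fused_attention=False,  # Required for Llama 2 70B models at this time.
        use_safetensors=True,
        trust_remote_code=False,
        device="cuda:0",
        use_triton=use_triton,      # assumption: standard AutoGPTQ loading pattern
        quantize_config=None)       # assumption: read from quantize_config.json

# Illustrative prompt in StableBeluga's System/User/Assistant format.
prompt = "Tell me about AI"
prompt_template = f'''### System:
You are a helpful AI assistant.

### User:
{prompt}

### Assistant:
'''

input_ids = tokenizer(prompt_template, return_tensors="pt").input_ids.cuda()
output = model.generate(inputs=input_ids, do_sample=True, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))
```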