Update README.md
README.md CHANGED
@@ -58,11 +58,23 @@ I believe you will need 2 x 80GB GPUs (or 4 x 48GB) to load the 4-bit models, an
 
 Assuming the quants finish OK (and if you're reading this message, they did!) I will test them during the day on 7th September and update this notice with my findings.
 
-##
+## SPLIT FILES
 
-
+Due to the HF 50GB file limit, and the fact that GPTQ does not currently support sharding, I have had to split the `model.safetensors` file.
 
+To join it:
+
+Linux and macOS:
+```
+cat model.safetensors-split-* > model.safetensors && rm model.safetensors-split-*
+```
+Windows command line:
+```
+COPY /B model.safetensors.split-a + model.safetensors.split-b model.safetensors
+del model.safetensors.split-a model.safetensors.split-b
+```
 <!-- description end -->
+
 <!-- repositories-available start -->
 ## Repositories available
 
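The join step above is plain byte concatenation, so it can also be done portably from Python. A minimal sketch, assuming the `model.safetensors-split-*` naming used in the Linux command (the Windows example uses a dotted `model.safetensors.split-a` style, so adjust the glob to whichever names your download actually contains):

```python
# Hypothetical cross-platform join helper: byte-concatenates the split parts.
# Assumes the parts sort into the correct order by filename.
import glob
import os

parts = sorted(glob.glob("model.safetensors-split-*"))
assert parts, "no split files found in the current directory"

with open("model.safetensors", "wb") as out:
    for part in parts:
        with open(part, "rb") as f:
            # Stream in 64 MB chunks so a multi-GB part is never held in memory.
            while chunk := f.read(64 * 1024 * 1024):
                out.write(chunk)

# Mirror the rm/del step only after the join has completed successfully.
for part in parts:
    os.remove(part)
```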
@@ -180,7 +192,6 @@ model_name_or_path = "TheBloke/Falcon-180B-Chat-GPTQ"
 # To use a different branch, change revision
 # For example: revision="gptq-3bit--1g-actorder_True"
 model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
-                                             torch_dtype=torch.bfloat16,
                                              device_map="auto",
                                              revision="main")
 
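For context, here is a self-contained sketch of the load path as it stands after this change; dropping `torch_dtype=torch.bfloat16` presumably lets the GPTQ weights load with their default dtype rather than forcing bfloat16. The prompt format and generation settings are illustrative assumptions, not taken from the README:

```python
# Minimal sketch of the updated loading call (requires transformers plus a
# GPTQ backend such as auto-gptq; torch_dtype is intentionally not passed).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "TheBloke/Falcon-180B-Chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             revision="main")

prompt = "User: Tell me about AI\nAssistant:"  # assumed chat format
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```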
@@ -218,11 +229,9 @@ print(pipe(prompt_template)[0]['generated_text'])
 <!-- README_GPTQ.md-compatibility start -->
 ## Compatibility
 
-The files provided
-
-[ExLlama](https://github.com/turboderp/exllama) is compatible with Llama models in 4-bit. Please see the Provided Files table above for per-file compatibility.
+The files provided have not yet been tested.
 
-[Huggingface Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference) is compatible with all GPTQ models.
+[Huggingface Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference) is compatible with all GPTQ models, but hasn't yet been tested with these files.
 <!-- README_GPTQ.md-compatibility end -->
 
 <!-- footer start -->
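Once a TGI server is up with these weights, the quickest way to check the compatibility claim is a smoke test. A minimal sketch that POSTs to TGI's `/generate` endpoint; the host, port, and prompt are assumptions, not part of the README:

```python
# Smoke test against a locally running text-generation-inference server.
# Host and port are assumptions; a successful /generate request returns
# a JSON body of the form {"generated_text": "..."}.
import requests

resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "User: Tell me about AI\nAssistant:",  # assumed chat format
        "parameters": {"max_new_tokens": 64},
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```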