TheBloke committed on
Commit 08bfd47
1 Parent(s): fe7855a

Update README.md

Files changed (1):
  1. README.md +16 -7
README.md CHANGED
@@ -58,11 +58,23 @@ I believe you will need 2 x 80GB GPUs (or 4 x 48GB) to load the 4-bit models, an
 
 Assuming the quants finish OK (and if you're reading this message, they did!) I will test them during the day on 7th September and update this notice with my findings.
 
-## GGUFs
+## SPLIT FILES
 
-It should soon be possible to make llama.cpp GGUFs for Falcon 180B models. Currently that is awaiting changes to the conversion script. Once that has been done, I will try making and uploading GGUFs.
+Due to the HF 50GB file limit, and the fact that GPTQ does not currently support sharding, I have had to split the `model.safetensors` file.
 
+To join it:
+
+Linux and macOS:
+```
+cat model.safetensors-split-* > model.safetensors && rm model.safetensors-split-*
+```
+Windows command line:
+```
+COPY /B model.safetensors.split-a + model.safetensors.split-b model.safetensors
+del model.safetensors.split-a model.safetensors.split-b
+```
 <!-- description end -->
+
 <!-- repositories-available start -->
 ## Repositories available
 
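For reference, the join step added in this hunk can also be done in a few lines of Python, which behaves the same on Linux, macOS and Windows. This is only an illustrative sketch: it assumes the parts follow the `model.safetensors-split-*` naming used in the Linux/macOS command, and that sorting the filenames gives the correct part order.

```python
# Cross-platform join sketch (illustrative only). Assumes the parts follow the
# "model.safetensors-split-*" naming used in the Linux/macOS command above and
# that lexicographic filename order is the correct part order.
import glob
import shutil

parts = sorted(glob.glob("model.safetensors-split-*"))
with open("model.safetensors", "wb") as out:
    for part in parts:
        with open(part, "rb") as src:
            # Stream each part into the combined file without loading it into RAM
            shutil.copyfileobj(src, out)
print(f"Joined {len(parts)} parts into model.safetensors")
```
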
@@ -180,7 +192,6 @@ model_name_or_path = "TheBloke/Falcon-180B-Chat-GPTQ"
 # To use a different branch, change revision
 # For example: revision="gptq-3bit--1g-actorder_True"
 model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
-                                             torch_dtype=torch.bfloat16,
                                              device_map="auto",
                                              revision="main")
 
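With `torch_dtype=torch.bfloat16` removed, the loading snippet in the README reads roughly as below. This is a sketch for context only, assuming `transformers` and AutoGPTQ are installed as described elsewhere in the README; `model_name_or_path` is the value defined earlier in the snippet.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "TheBloke/Falcon-180B-Chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# torch_dtype is no longer passed explicitly after this change;
# the library's default dtype handling is used instead.
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             revision="main")
```
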
@@ -218,11 +229,9 @@ print(pipe(prompt_template)[0]['generated_text'])
 <!-- README_GPTQ.md-compatibility start -->
 ## Compatibility
 
-The files provided are tested to work with AutoGPTQ, both via Transformers and using AutoGPTQ directly. They should also work with [Occ4m's GPTQ-for-LLaMa fork](https://github.com/0cc4m/KoboldAI).
-
-[ExLlama](https://github.com/turboderp/exllama) is compatible with Llama models in 4-bit. Please see the Provided Files table above for per-file compatibility.
+The files provided have not yet been tested.
 
-[Huggingface Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference) is compatible with all GPTQ models.
+[Huggingface Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference) is compatible with all GPTQ models, but hasn't yet been tested with these files.
 <!-- README_GPTQ.md-compatibility end -->
 
 <!-- footer start -->
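If the TGI route does work out, querying a running server from Python would look roughly like the sketch below. This is untested with these files, as the updated Compatibility note says; the endpoint URL, port and prompt format are assumptions.

```python
# Sketch of querying a text-generation-inference server that is already
# running locally and serving this model (untested with these files;
# URL, port and prompt format are assumptions).
import requests

response = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "User: Write a story about llamas\nAssistant:",
        "parameters": {"max_new_tokens": 128, "temperature": 0.7},
    },
    timeout=300,
)
print(response.json()["generated_text"])
```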
 