Update README.md
README.md CHANGED
@@ -58,11 +58,23 @@ I believe you will need 2 x 80GB GPUs (or 4 x 48GB) to load the 4-bit models, an
 
 Assuming the quants finish OK (and if you're reading this message, they did!) I will test them during the day on 7th September and update this notice with my findings.
 
-##
+## SPLIT FILES
 
-
+Due to the HF 50GB file limit, and the fact that GPTQ does not currently support sharding, I have had to split the `model.safetensors` file.
 
+To join it:
+
+Linux and macOS:
+```
+cat model.safetensors-split-* > model.safetensors && rm model.safetensors-split-*
+```
+Windows command line:
+```
+COPY /B model.safetensors.split-a + model.safetensors.split-b model.safetensors
+del model.safetensors.split-a model.safetensors.split-b
+```
 <!-- description end -->
+
 <!-- repositories-available start -->
 ## Repositories available
 
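The join step above is plain byte concatenation, so it can also be done portably from Python. A minimal sketch, assuming the `model.safetensors-split-*` naming used in the Linux command (the Windows example uses a dotted `model.safetensors.split-a` style, so adjust the glob to whichever names your download actually contains):

```python
# Hypothetical cross-platform join helper: byte-concatenates the split parts.
# Assumes the parts sort into the correct order by filename.
import glob
import os

parts = sorted(glob.glob("model.safetensors-split-*"))
assert parts, "no split files found in the current directory"

with open("model.safetensors", "wb") as out:
    for part in parts:
        with open(part, "rb") as f:
            # Stream in 64 MB chunks so a multi-GB part is never held in memory.
            while chunk := f.read(64 * 1024 * 1024):
                out.write(chunk)

# Mirror the rm/del step only after the join has completed successfully.
for part in parts:
    os.remove(part)
```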
@@ -180,7 +192,6 @@ model_name_or_path = "TheBloke/Falcon-180B-Chat-GPTQ"
 # To use a different branch, change revision
 # For example: revision="gptq-3bit--1g-actorder_True"
 model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
-                                             torch_dtype=torch.bfloat16,
                                              device_map="auto",
                                              revision="main")
 
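For context, here is a self-contained sketch of the load path as it stands after this change; dropping `torch_dtype=torch.bfloat16` presumably lets the GPTQ weights load with their default dtype rather than forcing bfloat16. The prompt format and generation settings are illustrative assumptions, not taken from the README:

```python
# Minimal sketch of the updated loading call (requires transformers plus a
# GPTQ backend such as auto-gptq; torch_dtype is intentionally not passed).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "TheBloke/Falcon-180B-Chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             revision="main")

prompt = "User: Tell me about AI\nAssistant:"  # assumed chat format
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```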
@@ -218,11 +229,9 @@ print(pipe(prompt_template)[0]['generated_text'])
 <!-- README_GPTQ.md-compatibility start -->
 ## Compatibility
 
-The files provided
-
-[ExLlama](https://github.com/turboderp/exllama) is compatible with Llama models in 4-bit. Please see the Provided Files table above for per-file compatibility.
+The files provided have not yet been tested.
 
-[Huggingface Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference) is compatible with all GPTQ models.
+[Huggingface Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference) is compatible with all GPTQ models, but hasn't yet been tested with these files.
 <!-- README_GPTQ.md-compatibility end -->
 
 <!-- footer start -->
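Once a TGI server is up with these weights, the quickest way to check the compatibility claim is a smoke test. A minimal sketch that POSTs to TGI's `/generate` endpoint; the host, port, and prompt are assumptions, not part of the README:

```python
# Smoke test against a locally running text-generation-inference server.
# Host and port are assumptions; a successful /generate request returns
# a JSON body of the form {"generated_text": "..."}.
import requests

resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "User: Tell me about AI\nAssistant:",  # assumed chat format
        "parameters": {"max_new_tokens": 64},
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```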