# OpenAssistant LLaMA 30B SFT 7 GPTQ

This is a repo of GPTQ format 4bit quantised models for [OpenAssistant's LLaMA 30B SFT 7](https://huggingface.co/OpenAssistant/oasst-sft-7-llama-30b-xor).

It is the result of merging the XORs from the above repo with the original Llama 30B weights, and then quantising to 4bit for GPU inference using [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).

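For intuition on the XOR step: because XOR is its own inverse, OpenAssistant can distribute `target ^ base` deltas, and anyone holding the original Llama weights can recover the fine-tuned ones. The sketch below is purely illustrative of that property; the actual conversion uses the xor_codec script published in OpenAssistant's repo, and `xor_bytes` plus the toy tensors here are hypothetical stand-ins.

```python
import numpy as np

def xor_bytes(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """XOR two same-shaped tensors byte-wise, preserving dtype and shape."""
    raw = np.bitwise_xor(a.view(np.uint8), b.view(np.uint8))
    return raw.view(a.dtype)

# XOR is its own inverse: delta = target ^ base, so base ^ delta == target.
rng = np.random.default_rng(0)
base = rng.random((4, 4)).astype(np.float16)    # stands in for original Llama weights
target = rng.random((4, 4)).astype(np.float16)  # stands in for fine-tuned weights
delta = xor_bytes(target, base)                 # what an XOR repo would ship
assert np.array_equal(xor_bytes(base, delta), target)
```
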
This is epoch 7 of OpenAssistant's training of their Llama 30B model.

**Please note that these models will need 24GB VRAM or greater to use effectively**

## Repositories available

* [4bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/OpenAssistant-SFT-7-Llama-30B-GPTQ).
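
As a quick-start, here is a minimal loading sketch. It assumes the AutoGPTQ library and that the repo ships safetensors weights (check the repo's file list and any required `model_basename` before relying on this); the prompt shown follows OpenAssistant's `<|prompter|>`/`<|assistant|>` convention.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

repo_id = "TheBloke/OpenAssistant-SFT-7-Llama-30B-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
# use_safetensors/device are assumptions; adjust to the files actually in the repo
model = AutoGPTQForCausalLM.from_quantized(repo_id, use_safetensors=True, device="cuda:0")

prompt = "<|prompter|>What is a GPTQ model?<|endoftext|><|assistant|>"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```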