# OpenAssistant LLaMA 30B SFT 7 GPTQ

This is a repo of GPTQ format 4bit quantised models for [OpenAssistant's LLaMA 30B SFT 7](https://huggingface.co/OpenAssistant/oasst-sft-7-llama-30b-xor).

It is the result of merging the XORs from the above repo with the original Llama 30B weights, and then quantising to 4bit for GPU inference using [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).

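For intuition on the XOR step: because XOR is its own inverse, OpenAssistant can distribute `target ^ base` deltas, and anyone holding the original Llama weights can recover the fine-tuned ones. The sketch below is purely illustrative of that property; the actual conversion uses the xor_codec script published in OpenAssistant's repo, and `xor_bytes` plus the toy tensors here are hypothetical stand-ins.

```python
import numpy as np

def xor_bytes(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """XOR two same-shaped tensors byte-wise, preserving dtype and shape."""
    raw = np.bitwise_xor(a.view(np.uint8), b.view(np.uint8))
    return raw.view(a.dtype)

# XOR is its own inverse: delta = target ^ base, so base ^ delta == target.
rng = np.random.default_rng(0)
base = rng.random((4, 4)).astype(np.float16)    # stands in for original Llama weights
target = rng.random((4, 4)).astype(np.float16)  # stands in for fine-tuned weights
delta = xor_bytes(target, base)                 # what an XOR repo would ship
assert np.array_equal(xor_bytes(base, delta), target)
```
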
This is epoch 7 of OpenAssistant's training of their Llama 30B model.

**Please note that these models will need 24GB VRAM or greater to use effectively**

## Repositories available

* [4bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/OpenAssistant-SFT-7-Llama-30B-GPTQ).
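
As a quick-start, here is a minimal loading sketch. It assumes the AutoGPTQ library and that the repo ships safetensors weights (check the repo's file list and any required `model_basename` before relying on this); the prompt shown follows OpenAssistant's `<|prompter|>`/`<|assistant|>` convention.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

repo_id = "TheBloke/OpenAssistant-SFT-7-Llama-30B-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
# use_safetensors/device are assumptions; adjust to the files actually in the repo
model = AutoGPTQForCausalLM.from_quantized(repo_id, use_safetensors=True, device="cuda:0")

prompt = "<|prompter|>What is a GPTQ model?<|endoftext|><|assistant|>"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```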