Commit 81e2785 by TheBloke
Parent: 74ae984

Update README.md

Files changed (1):
  1. README.md +4 -2
README.md CHANGED
@@ -5,12 +5,14 @@ inference: false
 
 # OpenAssistant LLaMA 30B SFT 7 GPTQ
 
-This in a repo of GGML format models for [OpenAssistant's LLaMA 30B SFT 7](https://huggingface.co/OpenAssistant/oasst-sft-7-llama-30b-xor).
+This is a repo of GPTQ format 4bit quantised models for [OpenAssistant's LLaMA 30B SFT 7](https://huggingface.co/OpenAssistant/oasst-sft-7-llama-30b-xor).
 
-It is the result of merging the XORs from the above repo with the original Llama 30B weights, and then quantising to 4bit and 5bit GGML for CPU inference using [llama.cpp](https://github.com/ggerganov/llama.cpp).
+It is the result of merging the XORs from the above repo with the original Llama 30B weights, and then quantising to 4bit for GPU inference using [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).
 
 This is epoch 7 of OpenAssistant's training of their Llama 30B model.
 
+**Please note that these models will need 24GB VRAM or greater to use effectively.**
+
 ## Repositories available
 
 * [4bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/OpenAssistant-SFT-7-Llama-30B-GPTQ).
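For context on the merge step the new text describes: OpenAssistant distributes its fine-tuned weights as XOR deltas against the original LLaMA weights, so reconstructing the model means XORing the two byte streams back together. Below is a minimal Python sketch of that idea, assuming two equal-length weight files; the `xor_merge` helper is hypothetical, and OpenAssistant's actual `xor_codec.py` works somewhat differently (it also compresses the streams).

```python
import numpy as np

# Hypothetical helper, not OpenAssistant's actual xor_codec.py
# (which additionally compresses the streams it processes).
def xor_merge(delta_path: str, base_path: str, out_path: str) -> None:
    """Recover the fine-tuned weights by XORing the published XOR
    deltas against the bytes of the original base-model file."""
    delta = np.fromfile(delta_path, dtype=np.uint8)
    base = np.fromfile(base_path, dtype=np.uint8)
    assert delta.size == base.size, "delta and base files must match in length"
    np.bitwise_xor(delta, base).tofile(out_path)
```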
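And to illustrate what "4bit GPTQ models for GPU inference" means in practice, here is one way the quantised repo could be loaded and run. This is a sketch, not the model card's own instructions: the README points to GPTQ-for-LLaMa, so the choice of the AutoGPTQ loader here, along with every parameter value, is an assumption of this example; only the repo id is taken from the diff.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "TheBloke/OpenAssistant-SFT-7-Llama-30B-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# A 30B model at 4bit is why the note above asks for 24GB+ of VRAM:
# the quantised weights are loaded entirely onto the GPU.
model = AutoGPTQForCausalLM.from_quantized(model_id, device="cuda:0")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```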