update Readme q4 to Q4
#4 opened by borisalmonacid

README.md CHANGED
@@ -189,7 +189,7 @@ The following clients/libraries will automatically download models for you, prov
 
 ### In `text-generation-webui`
 
-Under Download Model, you can enter the model repo: TheBloke/Llama-2-70B-chat-GGUF and below it, a specific filename to download, such as: llama-2-70b-chat.q4_K_M.gguf.
+Under Download Model, you can enter the model repo: TheBloke/Llama-2-70B-chat-GGUF and below it, a specific filename to download, such as: llama-2-70b-chat.Q4_K_M.gguf.
 
 Then click Download.
 
@@ -204,7 +204,7 @@ pip3 install huggingface-hub>=0.17.1
 Then you can download any individual model file to the current directory, at high speed, with a command like this:
 
 ```shell
-huggingface-cli download TheBloke/Llama-2-70B-chat-GGUF llama-2-70b-chat.q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
+huggingface-cli download TheBloke/Llama-2-70B-chat-GGUF llama-2-70b-chat.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
 ```
 
 <details>
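For context on the command above: the same single-file download can also be done from Python with the `huggingface_hub` library that the README installs (`pip3 install huggingface-hub>=0.17.1`). This is a minimal sketch, not part of the diff; it simply mirrors the corrected `Q4_K_M` filename with the library's `hf_hub_download` helper.

```python
# Minimal sketch: Python equivalent of the huggingface-cli download command above.
# Assumes huggingface-hub>=0.17.1 is installed, as in the README's pip3 step.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="TheBloke/Llama-2-70B-chat-GGUF",
    filename="llama-2-70b-chat.Q4_K_M.gguf",
    local_dir=".",                  # download into the current directory
    local_dir_use_symlinks=False,   # store a real copy, matching --local-dir-use-symlinks False
)
print(model_path)  # local path of the downloaded .gguf file
```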
@@ -227,7 +227,7 @@ pip3 install hf_transfer
 And set environment variable `HF_HUB_ENABLE_HF_TRANSFER` to `1`:
 
 ```shell
-HUGGINGFACE_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download TheBloke/Llama-2-70B-chat-GGUF llama-2-70b-chat.q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
+HUGGINGFACE_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download TheBloke/Llama-2-70B-chat-GGUF llama-2-70b-chat.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
 ```
 
 Windows CLI users: Use `set HUGGINGFACE_HUB_ENABLE_HF_TRANSFER=1` before running the download command.
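The accelerated download can likewise be driven from Python. A minimal sketch (not part of the diff), setting the `HF_HUB_ENABLE_HF_TRANSFER` variable named in the README prose before `huggingface_hub` is imported:

```python
# Minimal sketch: enable hf_transfer for faster downloads from Python.
# Assumes `pip3 install hf_transfer` has been run, as in the section above.
import os

# Set the variable before importing huggingface_hub so the library picks it up
# when it reads its configuration.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="TheBloke/Llama-2-70B-chat-GGUF",
    filename="llama-2-70b-chat.Q4_K_M.gguf",
    local_dir=".",
)
```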
@@ -240,7 +240,7 @@ Windows CLI users: Use `set HUGGINGFACE_HUB_ENABLE_HF_TRANSFER=1` before running
 Make sure you are using `llama.cpp` from commit [d0cee0d36d5be95a0d9088b674dbb27354107221](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later.
 
 ```shell
-./main -ngl 32 -m llama-2-70b-chat.q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n<</SYS>>\n{prompt}[/INST]"
+./main -ngl 32 -m llama-2-70b-chat.Q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\n<</SYS>>\n{prompt}[/INST]"
 ```
 
 Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.
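For readers who prefer Python bindings over the `./main` binary, a rough `llama-cpp-python` equivalent of the command above. This is an illustrative assumption, not part of the diff or of the README section shown; it uses a shortened system prompt and the corrected `Q4_K_M` filename.

```python
# Minimal sketch: roughly the same invocation via llama-cpp-python instead of ./main.
# Assumes `pip install llama-cpp-python` and the downloaded Q4_K_M file in the current directory.
from llama_cpp import Llama

llm = Llama(
    model_path="llama-2-70b-chat.Q4_K_M.gguf",
    n_gpu_layers=32,   # like -ngl 32; set to 0 if you have no GPU acceleration
    n_ctx=4096,        # like -c 4096
)

prompt = (
    "[INST] <<SYS>>\nYou are a helpful, respectful and honest assistant.\n<</SYS>>\n"
    "Write a short poem about llamas. [/INST]"
)
output = llm(prompt, max_tokens=256, temperature=0.7, repeat_penalty=1.1)
print(output["choices"][0]["text"])
```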