Docs: fix example filenames

#3
Files changed (1): README.md (+6 -6)

@@ -103,7 +103,7 @@ The following clients/libraries will automatically download models for you, prov
 
 ### In `text-generation-webui`
 
-Under Download Model, you can enter the model repo: [MaziyarPanahi/gemma-7b-GGUF](https://huggingface.co/MaziyarPanahi/gemma-7b-GGUF) and below it, a specific filename to download, such as: gemma-7b-GGUF.Q4_K_M.gguf.
+Under Download Model, you can enter the model repo: [MaziyarPanahi/gemma-7b-GGUF](https://huggingface.co/MaziyarPanahi/gemma-7b-GGUF) and below it, a specific filename to download, such as: gemma-7b.Q4_K_M.gguf.
 
 Then click Download.
 
@@ -118,7 +118,7 @@ pip3 install huggingface-hub
 Then you can download any individual model file to the current directory, at high speed, with a command like this:
 
 ```shell
-huggingface-cli download MaziyarPanahi/gemma-7b-GGUF gemma-7b-GGUF.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
+huggingface-cli download MaziyarPanahi/gemma-7b-GGUF gemma-7b.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
 ```
 </details>
 <details>
@@ -141,7 +141,7 @@ pip3 install hf_transfer
 And set environment variable `HF_HUB_ENABLE_HF_TRANSFER` to `1`:
 
 ```shell
-HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download MaziyarPanahi/gemma-7b-GGUF gemma-7b-GGUF.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
+HF_HUB_ENABLE_HF_TRANSFER=1 huggingface-cli download MaziyarPanahi/gemma-7b-GGUF gemma-7b.Q4_K_M.gguf --local-dir . --local-dir-use-symlinks False
 ```
 
 Windows Command Line users: You can set the environment variable by running `set HF_HUB_ENABLE_HF_TRANSFER=1` before the download command.
@@ -152,7 +152,7 @@ Windows Command Line users: You can set the environment variable by running `set
 Make sure you are using `llama.cpp` from commit [d0cee0d](https://github.com/ggerganov/llama.cpp/commit/d0cee0d36d5be95a0d9088b674dbb27354107221) or later.
 
 ```shell
-./main -ngl 35 -m gemma-7b-GGUF.Q4_K_M.gguf --color -c 32768 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "<|im_start|>system
+./main -ngl 35 -m gemma-7b.Q4_K_M.gguf --color -c 32768 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "<|im_start|>system
 {system_message}<|im_end|>
 <|im_start|>user
 {prompt}<|im_end|>
@@ -209,7 +209,7 @@ from llama_cpp import Llama
 
 # Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
 llm = Llama(
-model_path="./gemma-7b-GGUF.Q4_K_M.gguf", # Download the model file first
+model_path="./gemma-7b.Q4_K_M.gguf", # Download the model file first
 n_ctx=32768, # The max sequence length to use - note that longer sequence lengths require much more resources
 n_threads=8, # The number of CPU threads to use, tailor to your system and the resulting performance
 n_gpu_layers=35 # The number of layers to offload to GPU, if you have GPU acceleration available
@@ -229,7 +229,7 @@ output = llm(
 
 # Chat Completion API
 
-llm = Llama(model_path="./gemma-7b-GGUF.Q4_K_M.gguf", chat_format="llama-2") # Set chat_format according to the model you are using
+llm = Llama(model_path="./gemma-7b.Q4_K_M.gguf", chat_format="llama-2") # Set chat_format according to the model you are using
 llm.create_chat_completion(
 messages = [
 {"role": "system", "content": "You are a story writing assistant."},
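
All six hunks make the same correction. The repository is named `gemma-7b-GGUF`, but the example file is `gemma-7b.Q4_K_M.gguf`, which has no `-GGUF` suffix. The Python sketch below captures that rule. It assumes that every quantized file in this repo follows the pattern `<repo name without -GGUF>.<quant>.gguf`. The only evidence for that pattern is the corrected filenames in this diff. The helper `gguf_filename` is hypothetical and is not part of any library.

```python
def gguf_filename(repo_id: str, quant: str) -> str:
    """Return the expected GGUF filename for one quantization in a repo.

    Hypothetical helper. Assumes files are named
    "<repo name without -GGUF>.<quant>.gguf", matching the corrected
    examples in this PR (e.g. gemma-7b.Q4_K_M.gguf).
    """
    repo_name = repo_id.split("/")[-1]       # "gemma-7b-GGUF"
    base = repo_name.removesuffix("-GGUF")   # "gemma-7b" (Python 3.9+)
    return f"{base}.{quant}.gguf"


# The old README examples kept the "-GGUF" suffix; this PR removes it.
old_name = "gemma-7b-GGUF.Q4_K_M.gguf"
new_name = gguf_filename("MaziyarPanahi/gemma-7b-GGUF", "Q4_K_M")
print(new_name)              # gemma-7b.Q4_K_M.gguf
print(new_name != old_name)  # True
```

The returned string is the value the `huggingface-cli download` examples expect as the filename argument. You can also pass it as `filename=` to `huggingface_hub.hf_hub_download`. Other uploaders may use different casing or suffixes, so check the repo's file list before relying on this rule.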