TheBloke committed
Commit c0e4377
Parent: ca80b1b

Initial GGUF model commit

Files changed (1): README.md (+6 -5)
README.md CHANGED
@@ -47,13 +47,14 @@ As of August 24th 2023, llama.cpp and KoboldCpp support GGUF. Other third-party

 Here is a list of clients and libraries that are known to support GGUF:
 * [llama.cpp](https://github.com/ggerganov/llama.cpp)
- * [KoboldCpp](https://github.com/LostRuins/koboldcpp), now supports GGUF as of release 1.41!
+ * [KoboldCpp](https://github.com/LostRuins/koboldcpp), now supports GGUF as of release 1.41! A powerful GGML web UI, with full GPU accel. Especially good for story telling.
+ * [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui), should now work, choose the `c_transformers` backend. A great web UI with many interesting features. Supports CUDA GPU acceleration.
+ * [ctransformers](https://github.com/marella/ctransformers), now supports GGUF as of version 0.2.24! A Python library with GPU accel, LangChain support, and OpenAI-compatible AI server.
+ * [candle](https://github.com/huggingface/candle), added GGUF support on August 22nd. Candle is a Rust ML framework with a focus on performance, including GPU support, and ease of use.

 Here is a list of clients and libraries, along with their expected timeline for GGUF support. Where possible a link to the relevant issue or PR is provided:
 * [text-generation-webui](https://github.com/oobabooga/text-generation-webui), awaiting llama-cpp-python support.
 * [LM Studio](https://lmstudio.ai/), in active development - hoped to be ready by August 25th-26th.
- * [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui), will work as soon as ctransformers or llama-cpp-python is updated.
- * [ctransformers](https://github.com/marella/ctransformers), [development will start soon](https://github.com/marella/ctransformers/issues/102).
 * [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), [in active development](https://github.com/abetlen/llama-cpp-python/issues/628).
 <!-- README_GGUF.md-about-gguf end -->
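The updated list above states that ctransformers reads GGUF as of version 0.2.24. As a rough illustration of what that looks like in Python (not part of this commit: the repo id is hypothetical, `model_type="llama"` and the `gpu_layers` value are assumptions, and the file name is borrowed from the example command in the next hunk):

```python
# Minimal sketch: loading a GGUF file with ctransformers >= 0.2.24 (assumed usage).
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/CodeLlama-7B-Python-GGUF",           # hypothetical repo id, for illustration only
    model_file="codellama-7b-python.q4_K_M.gguf",  # file name taken from the example command below
    model_type="llama",                            # assumption: CodeLlama uses the llama architecture
    gpu_layers=32,                                 # assumption; omit or set to 0 without GPU acceleration
)

print(llm("### Instruction: Write a story about llamas\n### Response:", max_new_tokens=128))
```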
 
 
@@ -125,13 +126,13 @@ Make sure you are using `llama.cpp` from commit [6381d4e110bd0ec02843a60bbeb8b6f
 For compatibility with older versions of llama.cpp, or for use with third-party clients and libraries, please use GGML files instead.

 ```
- ./main -t 10 -ngl 32 -m codellama-7b-python.q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "prompt TBC"
+ ./main -t 10 -ngl 32 -m codellama-7b-python.q4_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: Write a story about llamas\n### Response:"
 ```
 Change `-t 10` to the number of physical CPU cores you have. For example if your system has 8 cores/16 threads, use `-t 8`.

 Change `-ngl 32` to the number of layers to offload to GPU. Remove it if you don't have GPU acceleration.

- Change `-c 4096` to the desired sequence length for this model. For extended sequence models - eg 8K, 16K, 32K - the necessary RoPE scaling parameters should be set by llama.cpp automatically.
+ Change `-c 4096` to the desired sequence length for this model. For extended sequence models - eg 8K, 16K, 32K - the necessary RoPE scaling parameters are read from the GGUF file and set by llama.cpp automatically.

 If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`
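The flags explained in this hunk have fairly direct counterparts in the Python libraries from the GGUF-support list, so a rough ctransformers equivalent of the example `./main` command might look like the sketch below. This is not part of the commit; the repo id is hypothetical, and the flag-to-parameter mapping (`-t` → `threads`, `-ngl` → `gpu_layers`, `-c` → `context_length`, `--temp` → `temperature`, `--repeat_penalty` → `repetition_penalty`) is an assumed reading of the ctransformers 0.2.x options.

```python
# Sketch: the ./main flags from the example above, expressed as ctransformers
# parameters. Names follow ctransformers 0.2.x; the mapping is an assumption.
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/CodeLlama-7B-Python-GGUF",           # hypothetical repo id
    model_file="codellama-7b-python.q4_K_M.gguf",  # -m
    model_type="llama",
    threads=10,           # -t 10
    gpu_layers=32,        # -ngl 32 (remove without GPU acceleration)
    context_length=4096,  # -c 4096
)

prompt = "### Instruction: Write a story about llamas\n### Response:"
print(llm(prompt, temperature=0.7, repetition_penalty=1.1, max_new_tokens=512))
```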