TheBloke committed on
Commit 5220cd8
1 Parent(s): 5f7f138

Update README.md

Files changed (1):
  1. README.md +17 -3
README.md CHANGED
@@ -14,6 +14,15 @@ It is the result of merging the deltas from the above repository with the origin
* [4bit and 5bit GGML models for CPU inference](https://huggingface.co/TheBloke/stable-vicuna-13B-GGML).
* [Unquantised 16bit model in HF format](https://huggingface.co/TheBloke/stable-vicuna-13B-HF).

+ ## PROMPT TEMPLATE
+
+ This model works best with the following prompt template:
+
+ ```
+ ### Human: your prompt here
+ ### Assistant:
+ ```
+
## Provided files
| Name | Quant method | Bits | Size | RAM required | Use case |
| ---- | ---- | ---- | ---- | ---- | ----- |
@@ -50,15 +59,20 @@ Don't expect any third-party UIs/tools to support them yet.
I use the following command line; adjust for your tastes and needs:

```
- ./main -t 18 -m stable-vicuna-13B.ggml.q4_2.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Human: Write a story about llamas
- ### Assistant:"
+ ./main -t 18 -m stable-vicuna-13B.ggml.q4_2.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -r "### Human:" -i
```
Change `-t 18` to the number of physical CPU cores you have. For example if your system has 8 cores/16 threads, use `-t 8`.

- If you want to have a chat-style conversation, replace the `-p <PROMPT>` argument with `-i -ins`
+ If you want to enter a prompt from the command line, use `-p <PROMPT>` like so:
+
+ ```
+ ./main -t 18 -m stable-vicuna-13B.ggml.q4_2.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -r "### Human:" -p "### Human: write a story about llamas ### Assistant:"
+ ```

## How to run in `text-generation-webui`

+ GGML models can be loaded into text-generation-webui by installing the llama.cpp module, then placing the ggml model file in a model folder as usual.
+
Further instructions here: [text-generation-webui/docs/llama.cpp-models.md](https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md).

Note: at this time text-generation-webui will not support the new q5 quantisation methods.
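As an aside on the `-t` flag used in the commands above: a quick, hedged sketch of how you might check your physical core count before picking a value. The commands assume a typical Linux or macOS shell and are not part of the original README:

```
# Linux: physical cores = "Core(s) per socket" x "Socket(s)"
lscpu | grep -E 'Core\(s\) per socket|Socket\(s\)'

# macOS: reports physical (not logical) cores directly
sysctl -n hw.physicalcpu
```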
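And for the new text-generation-webui paragraph, a minimal sketch of the "place the ggml model file in a model folder" step, assuming the webui's usual `models/` directory layout; the destination folder name here is illustrative, not prescribed by the README:

```
# assumes text-generation-webui's default models/ directory;
# the subfolder name is illustrative
mkdir -p text-generation-webui/models/stable-vicuna-13B-GGML
cp stable-vicuna-13B.ggml.q4_2.bin text-generation-webui/models/stable-vicuna-13B-GGML/
```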