TheBloke committed on
Commit 0e2462e
1 Parent(s): f6df90c

Update README.md

Files changed (1):
  1. README.md +14 -2
README.md CHANGED
@@ -30,6 +30,16 @@ GGML files are for CPU + GPU inference using [llama.cpp](https://github.com/gger
 * [ctransformers](https://github.com/marella/ctransformers), a Python library with LangChain support and OpenAI-compatible AI server.
 * [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), a Python library with OpenAI-compatible API server.
 
+## Extended context
+
+This is an extended-context base Llama 2 model. Please check that your GGML client supports extended context; llama.cpp and KoboldCpp do, but I have not verified the others.
+
+I believe the correct parameters for llama.cpp extended context are:
+```
+-c <contextsize> --rope-freq-base 10000 --rope-freq-scale 0.5
+```
+
+I have tested these parameters and the output is coherent, but I haven't yet confirmed whether they are ideal. Please let me know in Discussions if you have feedback.
 
 ## Repositories available
 
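An aside on the clients listed in the context lines above: llama-cpp-python ships an OpenAI-compatible server, so one way to try this model is to serve the q4_0 file referenced later in this README. This is an editor's sketch, not part of the commit; the `[server]` extra is llama-cpp-python's documented way to install the server dependencies, and the filename is taken from the example command further down:

```
# Sketch (not from the commit): serve the q4_0 GGML file via
# llama-cpp-python's OpenAI-compatible server.
pip install 'llama-cpp-python[server]'
python3 -m llama_cpp.server --model llongma-2-7b.ggmlv3.q4_0.bin
```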
@@ -37,7 +47,7 @@ GGML files are for CPU + GPU inference using [llama.cpp](https://github.com/gger
 * [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/LLongMA-2-7B-GGML)
 * [Original unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/conceptofmind/LLongMA-2-7b)
 
-## Prompt template: Unknown
+## Prompt template: None
 
 ```
 {prompt}
@@ -97,7 +107,7 @@ Refer to the Provided Files table below to see what files use which methods, and
 I use the following command line; adjust for your tastes and needs:
 
 ```
-./main -t 10 -ngl 32 -m llongma-2-7b.ggmlv3.q4_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "### Instruction: Write a story about llamas\n### Response:"
+./main -t 10 -ngl 32 -m llongma-2-7b.ggmlv3.q4_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "Llamas are very"
 ```
 Change `-t 10` to the number of physical CPU cores you have. For example if your system has 8 cores/16 threads, use `-t 8`.
 
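Combining the two changed sections: to actually use the extended context, the example command above would presumably take the RoPE flags from the new "Extended context" section in place of `-c 2048`. The commit itself only gives `<contextsize>` as a placeholder; the 8192 below is an illustrative assumption (double Llama 2's native 4096, matching the 0.5 scale factor):

```
# Assumed combination of the example command with the extended-context flags;
# 8192 is an illustrative context size, not a value stated in the commit.
./main -t 10 -ngl 32 -m llongma-2-7b.ggmlv3.q4_0.bin --color -c 8192 --rope-freq-base 10000 --rope-freq-scale 0.5 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "Llamas are very"
```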
@@ -177,3 +187,5 @@ If you have any questions about the data or model be sure to reach out and ask!
 The previous suite of LLongMA model releases can be found here: https://twitter.com/EnricoShippole/status/1677346578720256000?s=20
 
 All of the models can be found on Huggingface: https://huggingface.co/conceptofmind
+
+You can find the Llama-2 usage policy here: https://ai.meta.com/llama/use-policy/