Update README.md
README.md CHANGED
@@ -30,6 +30,16 @@ GGML files are for CPU + GPU inference using [llama.cpp](https://github.com/gger
 * [ctransformers](https://github.com/marella/ctransformers), a Python library with LangChain support and OpenAI-compatible AI server.
 * [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), a Python library with OpenAI-compatible API server.
 
+## Extended context
+
+This is an extended context base Llama 2 model. Please check if your GGML client supports extended context. llama.cpp and KoboldCpp do, but I have not verified the others.
+
+I believe the correct parameters for llama.cpp extended context are:
+```
+-c <contextsize> --rope-freq-base 10000 --rope-freq-scale 0.5
+```
+
+I have tested these parameters and the answer is coherent, but I haven't yet confirmed if they're ideal. Please let me know in Discussions if you have feedback on that.
 
 ## Repositories available
 
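To make the new section concrete, here is a minimal sketch of the README's example `./main` command combined with the extended-context flags added above. The `-c 8192` value is an assumption (Llama 2's 4096-token base context doubled by `--rope-freq-scale 0.5`), not a value the README specifies:

```
# Illustrative sketch only: the README's example command plus the
# extended-context flags. 8192 assumes --rope-freq-scale 0.5 doubles
# Llama 2's 4096-token base context.
./main -t 10 -ngl 32 -m llongma-2-7b.ggmlv3.q4_0.bin --color \
  -c 8192 --rope-freq-base 10000 --rope-freq-scale 0.5 \
  --temp 0.7 --repeat_penalty 1.1 -n -1 -p "Llamas are very"
```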
@@ -37,7 +47,7 @@ GGML files are for CPU + GPU inference using [llama.cpp](https://github.com/gger
 * [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/LLongMA-2-7B-GGML)
 * [Original unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/conceptofmind/LLongMA-2-7b)
 
-## Prompt template:
+## Prompt template: None
 
 ```
 {prompt}
@@ -97,7 +107,7 @@ Refer to the Provided Files table below to see what files use which methods, and
 I use the following command line; adjust for your tastes and needs:
 
 ```
-./main -t 10 -ngl 32 -m llongma-2-7b.ggmlv3.q4_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "
+./main -t 10 -ngl 32 -m llongma-2-7b.ggmlv3.q4_0.bin --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "Llamas are very"
 ```
 Change `-t 10` to the number of physical CPU cores you have. For example if your system has 8 cores/16 threads, use `-t 8`.
 
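The README also lists [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) and its OpenAI-compatible API server as another way to run these GGML files. A rough sketch of that route follows; the package extra, the server flags, and the need for a GGML-era release of the library are assumptions rather than details from the README:

```
# Sketch only: serve the quantised GGML file through llama-cpp-python's
# OpenAI-compatible server. Assumes a llama-cpp-python release that still
# loads GGML (pre-GGUF) model files.
pip install "llama-cpp-python[server]"
python -m llama_cpp.server --model llongma-2-7b.ggmlv3.q4_0.bin --n_ctx 2048
```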
@@ -177,3 +187,5 @@ If you have any questions about the data or model be sure to reach out and ask!
 The previous suite of LLongMA model releases can be found here: https://twitter.com/EnricoShippole/status/1677346578720256000?s=20
 
 All of the models can be found on Huggingface: https://huggingface.co/conceptofmind
+
+You can find the Llama-2 usage policy here: https://ai.meta.com/llama/use-policy/