Update README.md

This repo contains GGML format model files for [Meta's Llama 2 70B Chat](https://huggingface.co/meta-llama/Llama-2-70b-chat).

## Only compatible with latest llama.cpp

To use these files you need:

1. llama.cpp as of [commit `e76d630`](https://github.com/ggerganov/llama.cpp/commit/e76d630df17e235e6b9ef416c45996765d2e36fb) or later.
   - For users who don't want to compile from source, you can use the binaries from [release master-e76d630](https://github.com/ggerganov/llama.cpp/releases/tag/master-e76d630).
2. To add the new command line parameter `-gqa 8`.

Example command:

```
/workspace/git/llama.cpp/main -m llama-2-70b-chat/ggml/llama-2-70b-chat.ggmlv3.q4_0.bin -gqa 8 -t 13 -p "[INST] <<SYS>>You are a helpful assistant<</SYS>>Write a story about llamas[/INST]"
```

There is no CUDA support at this time, but it should hopefully be coming soon.

There is no support in third-party UIs or Python libraries (llama-cpp-python, ctransformers) yet. That will come in due course.
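The `-p` argument in the example command above uses the Llama 2 chat prompt format. As an illustrative sketch (the variable names below are my own, not llama.cpp parameters), the prompt string can be assembled like so:

```shell
# Assemble the single-turn Llama 2 chat prompt used in the example command.
# SYSTEM and USER_MSG are illustrative names, not part of llama.cpp.
SYSTEM="You are a helpful assistant"
USER_MSG="Write a story about llamas"
PROMPT="[INST] <<SYS>>${SYSTEM}<</SYS>>${USER_MSG}[/INST]"
echo "$PROMPT"
```

Passing `"$PROMPT"` to `-p` reproduces the prompt string from the example command.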

## Repositories available

<!-- compatibility_ggml start -->
## Compatibility

### Only compatible with llama.cpp as of commit `e76d630`

Compatible with llama.cpp as of [commit `e76d630`](https://github.com/ggerganov/llama.cpp/commit/e76d630df17e235e6b9ef416c45996765d2e36fb) or later.

For a pre-compiled release, use [release master-e76d630](https://github.com/ggerganov/llama.cpp/releases/tag/master-e76d630) or later.
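If you are compiling from source, a minimal sketch of pinning llama.cpp to that commit might look like the following; the clone location and the plain `make` invocation are assumptions on my part, so adjust them for your platform:

```shell
# Fetch llama.cpp and pin it to the required commit before building.
# A plain `make` is assumed here; see the llama.cpp build docs for other options.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
git checkout e76d630df17e235e6b9ef416c45996765d2e36fb
make
```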

## Explanation of the new k-quant methods
<details>

I use the following command line; adjust for your tastes and needs:

```
./main -m llama-2-70b-chat/ggml/llama-2-70b-chat.ggmlv3.q4_0.bin -gqa 8 -t 13 -p "[INST] <<SYS>>You are a helpful assistant<</SYS>>Write a story about llamas[/INST]"
```

Change `-t 13` to the number of physical CPU cores you have. For example, if your system has 8 cores/16 threads, use `-t 8`.

No GPU support is possible yet, but it is coming soon.
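As a rough helper for picking the `-t` value, the snippet below estimates the physical core count; it is an illustration, not part of llama.cpp, and it assumes two hardware threads per physical core (SMT/hyperthreading), which is not true of every CPU:

```shell
# Estimate physical cores from the online logical CPU count,
# assuming 2 hardware threads per core, and clamp to at least 1.
LOGICAL=$(getconf _NPROCESSORS_ONLN)
THREADS=$(( LOGICAL / 2 ))
if [ "$THREADS" -lt 1 ]; then THREADS=1; fi
echo "Suggested flag: -t $THREADS"
```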

<!-- footer start -->
## Discord
**Patreon special mentions**: Slarti, Chadd, John Detwiler, Pieter, zynix, K, Mano Prime, ReadyPlayerEmma, Ai Maven, Leonard Tan, Edmond Seymore, Joseph William Delisle, Luke @flexchar, Fred von Graf, Viktor Bowallius, Rishabh Srivastava, Nikolai Manek, Matthew Berman, Johann-Peter Hartmann, ya boyyy, Greatston Gnanesh, Femi Adebogun, Talal Aujan, Jonathan Leane, terasurfer, David Flickinger, William Sang, Ajan Kanaga, Vadim, Artur Olbinski, Raven Klaugh, Michael Levine, Oscar Rangel, Randy H, Cory Kujawski, RoA, Dave, Alex, Alexandros Triantafyllidis, Fen Risland, Eugene Pentland, vamX, Elle, Nathan LeClaire, Khalefa Al-Ahmad, Rainer Wilmers, subjectnull, Junyu Yang, Daniel P. Andersen, SuperWojo, LangChain4j, Mandus, Kalila, Illia Dulskyi, Trenton Dambrowitz, Asp the Wyvern, Derek Yates, Jeffrey Morgan, Deep Realms, Imad Khwaja, Pyrater, Preetika Verma, biorpg, Gabriel Tamborski, Stephen Murray, Spiking Neurons AB, Iucharbius, Chris Smitley, Willem Michiel, Luke Pendergrass, Sebastain Graf, senxiiz, Will Dee, Space Cruiser, Karl Bernard, Clay Pascal, Lone Striker, transmissions 11, webtim, WelcomeToTheClub, Sam, theTransient, Pierre Kircher, chris gileta, John Villwock, Sean Connelly, Willian Hasse

Thank you to all my generous patrons and donaters!

<!-- footer end -->