Requantized everything with new pre-tokenizer
- README.md +5 -1
- gorilla-openfunctions-v2.IQ1_S.gguf +2 -2
- gorilla-openfunctions-v2.IQ2_M.gguf +2 -2
- gorilla-openfunctions-v2.IQ2_S.gguf +2 -2
- gorilla-openfunctions-v2.IQ2_XS.gguf +2 -2
- gorilla-openfunctions-v2.IQ2_XXS.gguf +2 -2
- gorilla-openfunctions-v2.IQ3_M.gguf +2 -2
- gorilla-openfunctions-v2.IQ3_S.gguf +2 -2
- gorilla-openfunctions-v2.IQ3_XS.gguf +2 -2
- gorilla-openfunctions-v2.IQ3_XXS.gguf +2 -2
- gorilla-openfunctions-v2.IQ4_XS.gguf +2 -2
- gorilla-openfunctions-v2.imatrix.dat +2 -2
README.md
CHANGED
@@ -21,6 +21,8 @@ This repo contains State Of The Art quantized GGUF format model files for [Goril
 
 Quantization was done with an importance matrix that was trained for ~1M tokens (256 batches of 4096 tokens) of training data from [gorilla_openfunctions_v1_train.json](https://github.com/ShishirPatil/gorilla/raw/main/openfunctions/openfunctions-v1/gorilla_openfunctions_v1_train.json).
 
+Everything has been reconverted and quantized with a new importance matrix using llama.cpp from April 29th 2024 onwards, as of commit [f4ab2a4](https://github.com/ggerganov/llama.cpp/commit/f4ab2a41476600a98067a9474ea8f9e6db41bcfa), to ensure correct pre-tokenization. The new GGUFs will still load in older llama.cpp builds, but those may not generate correct prompt tokens; please use a recent build to ensure the best possible results!
+
 <!-- description end -->
 
 
@@ -53,6 +55,7 @@ They are also compatible with many third party UIs and libraries provided they a
 The new methods available are:
 
 * GGML_TYPE_IQ1_S - 1-bit quantization in super-blocks with an importance matrix applied, effectively using 1.56 bits per weight (bpw)
+* GGML_TYPE_IQ1_M - 1-bit quantization in super-blocks with an importance matrix applied, effectively using 1.75 bpw
 * GGML_TYPE_IQ2_XXS - 2-bit quantization in super-blocks with an importance matrix applied, effectively using 2.06 bpw
 * GGML_TYPE_IQ2_XS - 2-bit quantization in super-blocks with an importance matrix applied, effectively using 2.31 bpw
 * GGML_TYPE_IQ2_S - 2-bit quantization in super-blocks with an importance matrix applied, effectively using 2.5 bpw
@@ -62,6 +65,7 @@ The new methods available are:
 * GGML_TYPE_IQ3_S - 3-bit quantization in super-blocks with an importance matrix applied, effectively using 3.44 bpw
 * GGML_TYPE_IQ3_M - 3-bit quantization in super-blocks with an importance matrix applied, effectively using 3.66 bpw
 * GGML_TYPE_IQ4_XS - 4-bit quantization in super-blocks with an importance matrix applied, effectively using 4.25 bpw
+* GGML_TYPE_IQ4_NL - 4-bit non-linearly mapped quantization with an importance matrix applied, effectively using 4.5 bpw
 
 Refer to the Provided Files table below to see what files use which methods, and how.
 </details>
@@ -72,7 +76,7 @@ Refer to the Provided Files table below to see what files use which methods, and
 
 | Name | Quant method | Bits | Size | Max RAM required | Use case |
 | ---- | ---- | ---- | ---- | ---- | ----- |
-| [gorilla-openfunctions-v2.IQ1_S.gguf](https://huggingface.co/CISCai/gorilla-openfunctions-v2-SOTA-GGUF/blob/main/gorilla-openfunctions-v2.IQ1_S.gguf) | IQ1_S | 1 | 1.5 GB| 3.5 GB | smallest, significant quality loss -
+| [gorilla-openfunctions-v2.IQ1_S.gguf](https://huggingface.co/CISCai/gorilla-openfunctions-v2-SOTA-GGUF/blob/main/gorilla-openfunctions-v2.IQ1_S.gguf) | IQ1_S | 1 | 1.5 GB| 3.5 GB | smallest, significant quality loss - **TBD**: Waiting for [this issue](https://github.com/ggerganov/llama.cpp/issues/5996) to be resolved |
 | [gorilla-openfunctions-v2.IQ2_XXS.gguf](https://huggingface.co/CISCai/gorilla-openfunctions-v2-SOTA-GGUF/blob/main/gorilla-openfunctions-v2.IQ2_XXS.gguf) | IQ2_XXS | 2 | 1.8 GB| 3.8 GB | very small, high quality loss |
 | [gorilla-openfunctions-v2.IQ2_XS.gguf](https://huggingface.co/CISCai/gorilla-openfunctions-v2-SOTA-GGUF/blob/main/gorilla-openfunctions-v2.IQ2_XS.gguf) | IQ2_XS | 2 | 1.9 GB| 3.9 GB | very small, high quality loss |
 | [gorilla-openfunctions-v2.IQ2_S.gguf](https://huggingface.co/CISCai/gorilla-openfunctions-v2-SOTA-GGUF/blob/main/gorilla-openfunctions-v2.IQ2_S.gguf) | IQ2_S | 2 | 2.1 GB| 4.1 GB | small, substantial quality loss |
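The bpw figures in the quant-method list map directly onto the file sizes in the provided-files table: size ≈ bpw × parameter count / 8. A quick sanity check, assuming a ~7B-parameter base model (an assumption for illustration, not a value taken from the repo metadata):

```python
# Rough sanity check: effective bits-per-weight (bpw) times parameter count
# approximates the on-disk GGUF size. The ~7B parameter count is an
# assumption for illustration.
PARAMS = 7e9

def approx_size_gb(bpw: float, params: float = PARAMS) -> float:
    """Approximate quantized file size in GB (10^9 bytes) from bpw."""
    return bpw * params / 8 / 1e9

for name, bpw in [("IQ1_S", 1.56), ("IQ2_XXS", 2.06), ("IQ4_XS", 4.25)]:
    print(f"{name}: ~{approx_size_gb(bpw):.1f} GB")
```

The estimates land close to the table's 1.5 GB / 1.8 GB / 3.8 GB entries; the small gap is the unquantized metadata and tensors (e.g. embeddings) that GGUF files also carry.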
gorilla-openfunctions-v2.IQ1_S.gguf
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:04f9eac743044b86d3f0e9327a485ef64c1da4343af80e557091d21d0bfc4e9b
+size 1733032512
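The model files in this commit are Git LFS pointers in the three-line format shown above (`version`, `oid`, `size`). A minimal sketch of parsing such a pointer, using the IQ1_S values:

```python
# Parse a Git LFS pointer file (the three-line format shown in the diff above)
# into a dict. Sketch only; real pointers may carry additional key/value lines.
def parse_lfs_pointer(text: str) -> dict:
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:04f9eac743044b86d3f0e9327a485ef64c1da4343af80e557091d21d0bfc4e9b
size 1733032512"""

info = parse_lfs_pointer(pointer)
print(info["oid"])        # the sha256 digest of the actual GGUF blob
print(int(info["size"]))  # blob size in bytes
```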
gorilla-openfunctions-v2.IQ2_M.gguf
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:d526a9d06e46af7760b019009534224a5e2f8f64f09643ede2f7b76dbff3027c
+size 2543163968
gorilla-openfunctions-v2.IQ2_S.gguf
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:d24e3014a0b073f0adc56ca204d479fbab01b7be7de82221d06df2743348171c
+size 2389121600
gorilla-openfunctions-v2.IQ2_XS.gguf
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:24624bb136c7282ce428fb407a171b3423874123a051885b54a8925cafab77b0
+size 2210888256
gorilla-openfunctions-v2.IQ2_XXS.gguf
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:686a6b060a7c599a324d1e5a61f000cd8378356c53dafb44d35e31ca707a683b
+size 2041117248
gorilla-openfunctions-v2.IQ3_M.gguf
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:2430e43883556236ce782932e2320b498aee5acea72d05d1cced33fb735f289a
+size 3289676352
gorilla-openfunctions-v2.IQ3_S.gguf
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:4e8b11107d2664d99762fdbf3ea58ffab76adae195188b302b2d144cf0d0948f
+size 3138017856
gorilla-openfunctions-v2.IQ3_XS.gguf
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:609e67dbfa1fad1618b0bb4651ee68226650d1fa318ac792bfa844ff21a7215e
+size 2993609280
gorilla-openfunctions-v2.IQ3_XXS.gguf
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:51a39504fce3f252bf618f031e5d2b21f52819e7f8c795cc2bc609a027bcae44
+size 2758400576
gorilla-openfunctions-v2.IQ4_XS.gguf
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:dde89e1c3bb7ff9d8b0cde4b1e0b511451f9b133e2f5beb823ae6a7d0882c6f7
+size 3797228096
gorilla-openfunctions-v2.imatrix.dat
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:f04c001c103518222eee8800c6d045715a1ffbf36df0fe7c78c10b84a7e812a1
+size 4277047
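Since every GGUF in this commit was requantized, a previously downloaded file can be checked against the `oid sha256:` digest recorded in its pointer to decide whether a re-download is needed. A minimal sketch, using the IQ1_S digest from this commit:

```python
import hashlib

# Verify that a downloaded GGUF matches the sha256 recorded in its LFS pointer.
# The expected digest below is the IQ1_S value from this commit.
def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks so multi-GB GGUFs fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

EXPECTED = "04f9eac743044b86d3f0e9327a485ef64c1da4343af80e557091d21d0bfc4e9b"
# if sha256_of("gorilla-openfunctions-v2.IQ1_S.gguf") != EXPECTED:
#     the local copy predates this requantization -- re-download it
```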