Requantized everything with new pre-tokenizer
- README.md +5 -1
- gorilla-openfunctions-v2.IQ1_S.gguf +2 -2
- gorilla-openfunctions-v2.IQ2_M.gguf +2 -2
- gorilla-openfunctions-v2.IQ2_S.gguf +2 -2
- gorilla-openfunctions-v2.IQ2_XS.gguf +2 -2
- gorilla-openfunctions-v2.IQ2_XXS.gguf +2 -2
- gorilla-openfunctions-v2.IQ3_M.gguf +2 -2
- gorilla-openfunctions-v2.IQ3_S.gguf +2 -2
- gorilla-openfunctions-v2.IQ3_XS.gguf +2 -2
- gorilla-openfunctions-v2.IQ3_XXS.gguf +2 -2
- gorilla-openfunctions-v2.IQ4_XS.gguf +2 -2
- gorilla-openfunctions-v2.imatrix.dat +2 -2
README.md
CHANGED
@@ -21,6 +21,8 @@ This repo contains State Of The Art quantized GGUF format model files for [Goril
 
 Quantization was done with an importance matrix that was trained for ~1M tokens (256 batches of 4096 tokens) of training data from [gorilla_openfunctions_v1_train.json](https://github.com/ShishirPatil/gorilla/raw/main/openfunctions/openfunctions-v1/gorilla_openfunctions_v1_train.json).
 
+Everything has been reconverted and quantized with a new importance matrix using llama.cpp from April 29th 2024 onwards, as of commit [f4ab2a4](https://github.com/ggerganov/llama.cpp/commit/f4ab2a41476600a98067a9474ea8f9e6db41bcfa), to ensure correct pre-tokenization. The new GGUFs will still load in older llama.cpp builds, but those may not generate correct prompt tokens; please use a recent build to ensure the best possible results!
+
 <!-- description end -->
 
 
@@ -53,6 +55,7 @@ They are also compatible with many third party UIs and libraries provided they a
 The new methods available are:
 
 * GGML_TYPE_IQ1_S - 1-bit quantization in super-blocks with an importance matrix applied, effectively using 1.56 bits per weight (bpw)
+* GGML_TYPE_IQ1_M - 1-bit quantization in super-blocks with an importance matrix applied, effectively using 1.75 bpw
 * GGML_TYPE_IQ2_XXS - 2-bit quantization in super-blocks with an importance matrix applied, effectively using 2.06 bpw
 * GGML_TYPE_IQ2_XS - 2-bit quantization in super-blocks with an importance matrix applied, effectively using 2.31 bpw
 * GGML_TYPE_IQ2_S - 2-bit quantization in super-blocks with an importance matrix applied, effectively using 2.5 bpw
@@ -62,6 +65,7 @@ The new methods available are:
 * GGML_TYPE_IQ3_S - 3-bit quantization in super-blocks with an importance matrix applied, effectively using 3.44 bpw
 * GGML_TYPE_IQ3_M - 3-bit quantization in super-blocks with an importance matrix applied, effectively using 3.66 bpw
 * GGML_TYPE_IQ4_XS - 4-bit quantization in super-blocks with an importance matrix applied, effectively using 4.25 bpw
+* GGML_TYPE_IQ4_NL - 4-bit non-linearly mapped quantization with an importance matrix applied, effectively using 4.5 bpw
 
 Refer to the Provided Files table below to see what files use which methods, and how.
 </details>
@@ -72,7 +76,7 @@ Refer to the Provided Files table below to see what files use which methods, and
 
 | Name | Quant method | Bits | Size | Max RAM required | Use case |
 | ---- | ---- | ---- | ---- | ---- | ----- |
-| [gorilla-openfunctions-v2.IQ1_S.gguf](https://huggingface.co/CISCai/gorilla-openfunctions-v2-SOTA-GGUF/blob/main/gorilla-openfunctions-v2.IQ1_S.gguf) | IQ1_S | 1 | 1.5 GB| 3.5 GB | smallest, significant quality loss -
+| [gorilla-openfunctions-v2.IQ1_S.gguf](https://huggingface.co/CISCai/gorilla-openfunctions-v2-SOTA-GGUF/blob/main/gorilla-openfunctions-v2.IQ1_S.gguf) | IQ1_S | 1 | 1.5 GB| 3.5 GB | smallest, significant quality loss - **TBD**: Waiting for [this issue](https://github.com/ggerganov/llama.cpp/issues/5996) to be resolved |
 | [gorilla-openfunctions-v2.IQ2_XXS.gguf](https://huggingface.co/CISCai/gorilla-openfunctions-v2-SOTA-GGUF/blob/main/gorilla-openfunctions-v2.IQ2_XXS.gguf) | IQ2_XXS | 2 | 1.8 GB| 3.8 GB | very small, high quality loss |
 | [gorilla-openfunctions-v2.IQ2_XS.gguf](https://huggingface.co/CISCai/gorilla-openfunctions-v2-SOTA-GGUF/blob/main/gorilla-openfunctions-v2.IQ2_XS.gguf) | IQ2_XS | 2 | 1.9 GB| 3.9 GB | very small, high quality loss |
 | [gorilla-openfunctions-v2.IQ2_S.gguf](https://huggingface.co/CISCai/gorilla-openfunctions-v2-SOTA-GGUF/blob/main/gorilla-openfunctions-v2.IQ2_S.gguf) | IQ2_S | 2 | 2.1 GB| 4.1 GB | small, substantial quality loss |
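The bpw figures in the quant-method list map directly onto the file sizes in the provided-files table: size ≈ bpw × parameter count / 8. A quick sanity check, assuming a ~7B-parameter base model (an assumption for illustration, not a value taken from the repo metadata):

```python
# Rough sanity check: effective bits-per-weight (bpw) times parameter count
# approximates the on-disk GGUF size. The ~7B parameter count is an
# assumption for illustration.
PARAMS = 7e9

def approx_size_gb(bpw: float, params: float = PARAMS) -> float:
    """Approximate quantized file size in GB (10^9 bytes) from bpw."""
    return bpw * params / 8 / 1e9

for name, bpw in [("IQ1_S", 1.56), ("IQ2_XXS", 2.06), ("IQ4_XS", 4.25)]:
    print(f"{name}: ~{approx_size_gb(bpw):.1f} GB")
```

The estimates land close to the table's 1.5 GB / 1.8 GB / 3.8 GB entries; the small gap is the unquantized metadata and tensors (e.g. embeddings) that GGUF files also carry.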
gorilla-openfunctions-v2.IQ1_S.gguf
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:04f9eac743044b86d3f0e9327a485ef64c1da4343af80e557091d21d0bfc4e9b
+size 1733032512
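The model files in this commit are Git LFS pointers in the three-line format shown above (`version`, `oid`, `size`). A minimal sketch of parsing such a pointer, using the IQ1_S values:

```python
# Parse a Git LFS pointer file (the three-line format shown in the diff above)
# into a dict. Sketch only; real pointers may carry additional key/value lines.
def parse_lfs_pointer(text: str) -> dict:
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:04f9eac743044b86d3f0e9327a485ef64c1da4343af80e557091d21d0bfc4e9b
size 1733032512"""

info = parse_lfs_pointer(pointer)
print(info["oid"])        # the sha256 digest of the actual GGUF blob
print(int(info["size"]))  # blob size in bytes
```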
gorilla-openfunctions-v2.IQ2_M.gguf
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:d526a9d06e46af7760b019009534224a5e2f8f64f09643ede2f7b76dbff3027c
+size 2543163968
gorilla-openfunctions-v2.IQ2_S.gguf
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:d24e3014a0b073f0adc56ca204d479fbab01b7be7de82221d06df2743348171c
+size 2389121600
gorilla-openfunctions-v2.IQ2_XS.gguf
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:24624bb136c7282ce428fb407a171b3423874123a051885b54a8925cafab77b0
+size 2210888256
gorilla-openfunctions-v2.IQ2_XXS.gguf
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:686a6b060a7c599a324d1e5a61f000cd8378356c53dafb44d35e31ca707a683b
+size 2041117248
gorilla-openfunctions-v2.IQ3_M.gguf
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:2430e43883556236ce782932e2320b498aee5acea72d05d1cced33fb735f289a
+size 3289676352
gorilla-openfunctions-v2.IQ3_S.gguf
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:4e8b11107d2664d99762fdbf3ea58ffab76adae195188b302b2d144cf0d0948f
+size 3138017856
gorilla-openfunctions-v2.IQ3_XS.gguf
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:609e67dbfa1fad1618b0bb4651ee68226650d1fa318ac792bfa844ff21a7215e
+size 2993609280
gorilla-openfunctions-v2.IQ3_XXS.gguf
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:51a39504fce3f252bf618f031e5d2b21f52819e7f8c795cc2bc609a027bcae44
+size 2758400576
gorilla-openfunctions-v2.IQ4_XS.gguf
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:dde89e1c3bb7ff9d8b0cde4b1e0b511451f9b133e2f5beb823ae6a7d0882c6f7
+size 3797228096
gorilla-openfunctions-v2.imatrix.dat
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:f04c001c103518222eee8800c6d045715a1ffbf36df0fe7c78c10b84a7e812a1
+size 4277047
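Since every GGUF in this commit was requantized, a previously downloaded file can be checked against the `oid sha256:` digest recorded in its pointer to decide whether a re-download is needed. A minimal sketch, using the IQ1_S digest from this commit:

```python
import hashlib

# Verify that a downloaded GGUF matches the sha256 recorded in its LFS pointer.
# The expected digest below is the IQ1_S value from this commit.
def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in 1 MiB chunks so multi-GB GGUFs fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

EXPECTED = "04f9eac743044b86d3f0e9327a485ef64c1da4343af80e557091d21d0bfc4e9b"
# if sha256_of("gorilla-openfunctions-v2.IQ1_S.gguf") != EXPECTED:
#     the local copy predates this requantization -- re-download it
```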