Commit 1f9fb61 (parent: a1697ba), committed by CISCai

Requantized everything with new pre-tokenizer

README.md CHANGED
@@ -21,6 +21,8 @@ This repo contains State Of The Art quantized GGUF format model files for [Goril
 
 Quantization was done with an importance matrix that was trained for ~1M tokens (256 batches of 4096 tokens) of training data from [gorilla_openfunctions_v1_train.json](https://github.com/ShishirPatil/gorilla/raw/main/openfunctions/openfunctions-v1/gorilla_openfunctions_v1_train.json).
 
+Everything has been reconverted and quantized with a new importance matrix using llama.cpp from April 29th 2024 onwards, as of commit [f4ab2a4](https://github.com/ggerganov/llama.cpp/commit/f4ab2a41476600a98067a9474ea8f9e6db41bcfa), to ensure correct pre-tokenization. The new GGUFs will still load in older llama.cpp builds, but those may not generate correct prompt tokens; please use a recent build for the best possible results!
+
 <!-- description end -->
 
 
@@ -53,6 +55,7 @@ They are also compatible with many third party UIs and libraries provided they a
 The new methods available are:
 
 * GGML_TYPE_IQ1_S - 1-bit quantization in super-blocks with an importance matrix applied, effectively using 1.56 bits per weight (bpw)
+* GGML_TYPE_IQ1_M - 1-bit quantization in super-blocks with an importance matrix applied, effectively using 1.75 bpw
 * GGML_TYPE_IQ2_XXS - 2-bit quantization in super-blocks with an importance matrix applied, effectively using 2.06 bpw
 * GGML_TYPE_IQ2_XS - 2-bit quantization in super-blocks with an importance matrix applied, effectively using 2.31 bpw
 * GGML_TYPE_IQ2_S - 2-bit quantization in super-blocks with an importance matrix applied, effectively using 2.5 bpw
@@ -62,6 +65,7 @@ The new methods available are:
 * GGML_TYPE_IQ3_S - 3-bit quantization in super-blocks with an importance matrix applied, effectively using 3.44 bpw
 * GGML_TYPE_IQ3_M - 3-bit quantization in super-blocks with an importance matrix applied, effectively using 3.66 bpw
 * GGML_TYPE_IQ4_XS - 4-bit quantization in super-blocks with an importance matrix applied, effectively using 4.25 bpw
+* GGML_TYPE_IQ4_NL - 4-bit non-linearly mapped quantization with an importance matrix applied, effectively using 4.5 bpw
 
 Refer to the Provided Files table below to see what files use which methods, and how.
 </details>
@@ -72,7 +76,7 @@ Refer to the Provided Files table below to see what files use which methods, and
 
 | Name | Quant method | Bits | Size | Max RAM required | Use case |
 | ---- | ---- | ---- | ---- | ---- | ----- |
-| [gorilla-openfunctions-v2.IQ1_S.gguf](https://huggingface.co/CISCai/gorilla-openfunctions-v2-SOTA-GGUF/blob/main/gorilla-openfunctions-v2.IQ1_S.gguf) | IQ1_S | 1 | 1.5 GB| 3.5 GB | smallest, significant quality loss - not recommended |
+| [gorilla-openfunctions-v2.IQ1_S.gguf](https://huggingface.co/CISCai/gorilla-openfunctions-v2-SOTA-GGUF/blob/main/gorilla-openfunctions-v2.IQ1_S.gguf) | IQ1_S | 1 | 1.5 GB| 3.5 GB | smallest, significant quality loss - **TBD**: Waiting for [this issue](https://github.com/ggerganov/llama.cpp/issues/5996) to be resolved |
 | [gorilla-openfunctions-v2.IQ2_XXS.gguf](https://huggingface.co/CISCai/gorilla-openfunctions-v2-SOTA-GGUF/blob/main/gorilla-openfunctions-v2.IQ2_XXS.gguf) | IQ2_XXS | 2 | 1.8 GB| 3.8 GB | very small, high quality loss |
 | [gorilla-openfunctions-v2.IQ2_XS.gguf](https://huggingface.co/CISCai/gorilla-openfunctions-v2-SOTA-GGUF/blob/main/gorilla-openfunctions-v2.IQ2_XS.gguf) | IQ2_XS | 2 | 1.9 GB| 3.9 GB | very small, high quality loss |
 | [gorilla-openfunctions-v2.IQ2_S.gguf](https://huggingface.co/CISCai/gorilla-openfunctions-v2-SOTA-GGUF/blob/main/gorilla-openfunctions-v2.IQ2_S.gguf) | IQ2_S | 2 | 2.1 GB| 4.1 GB | small, substantial quality loss |
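The effective bits-per-weight figures in the quant-method list map roughly onto the file sizes in the Provided Files table. A minimal sketch of that relationship, assuming a ~7B-parameter model (the parameter count here is illustrative, not taken from the repo; real GGUFs also mix quant types and carry metadata, so actual files run somewhat larger than this lower bound):

```python
def estimate_quant_size_gb(n_params: float, bpw: float) -> float:
    """Rough lower-bound file size: parameters * effective bits-per-weight / 8.

    Ignores GGUF metadata and the higher-precision tensors (embeddings,
    output layer) that quantized files typically keep, so real files
    come out somewhat larger than this estimate.
    """
    return n_params * bpw / 8 / 1e9  # decimal GB

# Illustrative check against the table, assuming ~7B parameters:
for name, bpw in [("IQ1_S", 1.56), ("IQ2_XXS", 2.06), ("IQ4_XS", 4.25)]:
    print(f"{name}: ~{estimate_quant_size_gb(7e9, bpw):.2f} GB")
```

With these assumed inputs, IQ2_XXS at 2.06 bpw works out to roughly 1.8 GB, in line with the table; the lower quants land slightly under their listed sizes, which is expected given the ignored overhead.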
gorilla-openfunctions-v2.IQ1_S.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:fdf682b7e4b13f67566d2a5ce0eb84b23ddacfe77833d71914c9ee1067bdca59
-size 1733444192
+oid sha256:04f9eac743044b86d3f0e9327a485ef64c1da4343af80e557091d21d0bfc4e9b
+size 1733032512

gorilla-openfunctions-v2.IQ2_M.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8b1e50e1c006c31336256eb1d9076a8b3b7de0558732014b0c69d04abcf7e880
-size 2543575648
+oid sha256:d526a9d06e46af7760b019009534224a5e2f8f64f09643ede2f7b76dbff3027c
+size 2543163968

gorilla-openfunctions-v2.IQ2_S.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:15d462b085b9e07c792d261ee4112ca4aa6c9f42051bc47b16cb864d3983c6ba
-size 2389533280
+oid sha256:d24e3014a0b073f0adc56ca204d479fbab01b7be7de82221d06df2743348171c
+size 2389121600

gorilla-openfunctions-v2.IQ2_XS.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:947e483db824498f5cf43dd5752c184606cb97f93a0ecbce0d0095a9472f2fe2
-size 2211299936
+oid sha256:24624bb136c7282ce428fb407a171b3423874123a051885b54a8925cafab77b0
+size 2210888256

gorilla-openfunctions-v2.IQ2_XXS.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8d2484d0ad985417a9d7073bc30f63c9dc5f216a45f568b9892c24b94f6cadd0
-size 2041528928
+oid sha256:686a6b060a7c599a324d1e5a61f000cd8378356c53dafb44d35e31ca707a683b
+size 2041117248

gorilla-openfunctions-v2.IQ3_M.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f1028e3efaeec822182d964f30052d74dd31bd8707dbc4659fe78200afb2c4f5
-size 3290088032
+oid sha256:2430e43883556236ce782932e2320b498aee5acea72d05d1cced33fb735f289a
+size 3289676352

gorilla-openfunctions-v2.IQ3_S.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3c2dd4d95355e9bc1754a588cbf59b6530733f9b42c2e7a8c669d245018cdb55
-size 3138429536
+oid sha256:4e8b11107d2664d99762fdbf3ea58ffab76adae195188b302b2d144cf0d0948f
+size 3138017856

gorilla-openfunctions-v2.IQ3_XS.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f03929c9df18ce0df3026395e67efb1811cf3397f3debd207b6c9d32ad988e93
-size 2994020960
+oid sha256:609e67dbfa1fad1618b0bb4651ee68226650d1fa318ac792bfa844ff21a7215e
+size 2993609280

gorilla-openfunctions-v2.IQ3_XXS.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3b344450a000a4dfdd2b699acf442aa143cc1db86af7d7789d3742969ebd64ae
-size 2758812256
+oid sha256:51a39504fce3f252bf618f031e5d2b21f52819e7f8c795cc2bc609a027bcae44
+size 2758400576

gorilla-openfunctions-v2.IQ4_XS.gguf CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:00b3ba4aed1296bb8c10d33f5f10b42938921c1c7a2cf710fc8da60ac3cc0858
-size 3797639776
+oid sha256:dde89e1c3bb7ff9d8b0cde4b1e0b511451f9b133e2f5beb823ae6a7d0882c6f7
+size 3797228096

gorilla-openfunctions-v2.imatrix.dat CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e748316022d3d3e9cd3fe51e9eda67c1c078a985aa55d3d127976e9685c86569
-size 4277004
+oid sha256:f04c001c103518222eee8800c6d045715a1ffbf36df0fe7c78c10b84a7e812a1
+size 4277047
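The per-file diffs above are Git LFS pointer files: three lines (`version`, `oid sha256:<hex>`, `size <bytes>`) standing in for the actual GGUF blob. A minimal sketch of checking a downloaded file against its pointer, which would catch a truncated download or a stale pre-requantization copy (the function names are mine, not part of git-lfs or any tool):

```python
import hashlib
from pathlib import Path

def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo,
            "digest": digest, "size": int(fields["size"])}

def matches_pointer(path: str, pointer_text: str) -> bool:
    """True if the file at `path` has the exact size and sha256 the pointer records."""
    ptr = parse_lfs_pointer(pointer_text)
    data = Path(path).read_bytes()
    return (len(data) == ptr["size"]
            and hashlib.sha256(data).hexdigest() == ptr["digest"])
```

For example, after downloading gorilla-openfunctions-v2.IQ1_S.gguf, comparing it against the new pointer (oid `04f9ea…`, size 1733032512) confirms you have the requantized file rather than the old one.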