---
quantized_by: bartowski
pipeline_tag: text-generation
---

## Llamacpp imatrix Quantizations of Mistral-Nemo-Instruct-2407

Using <a href="https://github.com/ggerganov/llama.cpp/">llama.cpp</a> release <a href="https://github.com/ggerganov/llama.cpp/releases/tag/b3634">b3634</a> for quantization.

Original model: https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407

Run them in [LM Studio](https://lmstudio.ai/)

## Prompt format

```
<s>[INST]{prompt}[/INST]
```
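
As a quick sanity check, here is a minimal sketch of trying a quant from the command line with llama.cpp's llama-cli (model file, prompt, and token limit are illustrative). llama-cli normally adds the leading `<s>` (BOS) itself, so only the `[INST] ... [/INST]` wrapper is supplied:

```
./llama-cli -m Mistral-Nemo-Instruct-2407-Q4_K_M.gguf -p "[INST]Why is the sky blue?[/INST]" -n 128
```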

## Download a file (not the whole branch) from below:

| Filename | Quant type | File Size | Split | Description |
| -------- | ---------- | --------- | ----- | ----------- |
| [Mistral-Nemo-Instruct-2407-f32.gguf](https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/blob/main/Mistral-Nemo-Instruct-2407-f32.gguf) | f32 | 49.00GB | false | Full F32 weights. |
| [Mistral-Nemo-Instruct-2407-f16.gguf](https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/blob/main/Mistral-Nemo-Instruct-2407-f16.gguf) | f16 | 24.50GB | false | Full F16 weights. |
| [Mistral-Nemo-Instruct-2407-Q8_0.gguf](https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/blob/main/Mistral-Nemo-Instruct-2407-Q8_0.gguf) | Q8_0 | 13.02GB | false | Extremely high quality, generally unneeded but max available quant. |
| [Mistral-Nemo-Instruct-2407-Q6_K_L.gguf](https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/blob/main/Mistral-Nemo-Instruct-2407-Q6_K_L.gguf) | Q6_K_L | 10.38GB | false | Uses Q8_0 for embed and output weights. Very high quality, near perfect, *recommended*. |
| [Mistral-Nemo-Instruct-2407-Q6_K.gguf](https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/blob/main/Mistral-Nemo-Instruct-2407-Q6_K.gguf) | Q6_K | 10.06GB | false | Very high quality, near perfect, *recommended*. |
| [Mistral-Nemo-Instruct-2407-Q4_K_M.gguf](https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/blob/main/Mistral-Nemo-Instruct-2407-Q4_K_M.gguf) | Q4_K_M | 7.48GB | false | Good quality, default size for most use cases, *recommended*. |
| [Mistral-Nemo-Instruct-2407-Q3_K_XL.gguf](https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/blob/main/Mistral-Nemo-Instruct-2407-Q3_K_XL.gguf) | Q3_K_XL | 7.15GB | false | Uses Q8_0 for embed and output weights. Lower quality but usable, good for low RAM availability. |
| [Mistral-Nemo-Instruct-2407-Q4_K_S.gguf](https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/blob/main/Mistral-Nemo-Instruct-2407-Q4_K_S.gguf) | Q4_K_S | 7.12GB | false | Slightly lower quality with more space savings, *recommended*. |
| [Mistral-Nemo-Instruct-2407-Q4_0.gguf](https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/blob/main/Mistral-Nemo-Instruct-2407-Q4_0.gguf) | Q4_0 | 7.09GB | false | Legacy format, generally not worth using over similarly sized formats. |
| [Mistral-Nemo-Instruct-2407-Q4_0_8_8.gguf](https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/blob/main/Mistral-Nemo-Instruct-2407-Q4_0_8_8.gguf) | Q4_0_8_8 | 7.07GB | false | Optimized for ARM and CPU inference, much faster than Q4_0 at similar quality. |
| [Mistral-Nemo-Instruct-2407-Q4_0_4_8.gguf](https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/blob/main/Mistral-Nemo-Instruct-2407-Q4_0_4_8.gguf) | Q4_0_4_8 | 7.07GB | false | Optimized for ARM and CPU inference, much faster than Q4_0 at similar quality. |
| [Mistral-Nemo-Instruct-2407-Q4_0_4_4.gguf](https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/blob/main/Mistral-Nemo-Instruct-2407-Q4_0_4_4.gguf) | Q4_0_4_4 | 7.07GB | false | Optimized for ARM and CPU inference, much faster than Q4_0 at similar quality. |
| [Mistral-Nemo-Instruct-2407-IQ4_XS.gguf](https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/blob/main/Mistral-Nemo-Instruct-2407-IQ4_XS.gguf) | IQ4_XS | 6.74GB | false | Decent quality, smaller than Q4_K_S with similar performance, *recommended*. |
| [Mistral-Nemo-Instruct-2407-Q3_K_L.gguf](https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/blob/main/Mistral-Nemo-Instruct-2407-Q3_K_L.gguf) | Q3_K_L | 6.56GB | false | Lower quality but usable, good for low RAM availability. |
| [Mistral-Nemo-Instruct-2407-Q3_K_M.gguf](https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/blob/main/Mistral-Nemo-Instruct-2407-Q3_K_M.gguf) | Q3_K_M | 6.08GB | false | Low quality. |
| [Mistral-Nemo-Instruct-2407-Q2_K.gguf](https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/blob/main/Mistral-Nemo-Instruct-2407-Q2_K.gguf) | Q2_K | 4.79GB | false | Very low quality but surprisingly usable. |
| [Mistral-Nemo-Instruct-2407-IQ2_M.gguf](https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/blob/main/Mistral-Nemo-Instruct-2407-IQ2_M.gguf) | IQ2_M | 4.44GB | false | Relatively low quality, uses SOTA techniques to be surprisingly usable. |
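
For example, to grab a single file from the table above with huggingface-cli (the quant choice is illustrative; these are the same `--include`/`--local-dir` flags as the split-file command further down):

```
huggingface-cli download bartowski/Mistral-Nemo-Instruct-2407-GGUF --include "Mistral-Nemo-Instruct-2407-Q4_K_M.gguf" --local-dir ./
```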

## Embed/output weights

Some of these quants (Q3_K_XL, Q4_K_L etc.) are the standard quantization method with the embeddings and output weights quantized to Q8_0 instead of what they would normally default to.

Some say that this improves the quality, others don't notice any difference. If you use these models PLEASE COMMENT with your findings. I would like feedback that these are actually used and useful so I don't keep uploading quants no one is using.

Thanks!
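
For reference, this is the kind of per-tensor override llama.cpp's llama-quantize exposes; a sketch only (file names are illustrative, and this is not necessarily the exact command used for these uploads):

```
# Quantize to Q6_K while forcing token embeddings and the output tensor to Q8_0
./llama-quantize --token-embedding-type q8_0 --output-tensor-type q8_0 Mistral-Nemo-Instruct-2407-f16.gguf Mistral-Nemo-Instruct-2407-Q6_K_L.gguf Q6_K
```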

## Credits

Thank you kalomaze and Dampf for assistance in creating the imatrix calibration dataset

If the model is bigger than 50GB, it will have been split into multiple files. In order to download them all to a local folder, run:

```
huggingface-cli download bartowski/Mistral-Nemo-Instruct-2407-GGUF --include "Mistral-Nemo-Instruct-2407-Q8_0/*" --local-dir ./
```

You can either specify a new local-dir (Mistral-Nemo-Instruct-2407-Q8_0) or download them all in place (./)
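
If a quant was split, llama.cpp loads it from the first shard and picks up the remaining pieces automatically, so point it at the `-00001-of-...` file; a sketch with illustrative shard names:

```
./llama-cli -m ./Mistral-Nemo-Instruct-2407-Q8_0/Mistral-Nemo-Instruct-2407-Q8_0-00001-of-00002.gguf -p "[INST]Hello[/INST]" -n 64
```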