Llamacpp Quantizations of Ina-v11.1
Ina interprets persona definitions as executable instructions.
The model follows <<CHARACTER_DESCRIPTION>> blocks with extremely high fidelity even during 10k-15k token erotic or dark-fiction role-play sessions.
Fine-tuned by BaiAI and Eric Hartford (QuixiAI) using QLoRA + DPO on large volumes of RP logs, creator-voice datasets, and persona modules.
Contributor Credits:
- "Cheshire Cat"
- FitQueen666
- Jaroslavs Samcuks
- Eric Hartford
Quantized using llama.cpp
Original model: https://huggingface.co/QuixiAI/Ina-v11.1
Run them in LM Studio
Run them directly with llama.cpp, or any other llama.cpp-based project
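For example, a minimal invocation with recent llama.cpp builds might look like this (the quant filename, GPU layer count, and context size below are illustrative; adjust them for your hardware):

llama-cli -m ./Ina-v11.1-Q4_K_M.gguf -ngl 99 -c 8192 -cnv

Here -ngl 99 offloads all layers to the GPU and -cnv starts an interactive chat session.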
Download a file (not the whole branch) from below:
| Filename | Quant type | File Size | Split | Description |
|---|---|---|---|---|
| Ina-v11.1-Q8_0.gguf | Q8_0 | 70GB | true | Extremely high quality, generally unneeded but max available quant. |
| Ina-v11.1-Q6_K.gguf | Q6_K | 54GB | true | Very high quality, near perfect, recommended. |
| Ina-v11.1-Q5_K_M.gguf | Q5_K_M | 47GB | true | High quality, recommended. |
| Ina-v11.1-Q5_K_S.gguf | Q5_K_S | 38GB | true | High quality, recommended. |
| Ina-v11.1-Q4_K_M.gguf | Q4_K_M | 40GB | true | Good quality, default size for most use cases, recommended. |
| Ina-v11.1-Q4_1.gguf | Q4_1 | 41GB | true | Legacy format, similar performance to Q4_K_S but with improved tokens/watt on Apple silicon. |
| Ina-v11.1-Q4_K_S.gguf | Q4_K_S | 38GB | true | Slightly lower quality with more space savings, recommended. |
| Ina-v11.1-Q4_0.gguf | Q4_0 | 37GB | true | Legacy format, offers online repacking for ARM and AVX CPU inference. |
| Ina-v11.1-IQ4_NL.gguf | IQ4_NL | 38GB | true | Similar to IQ4_XS, but slightly larger. Offers online repacking for ARM CPU inference. |
| Ina-v11.1-IQ4_XS.gguf | IQ4_XS | 36GB | true | Decent quality, smaller than Q4_K_S with similar performance, recommended. |
| Ina-v11.1-Q3_K_L.gguf | Q3_K_L | 35GB | true | Lower quality but usable, good for low RAM availability. |
| Ina-v11.1-Q3_K_M.gguf | Q3_K_M | 32GB | true | Low quality. |
| Ina-v11.1-IQ3_M.gguf | IQ3_M | 30GB | true | Medium-low quality, new method with decent performance comparable to Q3_K_M. |
| Ina-v11.1-Q3_K_S.gguf | Q3_K_S | 29GB | true | Low quality, not recommended. |
| Ina-v11.1-IQ3_XS.gguf | IQ3_XS | 27GB | false | Lower quality, new method with decent performance, slightly better than Q3_K_S. |
Downloading using huggingface-cli
First, make sure you have huggingface-cli installed:
pip install -U "huggingface_hub[cli]"
Then, you can target the specific file you want:
huggingface-cli download QuixiAI/Ina-v11.1-gguf --include "Ina-v11.1-Q4_K_M.gguf" --local-dir ./
If the model is bigger than 50GB, it will have been split into multiple files. In order to download them all to a local folder, run:
huggingface-cli download QuixiAI/Ina-v11.1-gguf --include "Ina-v11.1-Q8_0/*" --local-dir ./
You can either specify a new local-dir (Ina-v11.1-Q8_0) or download them all in place (./).
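Note that llama.cpp can load split quants directly: point it at the first shard and it will pick up the remaining parts automatically. A hypothetical example (the exact shard filename depends on how the quant was split):

llama-cli -m ./Ina-v11.1-Q8_0/Ina-v11.1-Q8_0-00001-of-00002.gguf -ngl 99 -cnv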
Which file should I choose?
A great write-up with charts comparing the performance of the various quant types is provided by Artefact2 here
The first thing to figure out is how big a model you can run. To do this, you'll need to figure out how much RAM and/or VRAM you have.
If you want your model running as FAST as possible, you'll want to fit the whole thing on your GPU's VRAM. Aim for a quant with a file size 1-2GB smaller than your GPU's total VRAM.
If you want the absolute maximum quality, add both your system RAM and your GPU's VRAM together, then similarly grab a quant with a file size 1-2GB smaller than that total.
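As a quick sanity check before picking a quant, you can query your GPU's total VRAM. A sketch for Nvidia cards (other vendors ship equivalent tools):

nvidia-smi --query-gpu=memory.total --format=csv

If that reports, say, 48GB, aim for a file around 46GB: Q5_K_M (47GB) would be a tight fit, while Q5_K_S (38GB) leaves comfortable headroom for context.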
Next, you'll need to decide if you want to use an 'I-quant' or a 'K-quant'.
If you don't want to think too much, grab one of the K-quants. These are in the format 'QX_K_X', like Q5_K_M.
If you want to get more into the weeds, you can check out the extremely useful llama.cpp feature matrix.
But basically, if you're aiming for below Q4, and you're running cuBLAS (Nvidia) or rocBLAS (AMD), you should look towards the I-quants. These are in the format IQX_X, like IQ3_M. These are newer and offer better performance for their size.
These I-quants can also be used on CPU, but will be slower than their K-quant equivalent, so speed vs performance is a tradeoff you'll have to decide.
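If your chosen quant doesn't fully fit in VRAM, you can still split the work by offloading only some layers to the GPU and running the rest on CPU. A sketch (the layer count of 40 is illustrative; raise it until you run out of VRAM):

llama-cli -m ./Ina-v11.1-IQ3_M.gguf -ngl 40 -c 4096 -cnv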
Credits
I copied Bartowski's model card and made it my own, cheers!