Llamacpp Quantizations of Ina-v11.1

Ina interprets persona definitions as executable instructions. The model follows <<CHARACTER_DESCRIPTION>> blocks with extremely high fidelity even during 10k–15k token erotic or dark-fiction role-play sessions.

Fine-tuned by BaiAI and Eric Hartford (QuixiAI) using QLoRA + DPO on large volumes of RP logs, creator-voice datasets, and persona modules.

Contributor Credits:


Quantized using llama.cpp.

Original model: https://huggingface.co/QuixiAI/Ina-v11.1

Run them in LM Studio

Run them directly with llama.cpp, or any other llama.cpp-based project.
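For example, a minimal llama.cpp invocation might look like the sketch below. The quant filename, context size, and GPU layer count are assumptions; adjust them to whichever file you download and to your hardware:

```bash
# Sketch only; adjust the filename to the quant you actually downloaded.
# -c   : context length in tokens
# -ngl : number of layers to offload to the GPU (set to 0 for CPU-only inference)
./llama-cli -m ./Ina-v11.1-Q4_K_M.gguf -c 8192 -ngl 99 -p "Hello, who are you?"
```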

Download a file (not the whole branch) from below:

| Filename | Quant type | File Size | Split | Description |
| -------- | ---------- | --------- | ----- | ----------- |
| Ina-v11.1-Q8_0.gguf | Q8_0 | 70GB | true | Extremely high quality, generally unneeded but max available quant. |
| Ina-v11.1-Q6_K.gguf | Q6_K | 54GB | true | Very high quality, near perfect, recommended. |
| Ina-v11.1-Q5_K_M.gguf | Q5_K_M | 47GB | true | High quality, recommended. |
| Ina-v11.1-Q5_K_S.gguf | Q5_K_S | 38GB | true | High quality, recommended. |
| Ina-v11.1-Q4_K_M.gguf | Q4_K_M | 40GB | true | Good quality, default size for most use cases, recommended. |
| Ina-v11.1-Q4_1.gguf | Q4_1 | 41GB | true | Legacy format, similar performance to Q4_K_S but with improved tokens/watt on Apple silicon. |
| Ina-v11.1-Q4_K_S.gguf | Q4_K_S | 38GB | true | Slightly lower quality with more space savings, recommended. |
| Ina-v11.1-Q4_0.gguf | Q4_0 | 37GB | true | Legacy format, offers online repacking for ARM and AVX CPU inference. |
| Ina-v11.1-IQ4_NL.gguf | IQ4_NL | 38GB | true | Similar to IQ4_XS, but slightly larger. Offers online repacking for ARM CPU inference. |
| Ina-v11.1-IQ4_XS.gguf | IQ4_XS | 36GB | true | Decent quality, smaller than Q4_K_S with similar performance, recommended. |
| Ina-v11.1-Q3_K_L.gguf | Q3_K_L | 35GB | true | Lower quality but usable, good for low RAM availability. |
| Ina-v11.1-Q3_K_M.gguf | Q3_K_M | 32GB | true | Low quality. |
| Ina-v11.1-IQ3_M.gguf | IQ3_M | 30GB | true | Medium-low quality, new method with decent performance comparable to Q3_K_M. |
| Ina-v11.1-Q3_K_S.gguf | Q3_K_S | 29GB | true | Low quality, not recommended. |
| Ina-v11.1-IQ3_XS.gguf | IQ3_XS | 27GB | false | Lower quality, new method with decent performance, slightly better than Q3_K_S. |

Downloading using huggingface-cli


First, make sure you have huggingface-cli installed:

```bash
pip install -U "huggingface_hub[cli]"
```

Then, you can target the specific file you want:

```bash
huggingface-cli download QuixiAI/Ina-v11.1-gguf --include "Ina-v11.1-Q4_K_M.gguf" --local-dir ./
```

If the model is bigger than 50GB, it will have been split into multiple files. In order to download them all to a local folder, run:

```bash
huggingface-cli download QuixiAI/Ina-v11.1-gguf --include "Ina-v11.1-Q8_0/*" --local-dir ./
```

You can either specify a new local-dir (e.g. Ina-v11.1-Q8_0) or download them all in place (./).
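Once the shards are downloaded, point llama.cpp at the first shard and it will load the remaining ones automatically, as long as they sit in the same directory. A minimal sketch, assuming the Q8_0 quant is split into two parts (the shard name below is illustrative; check the actual filenames in the downloaded folder):

```bash
# llama.cpp picks up the -0000N-of-0000M companions of the first shard on its own.
./llama-cli -m ./Ina-v11.1-Q8_0/Ina-v11.1-Q8_0-00001-of-00002.gguf -p "Hello"
```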

Which file should I choose?


A great write-up with charts comparing the performance of the various quant types is provided by Artefact2 here

The first thing to figure out is how big a model you can run. To do this, you'll need to figure out how much RAM and/or VRAM you have.

If you want your model running as FAST as possible, you'll want to fit the whole thing in your GPU's VRAM. Aim for a quant with a file size 1-2GB smaller than your GPU's total VRAM.

If you want the absolute maximum quality, add your system RAM and your GPU's VRAM together, then similarly grab a quant with a file size 1-2GB smaller than that total.
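If you're not sure what you have, here's one quick way to check on a Linux machine with an NVIDIA GPU (just an example; use Task Manager, Activity Monitor, or rocm-smi as appropriate for your setup):

```bash
# Total VRAM per NVIDIA GPU (requires the NVIDIA driver / nvidia-smi)
nvidia-smi --query-gpu=memory.total --format=csv,noheader
# Total system RAM
free -h | awk '/^Mem:/ {print $2}'
```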

Next, you'll need to decide if you want to use an 'I-quant' or a 'K-quant'.

If you don't want to think too much, grab one of the K-quants. These are in format 'QX_K_X', like Q5_K_M.

If you want to get more into the weeds, you can check out this extremely useful feature chart:

llama.cpp feature matrix

But basically, if you're aiming for below Q4, and you're running cuBLAS (Nvidia) or rocBLAS (AMD), you should look towards the I-quants. These are in format IQX_X, like IQ3_M. These are newer and offer better performance for their size.

These I-quants can also be used on CPU, but will be slower than their K-quant equivalents, so it's a speed-versus-quality tradeoff you'll have to weigh.
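If you want to measure that tradeoff on your own hardware, llama.cpp ships with a llama-bench tool you can point at two similarly sized quants; a rough sketch (the filenames are just examples from the table above):

```bash
# CPU-only throughput comparison: an I-quant vs a K-quant of similar size
./llama-bench -m Ina-v11.1-IQ3_M.gguf -ngl 0
./llama-bench -m Ina-v11.1-Q3_K_M.gguf -ngl 0
```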

Credits

I copied Bartowski's model card and made it my own, cheers!
