IQ4_NL quantized version of QwentileLambda2.5-32B-Instruct, a very good merge of multiple Qwen2.5 and QwQ fine-tunes.

I noticed that the IQ4_NL variant was missing from mradermacher's repo, so I'm filling the gap. It tends to behave better than Q4_K_S and Q4_K_M at slightly lower VRAM consumption.

For cards with 24GB of VRAM

  • IQ4_NL

It's an ideal size to run on a 24GB card at 16K to 20K context length.
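
For example, a minimal sketch with llama-cpp-python (one llama.cpp-based backend; any other GGUF backend works similarly, and the file name here is illustrative):

```python
# Minimal sketch: load the IQ4_NL quant fully on the GPU with a 16K context.
from llama_cpp import Llama

llm = Llama(
    model_path="QwentileLambda2.5-32B-Instruct-IQ4_NL.gguf",
    n_ctx=16384,      # 16K context; up to ~20K should still fit in 24GB
    n_gpu_layers=-1,  # offload all layers to the GPU
)
```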

Settings

Instruction Template: ChatML. You can also use CoT with ChatML-Thinker, but you need to prefill the thinking tag in that case.
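
A hedged sketch of what the CoT prefill looks like, assuming the thinking tag is `<think>` (check the tag your ChatML-Thinker template actually uses); it reuses the `llm` object from the sketch above:

```python
# ChatML prompt with a prefilled thinking tag, so the model starts
# its reply inside the reasoning block instead of answering directly.
prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nExplain the trade-offs of IQ4_NL briefly.<|im_end|>\n"
    "<|im_start|>assistant\n<think>\n"  # prefilled thinking tag
)
out = llm(prompt, max_tokens=1024, stop=["<|im_end|>"])
print(out["choices"][0]["text"])
```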

Note: If your backend has a setting for it, disable the BoS token. It's set to disabled at the GGUF level, but not all backends recognize the flag.
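
If your backend exposes GGUF metadata overrides, you can force the flag off yourself. A sketch using llama-cpp-python's `kv_overrides` (the equivalent of llama.cpp's `--override-kv` CLI option):

```python
# Override the standard GGUF metadata field that controls BoS insertion.
from llama_cpp import Llama

llm = Llama(
    model_path="QwentileLambda2.5-32B-Instruct-IQ4_NL.gguf",
    n_ctx=16384,
    n_gpu_layers=-1,
    kv_overrides={"tokenizer.ggml.add_bos_token": False},  # force BoS off
)
```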

