IQ4_NL quantized version of QwentileLambda2.5-32B-Instruct, a very good merge of multiple Qwen2.5 and QwQ fine-tunes.
I noticed that the IQ4_NL variant was missing from mradermacher's repo, so I'm filling the gap. It tends to behave better than Q4_K_S and Q4_K_M at slightly lower VRAM consumption.
For cards with 24GB of VRAM
- IQ4_NL: an ideal size to run with 24GB of VRAM at 16K to 20K context length.
Settings
Instruction Template: ChatML. You can also use CoT with ChatML-Thinker, but you need to prefill the thinking tag in that case.
Note: If your backend has a setting for it, disable the BOS token. It's set to disabled at the GGUF level, but not all backends recognize the flag.
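The ChatML prompt format described above can be sketched as follows. This is a hedged illustration, not backend-specific code: the `<think>` opening tag is an assumption about how the CoT variant expects its reasoning to be prefilled, and `build_prompt` is a hypothetical helper name.

```python
# Sketch: assembling a ChatML prompt, optionally prefilling the thinking
# tag for the CoT (ChatML-Thinker) variant. The <think> tag is an
# assumption about the model's expected reasoning opener.
def build_prompt(system: str, user: str, prefill_thinking: bool = False) -> str:
    prompt = (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    if prefill_thinking:
        # Prefill so the model starts generating inside its reasoning block.
        prompt += "<think>\n"
    return prompt

print(build_prompt("You are a helpful assistant.", "Hello!", prefill_thinking=True))
```

If your frontend handles templates for you, selecting ChatML and prefilling the assistant turn with the thinking tag achieves the same thing.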
Model tree for SerialKicked/QwentileLambda2.5-32B-Instruct-GGUF-IQ4_NL
- Base model: Qwen/Qwen2.5-32B
- Finetuned: Qwen/QwQ-32B
- Finetuned: ArliAI/QwQ-32B-ArliAI-RpR-v3
- Finetuned: maldv/QwentileLambda2.5-32B-Instruct