Champs

by groxaxo

Hi Champs, thanks a lot for your work!
Is there any chance of running this on multiple GPUs? Thank you!

Unsloth AI org

I wouldn't really recommend using bnb 4bit models for inference. It will most likely work, yes, but you're better off using an int4 quant.
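If you do want to try the bnb 4bit checkpoint across several GPUs anyway, here is a minimal sketch using transformers with `device_map="auto"` (which shards layers across all visible GPUs via accelerate). The model id is a placeholder, not the actual repo this discussion is attached to:

```python
# Minimal sketch: loading a bnb-4bit checkpoint across multiple GPUs.
# MODEL_ID is a hypothetical placeholder -- substitute the real bnb-4bit repo.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "unsloth/some-model-bnb-4bit"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="auto",  # shards layers across all visible GPUs
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

The quantization config is stored in the checkpoint itself, so `from_pretrained` picks up the bnb 4bit settings automatically.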

Can you provide instructions on how to run an int4 quant on a multi-GPU setup? I have 5x 5090s on an Intel QYFS (56 cores/112 threads) with 512 GB of DDR5-4800 RAM. The system is Ubuntu 24.04, and I usually run models on llama.cpp and ollama; I also use ik_llama and ktransformers. Is there an easy-to-follow guide for this?
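For the llama.cpp route, splitting a GGUF int4 quant (e.g. Q4_K_M) across several GPUs comes down to the tensor-split and GPU-layer options. Below is a minimal sketch using the llama-cpp-python bindings; the model path, even split ratios, and context size are assumptions for a 5-GPU box like yours:

```python
# Minimal sketch: running a GGUF int4 quant across 5 GPUs with llama.cpp
# (via the llama-cpp-python bindings). Path and ratios are assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="model-Q4_K_M.gguf",  # hypothetical path to an int4 GGUF quant
    n_gpu_layers=-1,                 # offload every layer to the GPUs
    tensor_split=[1.0] * 5,          # spread tensors evenly across 5 GPUs
    n_ctx=8192,                      # context window; adjust to taste
)

out = llm("Q: What is 2+2? A:", max_tokens=16)
print(out["choices"][0]["text"])
```

The equivalent llama.cpp CLI flags are `-ngl` (`--n-gpu-layers`), `-ts` (`--tensor-split`), and `-c` (`--ctx-size`), so the same split works with `llama-cli` or `llama-server` directly.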
