Champs
#1 opened by groxaxo
Hi Champs, thanks a lot for your work!
Is there any chance of running this on multiple GPUs? Thank you!
I wouldn't really recommend using bnb 4-bit models for inference. It will most likely work, yes, but you're better off using an int4 quant.
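In case it helps, here's a minimal sketch (not from this thread) of loading a GPTQ/AWQ-style int4 quant across several GPUs with transformers. The repo name is a placeholder, and `device_map="auto"` needs `accelerate` installed:

```python
# Hedged sketch: load an int4 quant (GPTQ/AWQ-style) sharded over multiple GPUs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "some-org/some-model-GPTQ-Int4"  # placeholder; substitute a real int4 repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # requires `accelerate`; shards layers across all visible GPUs
    torch_dtype="auto",  # keep the dtype the quantized checkpoint was saved with
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```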
Can you provide instructions on how to run an int4 quant on a multi-GPU setup? I have 5x RTX 5090s on an Intel QYFS (56 cores / 112 threads) with 512 GB of DDR5-4800 RAM. The system runs Ubuntu 24.04, and I usually run models with llama.cpp and ollama; I also use ik_llama and ktransformers. Is there an easy-to-follow guide for this?
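Not a full guide, but as a hedged sketch of the multi-GPU knobs in llama.cpp: when built with CUDA it can split a GGUF quant (e.g. Q4_K_M, an int4-class quant) across GPUs by layer. Through the llama-cpp-python bindings the relevant parameters are `n_gpu_layers`, `split_mode`, and `tensor_split` (CLI equivalents: `-ngl`, `--split-mode`, `--tensor-split`). The GGUF path below is a placeholder:

```python
# Hedged sketch using the llama-cpp-python bindings; the model path is a placeholder.
import llama_cpp

llm = llama_cpp.Llama(
    model_path="model-Q4_K_M.gguf",               # placeholder; any int4-class GGUF quant
    n_gpu_layers=-1,                              # offload every layer to GPU (CLI: -ngl 99)
    split_mode=llama_cpp.LLAMA_SPLIT_MODE_LAYER,  # split by layer (CLI: --split-mode layer)
    tensor_split=[1, 1, 1, 1, 1],                 # even share over 5 GPUs (CLI: --tensor-split)
)

print(llm("Hello", max_tokens=32)["choices"][0]["text"])
```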