Load into 2 GPUs

#28, opened by sauravm8

I have 2 A10 GPUs. The memory is not enough to load the model on one GPU using cuda:0; is there a way to use both GPUs? When I don't specify a device, inference doesn't work.

Maybe you could try the gpu-split setting on the model config page; my 2 x 22 GB 2080 Ti setup runs smoothly with this setting.

@chraac how do I do this programmatically?

From the README of text-generation-webui: when using the ExLlama loader, there is a parameter called --gpu-split that specifies how much VRAM (in GB) to use on each GPU.
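If you want to do the split in your own script rather than through text-generation-webui, here is a minimal sketch using Hugging Face transformers + accelerate, which shard the model across GPUs with a device map. The model ID and the per-GPU memory caps below are placeholders; adjust them for your model and the roughly 24 GB of each A10.

```python
# Sketch: split a model across two GPUs with transformers + accelerate.
# "your/model-id" and the 20GiB caps are placeholders, not values from this thread.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your/model-id"  # replace with the actual model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",                    # load in the checkpoint's dtype (e.g. fp16)
    device_map="auto",                     # let accelerate place layers on both GPUs
    max_memory={0: "20GiB", 1: "20GiB"},   # rough per-GPU cap, similar in spirit to --gpu-split
)

inputs = tokenizer("Hello, world", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

With device_map="auto" the layers are distributed automatically and activations are moved between GPUs for you, so no manual cuda:0/cuda:1 handling is needed.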
