Quantization

#5 opened by Sharpy66

I tried to quantize this model to 4 bits, but I consistently run out of VRAM while quantizing layer 27/32. If someone else could quantize the model, or suggest a way for me to do it myself, that would be great.

Tried GPTQ-for-LLaMa on my own computer: out of memory at layer 27/32.
Tried AutoGPTQ on my own computer: quantization finished, but the file was not saved because of a CPU-offloading error in AutoGPTQ.

Tried GPTQ-for-LLaMa on Google Colab: not enough RAM.
Tried AutoGPTQ on Google Colab: seems to succeed, but refuses to save the output file.
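For reference, the AutoGPTQ flow I was attempting looks roughly like the sketch below. The model path, output directory, and calibration text are placeholders, not my exact script; a real run needs a few hundred calibration samples.

```python
# Minimal AutoGPTQ 4-bit quantization sketch (paths and calibration data are placeholders).
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "path/or/hub-id-of-the-model"  # placeholder
out_dir = "model-4bit-gptq"               # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)

# One calibration example is shown only to keep the sketch short;
# use a few hundred representative samples in practice.
enc = tokenizer("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
examples = [{"input_ids": enc["input_ids"], "attention_mask": enc["attention_mask"]}]

quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

# The model is first loaded unquantized, so peak memory is roughly the full
# fp16 footprint plus per-layer quantization overhead -- which is where I run out.
model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)
model.quantize(examples)

model.save_quantized(out_dir, use_safetensors=True)
tokenizer.save_pretrained(out_dir)
```

It is the `save_quantized` step (or the layer-by-layer quantization before it) that fails for me in the ways described above.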
