TheBloke/vicuna-7B-1.1-GPTQ · Seems to be having issues with latest TheBloke runpod configuration

Jun 6, 2023

Hello,

If I launch TheBloke's runpod API template and load this model, I get garbled output:

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.

USER: What time is it
ASSISTANT:�‑metÃmente SamGTzeti�� Dum Nueistoletr�Ð�Ã�Ã�ÃÄSpanÃÂÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃ

Other models like TheBloke Hermes 13B work fine, and this same Vicuna 7B model works fine on my own machine.

Is there some sort of compatibility issue? How can I fix this?

Thank you!

robin4286 changed discussion title from Seems to be having issues with latest TheBloke run-of configuration to Seems to be having issues with latest TheBloke runpod configuration Jun 6, 2023

alexdlhh

Jun 9, 2023

edited Jun 9, 2023

Same problem here.

Edit:
I've already fixed it!
You can find the solution here: https://huggingface.co/TheBloke/vicuna-7B-1.1-GPTQ-4bit-128g/discussions/4

You only need to download this file: https://huggingface.co/TheBloke/vicuna-7B-1.1-GPTQ-4bit-128g/blob/main/vicuna-7B-1.1-GPTQ-4bit-128g.no-act-order.pt

Then replace the latest.act-order.safetensors file in the model folder with it.
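
For anyone who would rather script that swap, here is a rough sketch of the two steps (download, then replace). It is only an illustration, not an official procedure: the helper names `download` and `swap_in_no_act_order`, the `.bak` suffix, and the example model path in the comment are my own assumptions. The download URL is the link above with `blob` changed to `resolve`, which is the direct-download form on Hugging Face. The old file is renamed rather than deleted so the change can be undone.

```python
import shutil
import urllib.request
from pathlib import Path

# Direct-download form of the link above ("blob" swapped for "resolve").
NO_ACT_ORDER_URL = (
    "https://huggingface.co/TheBloke/vicuna-7B-1.1-GPTQ-4bit-128g/"
    "resolve/main/vicuna-7B-1.1-GPTQ-4bit-128g.no-act-order.pt"
)


def download(url, dest):
    """Fetch url to dest (the checkpoint is several GB, so this takes a while)."""
    urllib.request.urlretrieve(url, dest)
    return Path(dest)


def swap_in_no_act_order(model_dir, downloaded_file):
    """Rename any *latest.act-order.safetensors file in model_dir to *.bak
    and copy the downloaded no-act-order checkpoint next to it.

    Returns (list of backup file names, path of the copied checkpoint).
    """
    model_dir = Path(model_dir)
    downloaded_file = Path(downloaded_file)

    renamed = []
    for path in sorted(model_dir.glob("*latest.act-order.safetensors")):
        # Assumption: a ".bak" suffix is enough to keep the loader from picking it up.
        backup = path.with_name(path.name + ".bak")
        path.rename(backup)
        renamed.append(backup.name)

    target = model_dir / downloaded_file.name
    # Skip the copy if the file was downloaded straight into the model folder.
    if target.resolve() != downloaded_file.resolve():
        shutil.copy2(downloaded_file, target)
    return renamed, target


# Example usage (the model folder path is hypothetical, adjust it to your install):
#   pt = download(NO_ACT_ORDER_URL, "vicuna-7B-1.1-GPTQ-4bit-128g.no-act-order.pt")
#   swap_in_no_act_order("models/TheBloke_vicuna-7B-1.1-GPTQ-4bit-128g", pt)
```

After the swap, reload the model so the no-act-order checkpoint is the one that gets picked up.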

It is now working on an i5-6500 with 16 GB of DDR4 RAM and an NVIDIA 1060 6 GB, at roughly 2 tokens/s. I don't know whether it can be made more efficient on my current computer, but it is fast enough to test the model and see what it can do.

However, I have been completely unable to run the 13B model. I always run out of CUDA memory.
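
A back-of-the-envelope estimate suggests why 13B does not fit on a 6 GB card. The sketch below only counts the quantized weights, assuming nominal parameter counts of 7 and 13 billion and exactly 4 bits per weight. It ignores the per-group scales and zero points, the KV cache, activations, and the CUDA context, all of which add to the real footprint, so treat the numbers as lower bounds. The function name `weight_gib` is just for illustration.

```python
def weight_gib(n_params, bits_per_weight):
    """Approximate size of the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


for name, n_params in [("7B", 7e9), ("13B", 13e9)]:
    print(f"{name}: ~{weight_gib(n_params, 4):.2f} GiB of 4-bit weights")
```

Under these assumptions, the 7B weights come to a bit over 3 GiB, while the 13B weights alone are already around 6 GiB, which leaves no room for anything else on a 6 GB GPU.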