Seems to be having issues with the latest TheBloke RunPod configuration
Hello,
If I launch TheBloke's RunPod API template and load this model, I get malformed output:
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.
USER: What time is it
ASSISTANT:�‑metÃmente SamGTzeti���� Dum Nueistoletr�Ð�Ã�Ã�ÃÄSpanÃÂÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃÃ
Other models, such as TheBloke's Hermes 13B, work fine, and this same Vicuna 7B model works fine on my own machine.
Is there some sort of compatibility issue? How can I fix this?
Thank you!
Same problem here.
Edit:
I already fixed it!
You can find the solution here: https://huggingface.co/TheBloke/vicuna-7B-1.1-GPTQ-4bit-128g/discussions/4
You only need to download this file: https://huggingface.co/TheBloke/vicuna-7B-1.1-GPTQ-4bit-128g/blob/main/vicuna-7B-1.1-GPTQ-4bit-128g.no-act-order.pt
and use it to replace latest.act-order.safetensors in the model folder.
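For anyone who prefers to script the file swap described above, here is a minimal sketch. It assumes the huggingface_hub package is installed for the download step, and that your model folder path is whatever your setup uses (the helper names `swap_to_no_act_order` and `download_no_act_order` are hypothetical, not part of any tool in this thread). The two filenames are taken from the posts above.

```python
import shutil
from pathlib import Path

# Filenames as given in this thread; the model folder itself is an assumption
# about your local setup and is passed in by the caller.
OLD_FILE = "latest.act-order.safetensors"
NEW_FILE = "vicuna-7B-1.1-GPTQ-4bit-128g.no-act-order.pt"
REPO_ID = "TheBloke/vicuna-7B-1.1-GPTQ-4bit-128g"


def swap_to_no_act_order(model_dir, downloaded_file) -> Path:
    """Hypothetical helper: copy the no-act-order checkpoint into model_dir
    and delete the act-order checkpoint it replaces."""
    model_dir = Path(model_dir)
    downloaded_file = Path(downloaded_file)
    target = model_dir / downloaded_file.name
    shutil.copy2(downloaded_file, target)  # copy file contents and metadata
    old = model_dir / OLD_FILE
    if old.exists():
        old.unlink()  # "replace": remove the old act-order file
    return target


def download_no_act_order() -> Path:
    """Hypothetical helper: fetch the .pt file from the Hugging Face Hub.
    Requires the huggingface_hub package; imported lazily so the rest of
    this sketch works without it."""
    from huggingface_hub import hf_hub_download

    return Path(hf_hub_download(repo_id=REPO_ID, filename=NEW_FILE))
```

Usage would be something like `swap_to_no_act_order("path/to/your/model/folder", download_no_act_order())`, with the folder path adjusted to your install. The sketch deletes the old file because the fix says to replace it; renaming it to a backup instead would make the change reversible.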
It's already working on an i5-6500 with 16 GB of DDR4 RAM and an NVIDIA GTX 1060 6 GB, at roughly 2 tokens/s. I don't know whether it can run more efficiently on my current computer, but it's fast enough to test the model and see what it can do.
However, I have been completely unable to run the 13B model; I always run out of CUDA memory.
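A rough back-of-the-envelope estimate shows why a 6 GB card struggles with the 13B model. This sketch assumes nominal parameter counts of 7e9 and 13e9 and exactly 4 bits per weight. It deliberately ignores the GPTQ scale/zero overhead, the KV cache, activations, and the CUDA context, so the result is only a lower bound on what the GPU actually needs.

```python
# Lower-bound estimate of GPU memory needed just to hold 4-bit weights.
# Assumptions (not from the thread): nominal parameter counts, 4 bits per
# weight, no allowance for quantization metadata, KV cache, activations,
# or CUDA context overhead.

GIB = 2**30


def weight_memory_gib(n_params: float, bits_per_weight: int = 4) -> float:
    """Bytes needed for the raw quantized weights, expressed in GiB."""
    return n_params * bits_per_weight / 8 / GIB


if __name__ == "__main__":
    vram_gib = 6.0  # GTX 1060 6 GB
    for name, n_params in [("7B", 7e9), ("13B", 13e9)]:
        need = weight_memory_gib(n_params)
        verdict = "weights alone fit" if need < vram_gib else "weights alone exceed"
        print(f"{name}: ~{need:.2f} GiB of weights; {verdict} {vram_gib} GiB")
```

Under these assumptions the 7B weights take about 3.3 GiB, while the 13B weights take about 6.1 GiB. That is already more than the card's 6 GiB before anything else is loaded, which is consistent with the out-of-memory errors.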