Using the text-generation-webui API with the model

#6
by Hovav - opened

Hi, I'm trying to use the text-generation-webui API to run the model. The line I'm running: python server.py --api --api-blocking-port 8827 --api-streaming-port 8815 --model TheBloke_guanaco-65B-GPTQ --wbits 4 --chat
It loads the model correctly and I can connect to the API, but when I try to send a prompt it fails with:
File "/home/users///text-generation-webui/repositories/GPTQ-for-LLaMa/quant.py", line 426, in forward
quant_cuda.vecquant4matmul(x, self.qweight, y, self.scales, self.qzeros, self.groupsize)
TypeError: vecquant4matmul(): incompatible function arguments. The following argument types are supported:
1. (arg0: torch.Tensor, arg1: torch.Tensor, arg2: torch.Tensor, arg3: torch.Tensor, arg4: torch.Tensor, arg5: torch.Tensor) -> None

When I run the model through the web UI itself, everything works fine.

Any advice? Thanks!

Firstly, just checking: do you have 48GB of VRAM available? If not, I wouldn't recommend using this model.

If so, then this error looks to be caused by an issue with your GPTQ-for-LLaMa install: the compiled quant_cuda kernel expects six tensors, but quant.py is passing groupsize as a plain int, which usually means the Python code and the compiled CUDA extension are from mismatched versions. How did you install text-generation-webui and GPTQ-for-LLaMa? Did you recently try upgrading or changing GPTQ-for-LLaMa?

Yes, I have more than 48GB of VRAM. When I access the text-generation-webui and load the model in 4-bit, everything works correctly: I can send prompts and it generates text, so I don't think it's an environment problem. I installed text-generation-webui using the one-click installer for Linux. To test the API I'm using the script api-example-chat.py in the text-generation-webui folder. The API works fine for other models, but not for guanaco-65B-GPTQ. Maybe it's a configuration problem?
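For reference, the script I'm testing with boils down to roughly this (a minimal sketch based on api-example-chat.py, pointed at the blocking port from my launch command; payload fields can vary between text-generation-webui versions):

import requests

# Blocking chat API of text-generation-webui, as started with
# --api --api-blocking-port 8827 (the port is from the command above).
HOST = "localhost:8827"
URI = f"http://{HOST}/api/v1/chat"

def chat(user_input, history):
    payload = {
        "user_input": user_input,
        "history": history,      # {'internal': [...], 'visible': [...]}
        "mode": "chat",          # 'chat', 'chat-instruct', or 'instruct'
        "max_new_tokens": 250,
        "do_sample": True,
        "temperature": 0.7,
    }
    response = requests.post(URI, json=payload)
    response.raise_for_status()
    # the server returns the updated chat history
    return response.json()["results"][0]["history"]

history = {"internal": [], "visible": []}
history = chat("Hello! Who are you?", history)
print(history["visible"][-1][1])  # the model's latest reply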

Sorry, it seems it was an environment problem after all. I reinstalled it and now it works. Thanks for the quick reply!

Great, glad it's working

@Hovav, I am very new to text-generation-webui. Is it possible to serve my local models behind an API and call them the way I would with OpenAI API keys?
For example, I've been following many tutorials, and most of them use OpenAI keys; I want to use my local models instead. Is there a way to do this?

If possible, please point me towards articles/blogs/tutorials that do this.
Thanks.

@Sat7166 yes, it's possible. text-generation-webui has its own API which you can use, and it has an extension which provides an OpenAI-compatible API, i.e. you can hit text-generation-webui with exactly the same code you would use for OpenAI. Check the text-generation-webui GitHub for more details.
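Roughly, it looks like this (a sketch, not the extension's exact defaults: the port, 5001, and the placeholder model name are assumptions here, and it uses the legacy pre-1.0 openai Python package, so check the extension's README in the repo). The point is that existing OpenAI code only needs its base URL repointed:

import openai

# Point the OpenAI client at a local text-generation-webui instance
# running the openai extension (python server.py --extensions openai).
openai.api_key = "dummy"                      # any non-empty string works locally
openai.api_base = "http://localhost:5001/v1"  # assumed default extension port

response = openai.ChatCompletion.create(
    model="local-model",  # placeholder; the extension serves whatever model is loaded
    messages=[{"role": "user", "content": "Hello! Who are you?"}],
)
print(response["choices"][0]["message"]["content"])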

I can't find any tutorials on it, but there's info in their GitHub and people discussing it in various places, so you can try Googling for more info.

Thanks for your reply @TheBloke, I'll check it out.

Also, I wanted to thank you for your work. I am new to LLMs, but I like them very, very much. I've never really been so hyper-focused on anything before, and I love this feeling of working on new LLM-related projects.
You are a big part of helping me develop this, as I have an M1 Pro setup and the GGML versions are really my saviour here xD.
That said, I'd be very happy if you could point me towards online resources where I could learn more about LLMs: what makes them tick and how to optimize them the right way. I have of course read through a lot of articles, but it gets kind of overwhelming sometimes.
Thanks

Hi Sat, some extra info on the API messaging supported can be found here: https://github.com/oobabooga/text-generation-webui/blob/main/api-examples/api-example-chat-stream.py
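Distilled, that example uses the streaming (websocket) API and looks roughly like this (a sketch: the default streaming port 5005 is assumed here, and the event names come from the linked script, so they may differ across versions):

import asyncio
import json
import websockets

# Streaming chat API; the launch command earlier in this thread used
# --api-streaming-port 8815, but 5005 is the usual default.
URI = "ws://localhost:5005/api/v1/chat-stream"

async def stream_chat(user_input):
    request = {
        "user_input": user_input,
        "history": {"internal": [], "visible": []},
        "mode": "chat",
        "max_new_tokens": 250,
    }
    async with websockets.connect(URI, ping_interval=None) as websocket:
        await websocket.send(json.dumps(request))
        while True:
            data = json.loads(await websocket.recv())
            if data["event"] == "text_stream":
                # each event carries the full history so far; the last
                # visible pair's second element is the growing reply
                print(data["history"]["visible"][-1][1], end="\r", flush=True)
            elif data["event"] == "stream_end":
                print()
                break

asyncio.run(stream_chat("Hello! Who are you?"))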

I am also looking for some more info on it and will post as I come across it!

@ahtripleblind, thanks :-)

@Hovav I am implementing a similar kind of use case. Are you running this on Runpod?

As a side note, I ended up using Hugging Face's chat-ui instead. Its documentation is a lot better defined and clearer.

Best of luck!!
