response generation too slow

#9
by hussainwali1 - opened

Is there any way to speed up the generation? Also, it keeps on generating without stopping.

This is an unquantised model, so it requires a lot of VRAM and does a lot of computation.

If you have an NVIDIA GPU, you could use a quantised model like https://huggingface.co/TheBloke/stable-vicuna-13B-GPTQ. That should run faster and need less VRAM.
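For reference, here is a rough sketch of loading that GPTQ build with `transformers` and capping how long the response can get. It assumes `transformers`, `optimum`, `auto-gptq` and `accelerate` are installed; the prompt text, token limit and sampling settings are just example values to adjust for your setup:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/stable-vicuna-13B-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" places the quantised weights on the GPU (requires accelerate)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# StableVicuna-style prompt format (example prompt)
prompt = "### Human: How do I speed up generation?\n### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# max_new_tokens caps the response length so it can't generate forever;
# eos_token_id lets the model stop earlier on its own.
output = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Setting `max_new_tokens` (and letting it stop at the EOS token) is also what keeps the output from running on indefinitely.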

How are you running the model?
