Performance / speed of text generation
I am using an A100 (40GB) to run this model (falcon-40b-instruct-GPTQ). It's taking roughly 120 seconds to answer a question with a limit of 200 output tokens. Is this expected? What's the best performance seen so far? - Thanks
I am using the same code as what's given in the model card.
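For reference, this is roughly the loading code I'm running, adapted from the model card (a sketch from my side - treat the generation settings as placeholders):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/falcon-40b-instruct-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# Load the 4-bit GPTQ weights with AutoGPTQ
model = AutoGPTQForCausalLM.from_quantized(
    model_name_or_path,
    use_safetensors=True,
    trust_remote_code=True,   # Falcon ships custom modelling code
    device="cuda:0",
    use_triton=False,
)

prompt = "Tell me about AI"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.cuda()

# ~200 new tokens is where I see the ~120 second generation time
output = model.generate(inputs=input_ids, do_sample=True, temperature=0.7, max_new_tokens=200)
print(tokenizer.decode(output[0]))
```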
Yeah I'm afraid that is expected with the Falcon GPTQ at the moment. It has a major speed problem that hasn't been resolved yet. I put a note in the README about it.
Recently we got preliminary support for GPU-accelerated Falcon GGMLs. I have four repos for those. They perform quite a bit better than the GPTQ. Unfortunately they're not supported in many clients/UIs yet, but they did just get support in ctransformers (a Python library that also integrates with LangChain), and also LoLLMS-UI. So you may well find those preferable to the GPTQs.
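With ctransformers it's roughly this (a sketch - the repo name, GGML filename and `gpu_layers` value below are placeholders, use whichever quantisation file you download):

```python
from ctransformers import AutoModelForCausalLM

# Repo and file names are placeholders - check the GGML repo for the
# actual quantisation filenames.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/falcon-40b-instruct-GGML",
    model_file="falcon-40b-instruct.ggmlv3.q4_K.bin",
    model_type="falcon",
    gpu_layers=60,  # number of layers to offload to the GPU
)

print(llm("Tell me about AI", max_new_tokens=200, temperature=0.7))
```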
Another option is to download the original unquantised model and then load it with `load_in_4bit=True` to use bitsandbytes. That's still very slow (maybe 4 tokens/s) and slower than the GGML, but it's faster than the GPTQ.
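Roughly like this (a sketch, assuming you have enough VRAM for the 40B weights in 4-bit, around 20-25GB plus overhead):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-40b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Quantise on the fly to 4-bit with bitsandbytes at load time
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,
    device_map="auto",
    trust_remote_code=True,   # Falcon ships custom modelling code
)

inputs = tokenizer("Tell me about AI", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```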
Thanks for the update.