Load model into TGI
#27
by schauppi - opened
Hello - thanks for the great work!
I want to load this model in text-generation-inference v0.9.3 (latest) with 2x RTX 3090s (24 GB VRAM each). In this GitHub thread (https://github.com/huggingface/text-generation-inference) they said it is not possible / will not fit.
You mentioned here https://huggingface.co/TheBloke/Llama-2-70B-chat-GPTQ/discussions/2#64ba51be41078fd9a059c1a6 that it would be possible.
Could you please point me in the right direction for running this model in TGI with my setup?
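For reference, this is roughly the launch command I would expect to use, based on the flags in the TGI README. My understanding is that a 4-bit GPTQ quantization of a 70B model is around 35-40 GB of weights, so in principle it should fit sharded across 2x 24 GB, with whatever is left going to the KV cache. The token limits below are my own guesses, not values I have confirmed work on this setup:

```sh
# Run TGI v0.9.3, sharding the GPTQ model across both 3090s.
# --max-input-length / --max-total-tokens are assumptions on my part;
# they may need lowering if the KV cache does not fit.
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:0.9.3 \
  --model-id TheBloke/Llama-2-70B-chat-GPTQ \
  --quantize gptq \
  --num-shard 2 \
  --max-input-length 1024 \
  --max-total-tokens 2048
```

And then a quick smoke test against the `/generate` endpoint once the server is up:

```sh
curl 127.0.0.1:8080/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs":"Hello","parameters":{"max_new_tokens":20}}'
```

Is this the right approach, or am I missing something?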