Load model into TGI
#27
by schauppi - opened
Hello - thanks for the great work!
I want to load this model in text-generation-inference v0.9.3 (latest) with 2x RTX 3090s (24 GB VRAM each). In this GitHub thread (https://github.com/huggingface/text-generation-inference) they said it is not possible / will not fit.
You mentioned here https://huggingface.co/TheBloke/Llama-2-70B-chat-GPTQ/discussions/2#64ba51be41078fd9a059c1a6 that it would be possible.
Could you please point me in the right direction for running this model in TGI with my setup?
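For reference, this is roughly the launch command I would expect to use, based on the flags in the TGI README. My understanding is that a 4-bit GPTQ quantization of a 70B model is around 35-40 GB of weights, so in principle it should fit sharded across 2x 24 GB, with whatever is left going to the KV cache. The token limits below are my own guesses, not values I have confirmed work on this setup:

```sh
# Run TGI v0.9.3, sharding the GPTQ model across both 3090s.
# --max-input-length / --max-total-tokens are assumptions on my part;
# they may need lowering if the KV cache does not fit.
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v $PWD/data:/data \
  ghcr.io/huggingface/text-generation-inference:0.9.3 \
  --model-id TheBloke/Llama-2-70B-chat-GPTQ \
  --quantize gptq \
  --num-shard 2 \
  --max-input-length 1024 \
  --max-total-tokens 2048
```

And then a quick smoke test against the `/generate` endpoint once the server is up:

```sh
curl 127.0.0.1:8080/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs":"Hello","parameters":{"max_new_tokens":20}}'
```

Is this the right approach, or am I missing something?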