Load 13B model with 8-bit/4-bit quantization to support more hardware
Hi, LLaVA author here. Thank you for contributing the Hugging Face Space.
It would be better to keep the model version consistent with the official demo (13B). Quantization can be used to support more hardware; see the discussion here.
I have updated the support for quantization and added instructions on controlling the quantization bits via the environment variable `bits`.
By default, it is set to 8-bit to support running on an A10G (this Space). It can also be set to 4-bit to run on the smaller T4-medium (15G). The quantization bits for the current model are indicated by the model name in the model selector dropdown.
Thanks.
You can load the model with 8-bit or 4-bit quantization to make it fit on smaller hardware. Set the environment variable `bits` to control the quantization.
Recommended configurations:
Hardware | Bits |
---|---|
A10G-Large (24G) | 8 (default) |
T4-Medium (15G) | 4 |
A100-Large (40G) | 16 |
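For illustration, here is a minimal sketch of how `bits` could drive a quantized load. The Space uses LLaVA's own loading code; the model id below and the `transformers`-style loader are placeholders, not the actual implementation:

```python
import os

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Read the quantization setting from the `bits` environment variable,
# defaulting to 8-bit as on the A10G Space.
bits = int(os.environ.get("bits", 8))

# Map the requested bits to a bitsandbytes quantization config;
# bits == 16 falls through to plain fp16 with no quantization.
quant_kwargs = {}
if bits == 8:
    quant_kwargs["quantization_config"] = BitsAndBytesConfig(load_in_8bit=True)
elif bits == 4:
    quant_kwargs["quantization_config"] = BitsAndBytesConfig(load_in_4bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "your-org/llava-13b",  # placeholder model id
    torch_dtype=torch.float16,
    device_map="auto",
    **quant_kwargs,
)
```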
Thank you @liuhaotian!
Previously, I also tried to use 4-bit for it, but there was an issue stating that `bitsandbytes` was not configured correctly in the Docker environment of the Space, so it was not possible to use it. Did you have a chance to test with the changes in this PR?
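For anyone debugging the same thing, a quick sanity check along these lines shows whether `bitsandbytes` can see CUDA inside the container (the exact failure message varies by version):

```python
import torch

print("CUDA available:", torch.cuda.is_available())

try:
    import bitsandbytes as bnb  # raises at import time when misconfigured
    print("bitsandbytes version:", bnb.__version__)
except Exception as exc:
    print("bitsandbytes is not usable:", exc)
```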
Yes, I have tested that on a T4-medium here.
Note: the Space is currently compiling/downloading the model as I am trying to see if we can skip the preload part, but it worked before this debugging (which is the version I committed).
Ah, also: do you think the instructions above take up too much vertical space? We can change that if it can be made to look better.
tbh, I also dislike the preload part due to:
- very long build times
- not being able to cache it

but I mainly did it so that when the Gradio app launches, there is always "a model". If we remove the preload part it will still work, since the worker will download the model in the background (roughly the shape sketched below).
However, the user will see an empty dropdown with no information about the download status, and that felt like bad UX (open to discussing potential solutions :)
Also: I tried the Docker option and was able to cache the downloads, but it couldn't find CUDA, so I gave up on that.
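For concreteness, the no-preload path amounts to something like this sketch (the model id and names are placeholders, not the Space's actual worker code): start the download in a background thread at app startup instead of baking the weights into the image.

```python
import threading

from huggingface_hub import snapshot_download

MODEL_ID = "your-org/llava-13b"  # placeholder model id

# Flag the UI can poll to know whether the weights have arrived.
download_done = threading.Event()

def download_model() -> None:
    # Fetch the weights into the local cache; the Gradio app can start
    # serving immediately while this runs in the background.
    snapshot_download(MODEL_ID)
    download_done.set()

# Daemon thread so the download never blocks app startup or shutdown.
threading.Thread(target=download_model, daemon=True).start()
```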
The transposed version takes less vertical space but is less intuitive, wdyt?
Recommended configurations:

Hardware | Bits |
---|---|
A10G-Large (24G) | 8 (default) |
T4-Medium (15G) | 4 |
A100-Large (40G) | 16 |

versus the transposed layout:

Hardware | A10G-Large (24G) | T4-Medium (15G) | A100-Large (40G) |
---|---|---|---|
Bits | 8 (default) | 4 | 16 |
it's looking great! updated the PR to adopt the transposed layout
thanks!
One more downside of the preload: after removing it, the Space works even on the smallest T4-small.
wow, thanks for trying that!
I will look into adding a "downloading" status to the model dropdown component so that the user knows about the model download in progress. After that, we can remove the preload.
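One rough shape for that, as a sketch (model names are placeholders, and the dropdown-update call differs across Gradio versions):

```python
import threading

import gradio as gr

# Stand-in for the flag set by the background download worker.
download_done = threading.Event()

def dropdown_choices() -> list[str]:
    # Show a status placeholder until the background download finishes.
    if download_done.is_set():
        return ["llava-13b (8-bit)"]  # placeholder model name
    return ["downloading llava-13b ..."]

with gr.Blocks() as demo:
    model_selector = gr.Dropdown(choices=dropdown_choices(), label="Model")
    refresh = gr.Button("Refresh model list")
    # Re-query the choices on demand; in older Gradio versions this would
    # be gr.Dropdown.update(choices=...) instead.
    refresh.click(lambda: gr.Dropdown(choices=dropdown_choices()),
                  outputs=model_selector)

demo.launch()
```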
that sounds great, thank you!