How to load on a single A100 40GB
#18
by mnwato - opened
Hi. Does anyone know about the memory usage? Is there a way to load this model on a single A100 40GB?
I got it working with bitsandbytes 4-bit quantization. Here is how I did it: https://huggingface.co/tiiuae/falcon-40b/discussions/38#6479de427c18dca75e9a0903
Please use the transformers dev version 4.30-dev and accelerate 0.20-dev (both installed via pip from GitHub).
Then use the bitsandbytes package with bfloat16 compute, load_in_4bit, and quant_type=nf4.
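For anyone looking for the actual call, here is a minimal sketch of what the steps above look like with `BitsAndBytesConfig`. This assumes transformers >= 4.30, accelerate >= 0.20, and bitsandbytes are installed, and that you have the disk space and VRAM for the download; the exact memory headroom on a 40GB card is not guaranteed.

```python
# Sketch: load Falcon-40B in 4-bit NF4 with bfloat16 compute.
# Assumes a CUDA GPU and the dev versions of transformers/accelerate
# mentioned above; parameter choices mirror the advice in this thread.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "tiiuae/falcon-40b"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,               # quantize weights to 4 bits on load
    bnb_4bit_quant_type="nf4",       # NF4 quantization, as suggested above
    bnb_4bit_compute_dtype=torch.bfloat16,  # bfloat16 for the matmuls
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",        # let accelerate place layers on the GPU
    trust_remote_code=True,   # Falcon shipped custom modeling code at release
)
```

With `device_map="auto"`, accelerate handles placement, so no manual `.to("cuda")` is needed.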
mnwato changed discussion status to closed