Minimum GPU RAM capacity
My laptop GPU is an RTX 3070 Ti Laptop.
When I tried to run the model, the process got killed with an error. 50% of the time it progresses and then stops. What is the minimum capacity?
- If you want to run the model in 4-bit quantization, you need about 6 GB of GPU memory.
- If you want to fine-tune the model in 4-bit quantization, you need at least 15 GB.
- If you want to run the full model, you need at least 16 GB.
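Those numbers line up with a rough back-of-envelope calculation for an 8B-parameter model (weights only; activations, the KV cache, and optimizer state add overhead on top, which is why the practical figures above are higher):

```python
# Rough weights-only VRAM estimate; real usage is higher because of
# activations, KV cache, and (for fine-tuning) optimizer state.
def weight_memory_gb(n_params, bits_per_param):
    return n_params * bits_per_param / 8 / 1e9

params = 8e9
print(weight_memory_gb(params, 4))   # 4-bit quantized weights -> 4.0 GB
print(weight_memory_gb(params, 16))  # full fp16/bf16 weights  -> 16.0 GB
```

So 4-bit weights alone are about 4 GB, which is why ~6 GB of VRAM is a realistic floor once overhead is included, and a full fp16 model needs 16 GB for the weights alone.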
I'm not sure which model you're running; I'll assume it's 3.1 8B Instruct, since this is the community for that.
I haven't set it up on a laptop of any kind, but I have set it up on a Windows 10 PC using a GeForce GT 1030 GPU with 2 GB of GDDR, and I've set it up on Fedora Server, Fedora Workstation, Linux Mint Cinnamon, and Ubuntu with the same hardware.
It may be too late to suggest (it's been 27 days), but before you assume you don't have enough memory, and before you run it in 4-bit quantization, you should try running on the CPU. From the specs I can find for your computer, it has 8 GB of GDDR6, way more than I had.
I also don't know what script you're running, but wherever you can find the parameter 'device=' or 'device_map', change that value to 'cpu' instead of 'auto' or 'cuda', then try running the script and tell me what you see.
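As a minimal sketch of what that change looks like, assuming a Hugging Face transformers-style loading call (the `model_id` and the commented-out call are placeholders, not your exact script):

```python
# Force the model onto the CPU instead of "auto" or "cuda".
# With device_map="cpu" the weights live entirely in system RAM,
# so GPU VRAM is no longer the limiting factor (it will be slower).
load_kwargs = {"device_map": "cpu"}  # was "auto" or "cuda"

# Placeholder usage, assuming transformers:
# model = AutoModelForCausalLM.from_pretrained(model_id, **load_kwargs)
```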
I am running on 3 GPUs, each with 12 GB, but I'm still getting the out-of-memory error: "CUDA out of memory".
Did you try `export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` in the virtual environment or the environment you're running it in?
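For reference, that env var has to be set in the same shell session that launches the script (the script name below is a placeholder):

```shell
# Tell PyTorch's CUDA caching allocator to use expandable segments,
# which can reduce fragmentation-related OOM errors.
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
# then launch your script in the same shell, e.g.:
# python your_script.py
```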
I think I've finally resolved the issue on my Ubuntu servers. I have one GPU and I'd been setting device_map='cuda'. It runs for a while as long as I keep the inferences simple, but anything more than 1,400 characters long crashes the session with torch CUDA out-of-memory errors; it always fails at "logits = logits.float()".
I finally set my device_map value to 'auto', and now torch is using the CPU and system memory along with the GPU. GPU memory is steady at 87% while it is processing input; GPU utilization is 95%, CPU is 101%, and system RAM holds at 6.9 GB.
I don't know if it's a fluke. I will update if it ever crashes again.
I spoke too soon. I already had logic in place in the script to keep session input at or below 1,400 characters. Adding the CPU just raised the character threshold before a torch out-of-memory error crashes the script. Nevertheless, it's a significant increase, from 1,400 characters to around 10,675. Anything over that amount crashes the process with a torch out-of-memory error.
So, it sort of fixes the issue.
It's working now. I deleted the model and redownloaded it, and I'm also able to use it on a single 24 GB GPU easily by using a custom device_map={"": "cuda:1"}.
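For anyone else hitting this: in a transformers-style device_map, the empty-string key maps the entire model onto one device, here the second GPU. A minimal sketch (the loading call is a placeholder):

```python
# The "" key means "everything": place the whole model on cuda:1
# (the second GPU) instead of letting "auto" shard it across devices.
device_map = {"": "cuda:1"}

# Placeholder usage, assuming transformers:
# model = AutoModelForCausalLM.from_pretrained(model_id, device_map=device_map)
```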
That's great, is it an updated model?
@rkapuaala
The new models released in 3.2 are:
- Multi-Modal: 11B and 90B
- Featherlight: 1B and 3B
The 8B and 70B are the older ones, although if your GPU allows, 11B will give the exact same behaviour as 8B for text inference.
OHHH. So you're using 3.2, not 3.1! What is the size of the 8B? I'm a little confused though, because I thought this was the community blog for 3.1 8B Instruct. Is it for everything?