Model loading causes system RAM to spike and crash on Colab T4
#12
by Sumi-AI · opened
I tried running DeepSeek-OCR on Google Colab with a free T4 GPU.
When loading the model, system RAM usage suddenly spiked to the 12.7 GB limit and the session crashed.
Is this expected behavior?
Hello,
From this notebook: https://colab.research.google.com/drive/1zT5-1waOC7PUrn9OJhDMiJcqG6vWaB9u?usp=sharing
Replace the third cell in Google Colab with the code below (you can also remove the second cell, since FlashAttention-2 isn't supported on the T4) and it should work properly. The key change is loading the weights in bfloat16 with device_map, so the full fp32 model is never materialized in system RAM first:
from transformers import AutoModel, AutoTokenizer
import torch
import os

os.environ["CUDA_VISIBLE_DEVICES"] = '0'
model_name = 'deepseek-ai/DeepSeek-OCR'

if torch.cuda.is_available():
    print(f"CUDA is available. Using GPU: {torch.cuda.get_device_name(0)}")
    device = "cuda"
else:
    print("CUDA is not available. Using CPU.")
    device = "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Eager attention (FlashAttention-2 requires Ampere or newer GPUs, so it
# won't run on a T4) plus bfloat16 weights placed directly on the GPU,
# which keeps the fp32 model out of system RAM.
model = AutoModel.from_pretrained(
    model_name,
    _attn_implementation='eager',
    trust_remote_code=True,
    use_safetensors=True,
    torch_dtype=torch.bfloat16,
    device_map=device,
)
model = model.eval()
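Once loaded, you can run OCR on an image. Here's a minimal sketch following the custom infer() API shown on the DeepSeek-OCR model card; the prompt string, image path, and output directory below are placeholders you'd swap for your own:

# Minimal inference sketch (assumes the model card's custom infer() API;
# prompt and paths are placeholders).
prompt = "<image>\n<|grounding|>Convert the document to markdown."
image_file = "your_image.jpg"   # placeholder: path to your input image
output_path = "./output"        # placeholder: directory for results

res = model.infer(
    tokenizer,
    prompt=prompt,
    image_file=image_file,
    output_path=output_path,
    base_size=1024,
    image_size=640,
    crop_mode=True,
    save_results=True,
)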
Thank you so much! It worked perfectly and solved my issue.
Sumi-AI changed discussion status to closed