Model loading causes system RAM to spike and crash on Colab T4
#12
by Sumi-AI · opened
I tried running DeepSeek-OCR on Google Colab with a free T4 GPU.
When loading the model, system RAM usage suddenly spiked to the 12.7 GB limit and the session crashed.
Is this expected behavior?
Hello,
From this notebook: https://colab.research.google.com/drive/1zT5-1waOC7PUrn9OJhDMiJcqG6vWaB9u?usp=sharing
Replace the third cell in Google Colab with the code below (you can also remove the second cell, since FlashAttention-2 isn't supported on the T4) and it should work properly. The key change is loading the weights in bfloat16 with device_map, so the full fp32 model is never materialized in system RAM first:
from transformers import AutoModel, AutoTokenizer
import torch
import os

os.environ["CUDA_VISIBLE_DEVICES"] = '0'
model_name = 'deepseek-ai/DeepSeek-OCR'

if torch.cuda.is_available():
    print(f"CUDA is available. Using GPU: {torch.cuda.get_device_name(0)}")
    device = "cuda"
else:
    print("CUDA is not available. Using CPU.")
    device = "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Eager attention (FlashAttention-2 requires Ampere or newer GPUs, so it
# won't run on a T4) plus bfloat16 weights placed directly on the GPU,
# which keeps the fp32 model out of system RAM.
model = AutoModel.from_pretrained(
    model_name,
    _attn_implementation='eager',
    trust_remote_code=True,
    use_safetensors=True,
    torch_dtype=torch.bfloat16,
    device_map=device,
)
model = model.eval()
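Once loaded, you can run OCR on an image. Here's a minimal sketch following the custom infer() API shown on the DeepSeek-OCR model card; the prompt string, image path, and output directory below are placeholders you'd swap for your own:

# Minimal inference sketch (assumes the model card's custom infer() API;
# prompt and paths are placeholders).
prompt = "<image>\n<|grounding|>Convert the document to markdown."
image_file = "your_image.jpg"   # placeholder: path to your input image
output_path = "./output"        # placeholder: directory for results

res = model.infer(
    tokenizer,
    prompt=prompt,
    image_file=image_file,
    output_path=output_path,
    base_size=1024,
    image_size=640,
    crop_mode=True,
    save_results=True,
)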
Thank you so much! It worked perfectly and solved my issue.
Sumi-AI changed discussion status to closed