Running it on CPU using pretrained

#35
by himanshuyadav62 - opened

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
```

Can we use this to run the model only on the CPU?

Google org

Yes, you can run the smaller Gemma models on CPU. Just make sure not to set `device_map` to GPU explicitly, and the model will run on CPU. You can also use a quantized version of the model to reduce memory usage. Please have a look at the gist for reference, where I run the Gemma2-2b-it model on CPU only in Google Colab.
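A minimal sketch of the CPU-only approach described above, assuming you have access to the gated `google/gemma-2-2b-it` checkpoint on the Hub (the prompt text and generation parameters are illustrative, not from the gist):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Hypothetical example: load Gemma2-2b-it entirely on CPU.
# Setting device_map="cpu" keeps every layer off the GPU.
model_id = "google/gemma-2-2b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cpu",           # explicit CPU placement
    torch_dtype=torch.float32,  # safe default dtype on CPU
)

inputs = tokenizer("Why is the sky blue?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that CPU generation for a 2B-parameter model will be noticeably slower than on GPU; a quantized variant (e.g. via `bitsandbytes` or a GGUF build) trades some precision for a smaller memory footprint.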
