I'm having trouble running inference on Colab
When trying to run this code:
# Use a pipeline as a high-level helper
from transformers import pipeline
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("Trelis/Llama-2-7b-chat-hf-function-calling-GPTQ")
model = AutoModelForCausalLM.from_pretrained("Trelis/Llama-2-7b-chat-hf-function-calling-GPTQ")
I am receiving this error:
Trelis/Llama-2-7b-chat-hf-function-calling-GPTQ does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.
Howdy!
What packages have you installed? Also, this model is saved as safetensors.
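If you want to double-check, you can list the files the repo actually contains before trying to load it. A quick sketch using huggingface_hub (it is already installed as a transformers dependency); the print is just to inspect the listing:

from huggingface_hub import list_repo_files

# List the files hosted in the model repo; it should show a .safetensors
# GPTQ checkpoint rather than pytorch_model.bin, which is why the plain
# AutoModelForCausalLM.from_pretrained call above complains.
print(list_repo_files("Trelis/Llama-2-7b-chat-hf-function-calling-GPTQ"))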
Here is what I used:
import os
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

os.environ["SAFETENSORS_FAST_GPU"] = "1"

model_name_or_path = "Trelis/Llama-2-7b-chat-hf-function-calling-GPTQ"
model_basename = "gptq_model-4bit-128g"
use_triton = False

# Load the GPTQ checkpoint directly from its safetensors file
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           model_basename=model_basename,
                                           use_safetensors=True,
                                           trust_remote_code=True,
                                           device="cuda:0",
                                           use_triton=use_triton,
                                           quantize_config=None)
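From there, a quick generation check along these lines should confirm inference works (the prompt and generation settings are just placeholders):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

prompt = "What is 2 + 2?"  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")

# AutoGPTQ models expose a transformers-style generate() method
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))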
You also need to pip install auto-gptq, which is fastest done using wheels. Check out this notebook for a full example.
Great! I appreciate your response.