Unable to load the model in 8 bits
Hi all, I have been trying to use this model on a laptop without a GPU for one of my course projects, so naturally I need to load the model with 8-bit quantization. However, whenever I try to load it in a quantized state, I get an error stating that the accelerate and bitsandbytes libraries are not present. I made sure to install those libraries in my virtual environment, yet the error persists. Please help me.
Here is the code that I have written:
from transformers import AutoModelForCausalLM, AutoTokenizer
import accelerate
import bitsandbytes
import gradio as gr
import torch
title = "????AI ChatBot"
description = "Quantised version of the Phi 1.5 LLM released by Microsoft research"
examples = [["How are you?"]]
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5", trust_remote_code=True, torch_dtype="auto")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5", trust_remote_code=True, torch_dtype="auto", load_in_8bit = True)
def predict(input, history=[]):
# tokenize the new input sentence
new_user_input_ids = tokenizer.encode(
input + tokenizer.eos_token, return_tensors="pt"
)
# append the new user input tokens to the chat history
bot_input_ids = torch.cat([torch.LongTensor(history), new_user_input_ids], dim=-1)
# generate a response
history = model.generate(
bot_input_ids, max_length=4000, pad_token_id=tokenizer.eos_token_id
).tolist()
# convert the tokens to text, and then split the responses into lines
response = tokenizer.decode(history[0]).split("<|endoftext|>")
# print('decoded_response-->>'+str(response))
response = [
(response[i], response[i + 1]) for i in range(0, len(response) - 1, 2)
] # convert to tuples of list
# print('response-->>'+str(response))
return response, history
gr.Interface(
fn=predict,
title=title,
description=description,
examples=examples,
inputs=["text", "state"],
outputs=["chatbot", "state"],
theme="finlaymacklon/boxy_violet",
).launch()
Hello @ARahul2003!
Your image still shows an ImportError, which could be related to an incomplete installation of either accelerate or bitsandbytes. However, please note that we haven't tested 8-bit support with Phi-based models, so I am unsure what its behavior will be.
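As a quick sanity check (a minimal sketch; the version prints are only for diagnosis), you can confirm that both libraries resolve inside the same environment that runs your script:

import accelerate
import bitsandbytes
import transformers

# If any of these imports fails, the package is missing from this environment,
# even if it is installed somewhere else on the machine.
print("accelerate:", accelerate.__version__)
print("bitsandbytes:", bitsandbytes.__version__)
print("transformers:", transformers.__version__)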
Hello @ARahul2003,
You need to use an older version of transformers:
!pip install -qU trl datasets accelerate loralib einops xformers bitsandbytes
!pip install transformers==4.30
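If pinning transformers to 4.30 is not an option, note that newer releases expect the 8-bit flag to be passed through BitsAndBytesConfig rather than directly to from_pretrained. A minimal sketch of that form (untested with Phi; also note that bitsandbytes 8-bit loading generally requires a CUDA GPU, so it may not work on a CPU-only laptop):

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# load_in_8bit moved into BitsAndBytesConfig in newer transformers releases
quant_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-1_5",
    quantization_config=quant_config,
    trust_remote_code=True,
)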