error model.generate()
error images:
code:
import os
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-7b-it"

# 4-bit NF4 quantization so the 7B model fits in Colab GPU memory
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id, token=os.environ["HF_TOKEN"])
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map={"": 0},
    token=os.environ["HF_TOKEN"],
)
%%time
chat = [
    {"role": "user", "content": "Write a hello world program"},
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(prompt, add_special_tokens=True, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=250)

# Decode and print the output
text = tokenizer.batch_decode(outputs)[0]
print(text)
Facing the same issue here.
I'm running it on a Colab T4 GPU. Somehow gemma-2b-it runs, but 7b-it is throwing the above error.
Same issue here with gemma-7b-it:
RuntimeError: shape '[1, 9, 3072]' is invalid for input of size 36864
And somehow, the model runs fine in Kaggle. I can use gemma-7b-it in Kaggle, but it throws the size error in Colab. On the flip side, gemma-2b-it runs fine in Colab (but I don't know how to control the number of output tokens generated; the response is not complete but cut off in the middle. For example, for the question "Who are you?", the response I received was "I am a large language model, trained by Google. I am a").
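(Aside on the truncated responses: the reply length is capped by max_new_tokens. Below is a minimal sketch, assuming the same tokenizer and model objects as in the code above; raising the cap lets the answer finish instead of stopping mid-sentence.)
chat = [{"role": "user", "content": "Who are you?"}]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(prompt, add_special_tokens=True, return_tensors="pt")
# Raise max_new_tokens so the reply is not truncated after a few tokens
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=512)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])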
Curiously, the gemma-2b-it model works correctly, but the 7b-it and 7b base models do not.
Google Colab T4, V100, and A100 GPUs all fail.
Thanks all for reporting! I'm managing to reproduce this with torch 2.1.0, but the error doesn't appear with torch 2.2.0.
Could you share your torch version (or upgrade to 2.2.0 if you haven't already) and let us know if that helps?
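(A quick way to check which torch version the runtime is using, sketched as a one-off Colab cell:)
import torch
print(torch.__version__)  # the error reportedly shows up on 2.1.0 but not on 2.2.0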
Hey all! The source of the issue is the difference in the attention implementation. Any torch version before 2.1.1 will use eager attention, as sdpa isn't supported in torch in those versions. We will fix the models to work with these versions in transformers ASAP and release a patch; but in the meantime, we recommend using a torch version that satisfies torch>=2.1.1 in order to leverage the sdpa attention implementation, which works correctly.
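(For reference, a minimal sketch of requesting the sdpa path explicitly at load time, assuming torch>=2.1.1 and a transformers release that accepts the attn_implementation argument; model_id, bnb_config, and the token are the same as in the code above.)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map={"": 0},
    attn_implementation="sdpa",  # use torch's scaled_dot_product_attention instead of the eager path
    token=os.environ["HF_TOKEN"],
)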
Here is the line needed to install the relevant PyTorch version in Colab:
pip install "torch>=2.1.1" -U
Please restart your runtime afterwards so it picks up the updated PyTorch version!
https://huggingface.co/google/gemma-7b/discussions/17
Hey all! There's a PR to fix the "eager" attention in Transformers: https://github.com/huggingface/transformers/pull/29187. Once this is merged, we'll do a patch release and bump the latest PyPI version of Transformers to include this fix.
cc @ArthurZ
Patch release is done! Thanks all for the prompt report, and sorry for not catching it earlier!
pip install -U transformers