Attention masks wrongly set with padding='longest'
#11 opened by schwarzwalder
Hi Team,
I noticed an issue with the attention masks returned by the processor for batched input sequences: for padded tokens, the attention mask is set to 1 instead of 0. This happens when padding='longest' is set and does not happen otherwise. Any thoughts?
Please find the code below to reproduce with transformers==v4.36.2.
import torch
from transformers import AutoProcessor

device = "cuda:1" if torch.cuda.is_available() else "cpu"

checkpoint = "HuggingFaceM4/idefics-9b-instruct"
processor = AutoProcessor.from_pretrained(checkpoint)

# Two prompts of different lengths, so the shorter one gets padded.
prompts = [
    [
        "User: What is in this image?",
        "https://upload.wikimedia.org/wikipedia/commons/6/68/Orange_tabby_cat_sitting_on_fallen_leaves-Hisashi-01A.jpg",
    ],
    [
        "User: Is there a cat in the image ? Please answer yes or no.",
        "https://upload.wikimedia.org/wikipedia/commons/6/68/Orange_tabby_cat_sitting_on_fallen_leaves-Hisashi-01A.jpg",
    ],
]
print(prompts)

# Without padding='longest' (commented-out call) the attention mask is correct:
# inputs = processor(prompts, return_tensors="pt", max_length=512, truncation=True, add_end_of_utterance_token=False).to(device)
inputs = processor(prompts, return_tensors="pt", max_length=512, truncation=True, padding='longest', add_end_of_utterance_token=False).to(device)
print(inputs['attention_mask'].shape)
print(inputs['attention_mask'])
Expected output:
torch.Size([2, 20])
tensor([[0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]],
device='cuda:1')
Actual output:
torch.Size([2, 20])
tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]],
device='cuda:1')
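In the meantime, a possible workaround is to rebuild the attention mask from the padded input_ids. This is only a sketch, not an official fix: it assumes the processor exposes its tokenizer as processor.tokenizer and that padded positions carry the tokenizer's pad_token_id.

# Workaround sketch: recompute the mask directly from the pad positions.
# Assumes processor.tokenizer.pad_token_id is the id actually used for padding.
pad_token_id = processor.tokenizer.pad_token_id
inputs["attention_mask"] = (inputs["input_ids"] != pad_token_id).long()
print(inputs["attention_mask"])  # should now match the expected output above

Since this only looks at where the pad token appears, it gives the expected mask regardless of the padding strategy passed to the processor.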
Hey!
Thanks for the reproduction case, I can reproduce the problem.
I opened an issue on hf transformers: https://github.com/huggingface/transformers/issues/28591