Manual training
Hello, for some reason running:
import torch
import torch.nn.functional as F
from transformers import PaliGemmaForConditionalGeneration

# llm_args, tensor (pixel values), input_ids and nb_tokens_answer are defined elsewhere
vlm = PaliGemmaForConditionalGeneration.from_pretrained(**llm_args)
pred = vlm(pixel_values=tensor, input_ids=input_ids[:, :-1],
           attention_mask=torch.ones_like(input_ids[:, :-1])).logits
pred = pred[:, -nb_tokens_answer:]  # logits that should predict the answer tokens
loss = F.cross_entropy(pred.permute((0, 2, 1)), input_ids[:, -nb_tokens_answer:],
                       reduction='mean')
This gives me a very small loss. I have the feeling that the input and target tokens are getting mixed up. Why is that?
This is driving me crazy. This bugfix was supposed to solve my problem: https://github.com/huggingface/transformers/pull/30967 ... (I'm checking on more data.)
https://github.com/huggingface/transformers/issues/30993 OK, I got help: apparently this model also needs labels and token_type_ids in its inputs, unlike Imp or Moondream...
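For anyone landing here, a minimal sketch of the setup the issue points to, assuming the standard transformers processor workflow for PaliGemma (the checkpoint id, prompt, answer and image variable below are placeholders, so double-check the argument names against your transformers version): as far as I can tell, passing suffix= to the processor builds labels (prefix masked with -100) and token_type_ids for you, and the model then computes the shifted, masked loss itself.

```python
# Sketch based on the linked issue, not my exact code: model_id, the prompt,
# the answer string and `image` (a PIL.Image) are placeholders.
from transformers import PaliGemmaForConditionalGeneration, PaliGemmaProcessor

model_id = "google/paligemma-3b-pt-224"  # placeholder checkpoint
processor = PaliGemmaProcessor.from_pretrained(model_id)
vlm = PaliGemmaForConditionalGeneration.from_pretrained(model_id)

# suffix= makes the processor return `labels` (prefix tokens masked with -100)
# and `token_type_ids` (prefix vs. answer tokens) along with the usual inputs.
inputs = processor(text="answer en What is in the image?",
                   images=image,
                   suffix="a cat",
                   return_tensors="pt")

outputs = vlm(**inputs)  # the model applies the shift and masking internally
loss = outputs.loss
```

If I understand the issue correctly, without token_type_ids the answer tokens are treated like the prefix and attend to each other bidirectionally, so the targets leak into the predictions, which would explain the suspiciously small loss.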