danaaubakirova/mplugdocowl1.5-Omni-hf · Is this ready to be used with the transformers library?

Jul 13, 2024

Is this ready to be used with the transformers library?

If yes, please add examples and instructions showing what the prompt should look like for multiple images.

Also, I noticed that the original training script has a crop_anchors argument. It's not clear to me what it's for, or whether I need to alter my training document images in any way before processing. Do you happen to know? If you provide usage instructions for training with the transformers library, please mention how to handle it.

Thank you!

Mihaiii

Jul 13, 2024

Ah, I found more info by searching the transformers repo: https://github.com/huggingface/transformers/pull/31792

danaaubakirova

Owner Jul 15, 2024

Hello Mihaiii,

No, you don't need to preprocess the images in advance. In transformers, you will be able to set do_anchor_resize=True and do_adaptive_crop=True on the image processor, which will prepare the crops for training. These options correspond to the Shape Adaptive Cropping Module, which was introduced in the original paper to handle images of varying aspect ratios and resolutions.
The model will be available in transformers soon.
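
To make the cropping idea more concrete, here is a rough, dependency-free sketch of what shape-adaptive cropping does conceptually: pick a grid of tiles whose aspect ratio matches the image, then cut the resized image into fixed-size crops. This is not the transformers implementation. The candidate grid list, the 448-pixel tile size, the closest-aspect-ratio selection rule, and the function names select_anchor and crop_boxes are all assumptions made for illustration, and the real module may choose anchors differently.

```python
import math

# Hypothetical candidate grids as (rows, cols); the model's real anchor set may differ.
ANCHORS = [(rows, cols) for rows in range(1, 4) for cols in range(1, 4)]


def select_anchor(width, height, anchors=ANCHORS):
    """Return the (rows, cols) grid whose aspect ratio is closest to the image's.

    Ratios are compared in log space so that 2:1 and 1:2 are treated
    symmetrically. Ties are broken in favour of the grid with fewer cells.
    """
    image_ratio = math.log(width / height)
    return min(
        anchors,
        key=lambda rc: (abs(math.log(rc[1] / rc[0]) - image_ratio), rc[0] * rc[1]),
    )


def crop_boxes(rows, cols, base=448):
    """Return (left, top, right, bottom) boxes tiling an image that has been
    resized to (cols * base) x (rows * base) pixels. base=448 is an assumption."""
    return [
        (c * base, r * base, (c + 1) * base, (r + 1) * base)
        for r in range(rows)
        for c in range(cols)
    ]


# A tall page (1000 x 3000 px) maps to a 3-row, 1-column grid.
rows, cols = select_anchor(1000, 3000)
boxes = crop_boxes(rows, cols)
```

With an image library such as Pillow, each box could then be passed to a crop call after resizing the page to the grid size. With the image processor flags mentioned above, this step is expected to happen for you.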

Thanks for the interest!
Best,
Dana