Question about inputs in molmo

#13
by 2U1

I'm writing code for fine-tuning Molmo and I have some questions about the model's inputs.

1. It looks like you are using <|endoftext|> as the BOS token, since Qwen doesn't use one. Are you using <|endoftext|> as the EOS token too? Looking at the example code, it seems like the sequence should end with <|endoftext|>.

2. Aren't there any separators for multi-turn conversations? When I preprocess the example input, it looks like '<|endoftext|><im_start><im_patch><im_patch>...<im_col><im_end> User: Describe this image. Assistant:'

3. What is the purpose of the image_mask? From modeling_molmo.py it looks like it tells the model which parts of the image tensor are padding. If that's right, is it used the same way in the training phase?

1. Yes, we use the same token for BOS and EOS. If you give the model an example generation it should produce, that example should end with EOS.
2. The separators are just "User:" for user input and "Assistant:" for model output; for multi-turn conversations those prefixes should appear before each message.
3. That is correct, and the image_mask was used the same way during training.
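
Putting the three answers together, here is a minimal sketch of how a training example might be assembled. The helper names (`build_prompt`, `make_image_mask`), the placeholder patch counts, and the boolean mask layout are assumptions for illustration only; the actual number of image tokens and the real mask shape come from Molmo's own image preprocessor, so check against the released preprocessing code before using this.

```python
import torch

# Token strings as described in this thread: <|endoftext|> doubles as BOS and EOS,
# and the image placeholder tokens wrap the patch tokens.
EOS = "<|endoftext|>"
IM_START, IM_END = "<im_start>", "<im_end>"
IM_PATCH, IM_COL = "<im_patch>", "<im_col>"


def build_prompt(turns, n_patches=4, n_cols=2):
    """Assemble a multi-turn training string.

    turns: list of (user_text, assistant_text) pairs.
    n_patches / n_cols: illustrative counts for the image placeholder tokens;
    the real counts depend on the image size and Molmo's image processor.
    """
    image_block = IM_START + (IM_PATCH * n_patches + IM_COL) * n_cols + IM_END
    text = EOS + image_block  # BOS, then the image tokens
    for user_text, assistant_text in turns:
        # "User:" / "Assistant:" prefixes separate the turns.
        text += f" User: {user_text} Assistant: {assistant_text}"
    return text + EOS  # the target the model should learn ends with EOS


def make_image_mask(num_real_patches, max_patches):
    """Boolean mask over the padded image tensor: True = real patch, False = padding."""
    mask = torch.zeros(max_patches, dtype=torch.bool)
    mask[:num_real_patches] = True
    return mask


if __name__ == "__main__":
    print(build_prompt([("Describe this image.", "A dog running on a beach.")],
                       n_patches=3, n_cols=2))
    print(make_image_mask(num_real_patches=5, max_patches=8))
```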

@chrisc36 Thanks for answering! It helped me write the fine-tuning code for the model.

amanrangapur changed discussion status to closed
