Question about inputs in molmo
#13
by 2U1 - opened
I'm writing code for fine-tuning Molmo and I have some questions about the inputs.
1. It looks like you are using <|endoftext|> for BOS, since Qwen doesn't use a BOS token. Are you using <|endoftext|> as the EOS token too? Looking at the example code, it seems like the sequence should end with <|endoftext|>.
2. Aren't there any separators for the multi-turn conversations? When I preprocess the example input, it looks like:
   '<|endoftext|><im_start><im_patch><im_patch>...<im_col><im_end> User: Describe this image. Assistant:'
3. What is the purpose of the image_mask? From modelling_molmo.py it looks like it tells the model which positions are padding in the image tensor. If that's right, is it used the same way in the training phase?
- Yes, we use the same token for BOS and EOS. If giving the model an example generation that the model should produce, then that example should end with EOS.
- The separators are just "User:" for user input and "Assistant:" for model output; for multi-turn conversations, those prefixes should appear before each message (see the sketch below).
- That is correct, and the image_mask was used during training (see the second sketch below).
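To make the first two points concrete, here is a minimal sketch of how a multi-turn training example could be assembled. The `build_prompt` helper and the exact spacing around the prefixes are illustrative assumptions, not the official Molmo preprocessing code; only the "User:" / "Assistant:" prefixes and the shared <|endoftext|> BOS/EOS token come from the answer above.

```python
# Sketch of assembling a multi-turn training target (assumptions noted above).

EOS = "<|endoftext|>"  # Molmo uses the same token for BOS and EOS

def build_prompt(turns):
    """turns: list of (role, text) pairs, role in {"user", "assistant"}."""
    parts = []
    for role, text in turns:
        prefix = "User:" if role == "user" else "Assistant:"
        parts.append(f" {prefix} {text}")
    return "".join(parts)

turns = [
    ("user", "Describe this image."),
    ("assistant", "A dog running on a beach."),
]
# The example generation the model should produce ends with EOS,
# so the model learns when to stop.
text = build_prompt(turns) + EOS
# -> " User: Describe this image. Assistant: A dog running on a beach.<|endoftext|>"
```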
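And a second sketch for the image_mask point: a per-patch mask that flags padded positions in the image tensor. The shapes and the convention of 1 for real patches and 0 for padding are assumptions for illustration only; the actual convention is in modelling_molmo.py.

```python
import torch

# Hypothetical shapes/values: 1.0 marks real image patches, 0.0 marks padding
# added to reach a fixed patch count. Check modelling_molmo.py for the real
# convention; this only illustrates the padding-mask idea.
num_real_patches, max_patches = 144, 576

image_mask = torch.zeros(1, max_patches)    # (batch, patches)
image_mask[:, :num_real_patches] = 1.0      # real patches

# During both training and inference, the mask lets the model ignore the
# padded patch positions in the image tensor.
```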
amanrangapur changed discussion status to closed