Expanding the maximum sequence length of CLIP past 77?
Hi! I have a prompt that has a length of 127. I'm getting the following message when running SD2:
"Token indices sequence length is longer than the specified maximum sequence length for this model (127 > 77). Running this sequence through the model will result in indexing errors
The following part of your input was truncated because CLIP can only handle sequences up to 77 tokens: [...]"
Looking at the CLIP documentation: https://huggingface.co/docs/transformers/model_doc/clip, it looks like max_position_embeddings is set to 77 by default. I'm wondering - is there a way to adjust this length for the SD2 inpainting model?
There are ways around this. One way is to process the text in chunks of at most 77 tokens, compute the embeddings for each chunk using the text encoder, and then concatenate them along the sequence-length dimension. You'd then pass the concatenated embeddings directly via the pipeline's `prompt_embeds` argument instead of `prompt`.
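Here's a minimal sketch of that chunking idea. It only needs torch; `dummy_encoder` is a stand-in for the real text encoder (in a diffusers pipeline you'd tokenize with `pipe.tokenizer(..., truncation=False)` and encode with `pipe.text_encoder`, taking `last_hidden_state`), and the names `encode_long_prompt` / `hidden_dim` are just illustrative:

```python
import torch

MAX_LEN = 77  # CLIP's max_position_embeddings

def encode_long_prompt(token_ids, text_encoder, chunk_size=MAX_LEN):
    """Encode a token sequence longer than CLIP's limit by splitting it
    into chunks of at most `chunk_size` tokens, encoding each chunk
    separately, and concatenating along the sequence dimension."""
    chunks = [token_ids[:, i:i + chunk_size]
              for i in range(0, token_ids.shape[1], chunk_size)]
    embeds = [text_encoder(chunk) for chunk in chunks]
    return torch.cat(embeds, dim=1)

# Stand-in for the real text encoder: maps (batch, seq_len) token ids
# to (batch, seq_len, hidden_dim) embeddings.
hidden_dim = 1024  # SD2's text encoder hidden size
dummy_encoder = lambda ids: torch.zeros(ids.shape[0], ids.shape[1], hidden_dim)

token_ids = torch.zeros(1, 127, dtype=torch.long)  # e.g. your 127-token prompt
prompt_embeds = encode_long_prompt(token_ids, dummy_encoder)
print(prompt_embeds.shape)  # torch.Size([1, 127, 1024])
```

You'd then call the pipeline with `pipe(prompt_embeds=prompt_embeds, ...)`. One caveat: each chunk is encoded independently, so tokens near a chunk boundary can't attend to tokens on the other side, and a faithful implementation also has to handle the BOS/EOS special tokens per chunk, which this sketch omits.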