The GluMLP does not work without flash attention because the tensors are passed in a different shape. This PR fixes the issue. I also verified that the embeddings produced with and without flash attention are identical.
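For context, a minimal sketch of the shape issue, assuming the GluMLP follows the usual gated-linear-unit pattern (the class and layer names here are hypothetical, not the actual modules from this repository): the non-flash path passes padded `(batch, seq_len, hidden)` tensors, while the flash-attention path passes unpadded `(total_tokens, hidden)` tensors, so the gate/value split must be done on the last dimension to work in both cases.

```python
import torch
import torch.nn as nn

class GluMLP(nn.Module):
    """Gated-linear-unit MLP sketch; names are illustrative only."""

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        # Single projection producing both the gate and the value halves.
        self.gated_layers = nn.Linear(hidden_size, intermediate_size * 2, bias=False)
        self.act = nn.GELU()
        self.wo = nn.Linear(intermediate_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Splitting on dim=-1 handles both layouts: padded
        # (batch, seq_len, hidden) from the non-flash path and
        # unpadded (total_tokens, hidden) from the flash path.
        gated = self.gated_layers(hidden_states)
        gate, value = gated.chunk(2, dim=-1)
        return self.wo(self.act(gate) * value)
```

A quick equivalence check in the spirit of the test described above, comparing the two layouts on the same inputs:

```python
mlp = GluMLP(hidden_size=768, intermediate_size=3072).eval()
padded = torch.randn(2, 5, 768)        # non-flash layout
packed = padded.reshape(-1, 768)       # flash-style unpadded layout
with torch.no_grad():
    assert torch.allclose(mlp(padded).reshape(-1, 768), mlp(packed), atol=1e-6)
```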

michael-guenther changed pull request status to open
LGTM!

michael-guenther changed pull request status to merged