Embeddings dimension
Hello,
I want first to thank you for the amazing project and for releasing the weights of the model.
I have a question: if I run the example code
```python
from transformers import AutoModel

# trust_remote_code is needed to use the encode method
model = AutoModel.from_pretrained(
    "jinaai/jina-embeddings-v2-base-en", trust_remote_code=True
)
embeddings = model.encode(
    ["How is the weather today?", "What is the current weather like today?"]
)
print(embeddings.shape)
```
I get an output with shape `(2, 768)`, while the MTEB leaderboard reports an embedding dimension of 512.
Related to this: is there a reason why the pooling layer size differs from the hidden size of the model?
Thanks in advance for the help
It's because your pooling config claims a `word_embedding_dimension` of 512. I'm not sure why; it doesn't appear to actually make the output 512-dimensional, but it does seem to affect whatever automated system records the model info.
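To illustrate the mismatch, here is a minimal sketch of the consistency check at issue. The dictionaries stand in for the model's `config.json` and the sentence-transformers-style `1_Pooling/config.json`; the values shown (768 vs. 512) reflect the thread, but the helper function and exact config contents are assumptions for illustration, not the repository's actual files.

```python
# Hypothetical stand-ins for the two config files (illustration only).
model_config = {"hidden_size": 768}  # transformer hidden size from config.json
pooling_config = {
    "word_embedding_dimension": 512,  # the value the leaderboard picked up
    "pooling_mode_mean_tokens": True,
}

def pooling_dim_matches(model_cfg: dict, pooling_cfg: dict) -> bool:
    """Mean pooling averages token vectors, so it preserves the hidden size;
    the two dimensions should therefore be equal."""
    return model_cfg["hidden_size"] == pooling_cfg["word_embedding_dimension"]

print(pooling_dim_matches(model_config, pooling_config))  # → False (before the fix)

pooling_config["word_embedding_dimension"] = 768  # what the fix changes it to
print(pooling_dim_matches(model_config, pooling_config))  # → True
```

Since mean pooling doesn't change dimensionality, the 512 value was purely a metadata error: the actual embeddings were always 768-dimensional.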
Thanks for pointing that out @ttronrud :)
Should be fixed by:
https://huggingface.co/jinaai/jina-embeddings-v2-base-en/commit/7aef14b0840b7dded6c7e4ce28ff87f16071284d