Instructions to use Snowflake/snowflake-arctic-embed-l-v2.0 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use Snowflake/snowflake-arctic-embed-l-v2.0 with sentence-transformers:
from sentence_transformers import SentenceTransformer model = SentenceTransformer("Snowflake/snowflake-arctic-embed-l-v2.0") sentences = [ "The weather is lovely today.", "It's so sunny outside!", "He drove to the stadium." ] embeddings = model.encode(sentences) similarities = model.similarity(embeddings, embeddings) print(similarities.shape) # [3, 3] - Transformers.js
How to use Snowflake/snowflake-arctic-embed-l-v2.0 with Transformers.js:
// npm i @huggingface/transformers import { pipeline } from '@huggingface/transformers'; // Allocate pipeline const pipe = await pipeline('sentence-similarity', 'Snowflake/snowflake-arctic-embed-l-v2.0'); - Inference
- Notebooks
- Google Colab
- Kaggle
max_seq_length
What is the max_seq_length of this model?
https://huggingface.co/Snowflake/snowflake-arctic-embed-l-v2.0#using-huggingface-transformers
the large model code says max_length=512,
https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v2.0#using-huggingface-transformers
but the medium model code says max_length=8192.
Is it right that their max_seq_lengths are different?
I believe this is correct, it's based on the maximum sequence lengths of the respective base models.
I believe this is correct, it's based on the maximum sequence lengths of the respective base models.
You mean 512?
Yes, large should have a maximum sequence length of 512 tokens, and medium a maximum sequence length of 8192. Folks from Snowflake should be able to confirm.
But when I run the following code:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("Snowflake/snowflake-arctic-embed-l-v2.0")
print(model.max_seq_length)
I get 8192.
Oh, you're right. That's due to https://huggingface.co/Snowflake/snowflake-arctic-embed-l-v2.0/blob/main/tokenizer_config.json#L50
cc @spacemanidol @pxyu I'm pretty sure it's not possible for a XLM-RoBERTa finetune to exceed 512 tokens unless you've updated the positional embedding matrix.
Nevermind, looks like they can actually process ~6k tokens. These is the shape the token embeddings of 2 queries: torch.Size([2, 6005, 1024]). Perhaps the max. sequence length is actually 8192 - apologies for the confusion, I'll let the Snowflake team answer.
Both models handle 8192. We use the adjusted version of XMLR provided by the BGE team (BAAI/bge-m3-retromae), which has been extended for 8k context support, so the normal XMLR rules don't appl, haha. Let me get a fix in for the erroneous large model example code!
updated in README so closing.