This model fine-tunes XLM-R with contrastive learning on Chinese and English STS and NLI corpora, producing a bilingual sentence-embedding model.
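The card does not spell out the training objective, but contrastive fine-tuning on NLI pairs is typically done with a SimCSE-style InfoNCE loss over in-batch negatives. Below is a minimal sketch of that loss; the function name, the pairing scheme, and the temperature value are illustrative assumptions, not the model's published recipe.

```python
# Hedged sketch: SimCSE-style in-batch InfoNCE loss. Assumes positive
# pairs (e.g. an NLI premise and its entailed hypothesis) arrive as two
# aligned batches of embeddings; every other row acts as a negative.
import torch
import torch.nn.functional as F

def info_nce_loss(emb_a: torch.Tensor, emb_b: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    # Unit-normalize so the similarity matrix is cosine similarity
    emb_a = F.normalize(emb_a, p=2, dim=1)
    emb_b = F.normalize(emb_b, p=2, dim=1)
    # (batch, batch) similarity matrix, scaled by the temperature
    logits = emb_a @ emb_b.T / temperature
    # For row i, the positive sits at column i (the diagonal)
    labels = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)
```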
Usage with HuggingFace Transformers:
```python
from transformers import AutoTokenizer, AutoModel
import torch

# Sentences we want sentence embeddings for
sentences = ["样例数据-1", "样例数据-2"]

# Load model from the HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('zhou-xl/bi-cse')
model = AutoModel.from_pretrained('zhou-xl/bi-cse')
model.eval()

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling. In this case, CLS pooling: take the first token's hidden state.
sentence_embeddings = model_output[0][:, 0]

# Normalize embeddings to unit length
sentence_embeddings = torch.nn.functional.normalize(sentence_embeddings, p=2, dim=1)
print("Sentence embeddings:", sentence_embeddings)
```
Evaluation results

All scores are self-reported on MTEB tasks.

| Metric | Task | Split | Score |
|---|---|---|---|
| cos_sim_pearson | MTEB AFQMC | validation | 42.010 |
| cos_sim_spearman | MTEB AFQMC | validation | 43.449 |
| euclidean_pearson | MTEB AFQMC | validation | 41.933 |
| euclidean_spearman | MTEB AFQMC | validation | 43.457 |
| manhattan_pearson | MTEB AFQMC | validation | 41.930 |
| manhattan_spearman | MTEB AFQMC | validation | 43.445 |
| cos_sim_pearson | MTEB ATEC | test | 47.484 |
| cos_sim_spearman | MTEB ATEC | test | 48.010 |
| cos_sim_pearson | MTEB BIOSSES | test | 70.066 |
| cos_sim_spearman | MTEB BIOSSES | test | 70.564 |
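These scores come from the MTEB benchmark. A hedged sketch of how one might re-run a single task with the `mteb` package is below; loading the checkpoint as a SentenceTransformer may require configuring CLS pooling manually, and the `mteb` API shown here can differ across package versions.

```python
# Hedged sketch: evaluating this model on one MTEB task via
# sentence-transformers. The output folder is an arbitrary choice.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("zhou-xl/bi-cse")
evaluation = MTEB(tasks=["AFQMC"])  # one of the tasks reported above
evaluation.run(model, output_folder="results/bi-cse")
```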