library_name: peft
license: mit
language:
- en
pipeline_tag: sentence-similarity
tags:
- text-embedding
- embeddings
- information-retrieval
- beir
- text-classification
- language-model
- text-clustering
- text-semantic-similarity
- text-evaluation
- text-reranking
- feature-extraction
- sentence-similarity
- Sentence Similarity
- natural_questions
- ms_marco
- fever
- hotpot_qa
- mteb
model-index: | |
- name: LLM2Vec-Mistral-7B-unsupervised | |
results: | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/amazon_counterfactual | |
name: MTEB AmazonCounterfactualClassification (en) | |
config: en | |
split: test | |
revision: e8379541af4e31359cca9fbcf4b00f2671dba205 | |
metrics: | |
- type: accuracy | |
value: 76.94029850746269 | |
- type: ap | |
value: 41.01055096636703 | |
- type: f1 | |
value: 71.2582580801963 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/amazon_polarity | |
name: MTEB AmazonPolarityClassification | |
config: default | |
split: test | |
revision: e2d317d38cd51312af73b3d32a06d1a08b442046 | |
metrics: | |
- type: accuracy | |
value: 85.288275 | |
- type: ap | |
value: 80.9174293931393 | |
- type: f1 | |
value: 85.26284279319103 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/amazon_reviews_multi | |
name: MTEB AmazonReviewsClassification (en) | |
config: en | |
split: test | |
revision: 1399c76144fd37290681b995c656ef9b2e06e26d | |
metrics: | |
- type: accuracy | |
value: 47.089999999999996 | |
- type: f1 | |
value: 46.42571856588491 | |
- task: | |
type: Retrieval | |
dataset: | |
type: arguana | |
name: MTEB ArguAna | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 25.676 | |
- type: map_at_10 | |
value: 41.705999999999996 | |
- type: map_at_100 | |
value: 42.649 | |
- type: map_at_1000 | |
value: 42.655 | |
- type: map_at_3 | |
value: 36.214 | |
- type: map_at_5 | |
value: 39.475 | |
- type: mrr_at_1 | |
value: 26.173999999999996 | |
- type: mrr_at_10 | |
value: 41.873 | |
- type: mrr_at_100 | |
value: 42.817 | |
- type: mrr_at_1000 | |
value: 42.823 | |
- type: mrr_at_3 | |
value: 36.427 | |
- type: mrr_at_5 | |
value: 39.646 | |
- type: ndcg_at_1 | |
value: 25.676 | |
- type: ndcg_at_10 | |
value: 51.001 | |
- type: ndcg_at_100 | |
value: 55.001 | |
- type: ndcg_at_1000 | |
value: 55.167 | |
- type: ndcg_at_3 | |
value: 39.713 | |
- type: ndcg_at_5 | |
value: 45.596 | |
- type: precision_at_1 | |
value: 25.676 | |
- type: precision_at_10 | |
value: 8.087 | |
- type: precision_at_100 | |
value: 0.983 | |
- type: precision_at_1000 | |
value: 0.1 | |
- type: precision_at_3 | |
value: 16.619 | |
- type: precision_at_5 | |
value: 12.831000000000001 | |
- type: recall_at_1 | |
value: 25.676 | |
- type: recall_at_10 | |
value: 80.868 | |
- type: recall_at_100 | |
value: 98.29299999999999 | |
- type: recall_at_1000 | |
value: 99.57300000000001 | |
- type: recall_at_3 | |
value: 49.858000000000004 | |
- type: recall_at_5 | |
value: 64.154 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/arxiv-clustering-p2p | |
name: MTEB ArxivClusteringP2P | |
config: default | |
split: test | |
revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d | |
metrics: | |
- type: v_measure | |
value: 47.557333278165295 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/arxiv-clustering-s2s | |
name: MTEB ArxivClusteringS2S | |
config: default | |
split: test | |
revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 | |
metrics: | |
- type: v_measure | |
value: 39.921940994207674 | |
- task: | |
type: Reranking | |
dataset: | |
type: mteb/askubuntudupquestions-reranking | |
name: MTEB AskUbuntuDupQuestions | |
config: default | |
split: test | |
revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 | |
metrics: | |
- type: map | |
value: 58.602773795071585 | |
- type: mrr | |
value: 72.93749725190169 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/biosses-sts | |
name: MTEB BIOSSES | |
config: default | |
split: test | |
revision: d3fb88f8f02e40887cd149695127462bbcf29b4a | |
metrics: | |
- type: cos_sim_spearman | |
value: 83.29045204631967 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/banking77 | |
name: MTEB Banking77Classification | |
config: default | |
split: test | |
revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 | |
metrics: | |
- type: accuracy | |
value: 86.1590909090909 | |
- type: f1 | |
value: 86.08993054539444 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/biorxiv-clustering-p2p | |
name: MTEB BiorxivClusteringP2P | |
config: default | |
split: test | |
revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 | |
metrics: | |
- type: v_measure | |
value: 36.13784714320738 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/biorxiv-clustering-s2s | |
name: MTEB BiorxivClusteringS2S | |
config: default | |
split: test | |
revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 | |
metrics: | |
- type: v_measure | |
value: 30.26284987791574 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/android | |
name: MTEB CQADupstackAndroidRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 27.611 | |
- type: map_at_10 | |
value: 37.838 | |
- type: map_at_100 | |
value: 39.446999999999996 | |
- type: map_at_1000 | |
value: 39.583 | |
- type: map_at_3 | |
value: 34.563 | |
- type: map_at_5 | |
value: 36.332 | |
- type: mrr_at_1 | |
value: 35.765 | |
- type: mrr_at_10 | |
value: 44.614 | |
- type: mrr_at_100 | |
value: 45.501000000000005 | |
- type: mrr_at_1000 | |
value: 45.558 | |
- type: mrr_at_3 | |
value: 42.513 | |
- type: mrr_at_5 | |
value: 43.515 | |
- type: ndcg_at_1 | |
value: 35.765 | |
- type: ndcg_at_10 | |
value: 44.104 | |
- type: ndcg_at_100 | |
value: 50.05500000000001 | |
- type: ndcg_at_1000 | |
value: 52.190000000000005 | |
- type: ndcg_at_3 | |
value: 39.834 | |
- type: ndcg_at_5 | |
value: 41.491 | |
- type: precision_at_1 | |
value: 35.765 | |
- type: precision_at_10 | |
value: 8.870000000000001 | |
- type: precision_at_100 | |
value: 1.505 | |
- type: precision_at_1000 | |
value: 0.2 | |
- type: precision_at_3 | |
value: 19.886 | |
- type: precision_at_5 | |
value: 14.277999999999999 | |
- type: recall_at_1 | |
value: 27.611 | |
- type: recall_at_10 | |
value: 55.065 | |
- type: recall_at_100 | |
value: 80.60199999999999 | |
- type: recall_at_1000 | |
value: 94.517 | |
- type: recall_at_3 | |
value: 41.281 | |
- type: recall_at_5 | |
value: 46.791 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/english | |
name: MTEB CQADupstackEnglishRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 28.599999999999998 | |
- type: map_at_10 | |
value: 38.218999999999994 | |
- type: map_at_100 | |
value: 39.336 | |
- type: map_at_1000 | |
value: 39.464 | |
- type: map_at_3 | |
value: 35.599 | |
- type: map_at_5 | |
value: 36.927 | |
- type: mrr_at_1 | |
value: 37.197 | |
- type: mrr_at_10 | |
value: 44.759 | |
- type: mrr_at_100 | |
value: 45.372 | |
- type: mrr_at_1000 | |
value: 45.422000000000004 | |
- type: mrr_at_3 | |
value: 42.941 | |
- type: mrr_at_5 | |
value: 43.906 | |
- type: ndcg_at_1 | |
value: 37.197 | |
- type: ndcg_at_10 | |
value: 43.689 | |
- type: ndcg_at_100 | |
value: 47.588 | |
- type: ndcg_at_1000 | |
value: 49.868 | |
- type: ndcg_at_3 | |
value: 40.434 | |
- type: ndcg_at_5 | |
value: 41.617 | |
- type: precision_at_1 | |
value: 37.197 | |
- type: precision_at_10 | |
value: 8.529 | |
- type: precision_at_100 | |
value: 1.325 | |
- type: precision_at_1000 | |
value: 0.181 | |
- type: precision_at_3 | |
value: 20.212 | |
- type: precision_at_5 | |
value: 13.987 | |
- type: recall_at_1 | |
value: 28.599999999999998 | |
- type: recall_at_10 | |
value: 52.266999999999996 | |
- type: recall_at_100 | |
value: 69.304 | |
- type: recall_at_1000 | |
value: 84.249 | |
- type: recall_at_3 | |
value: 41.281 | |
- type: recall_at_5 | |
value: 45.56 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/gaming | |
name: MTEB CQADupstackGamingRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 33.168 | |
- type: map_at_10 | |
value: 44.690999999999995 | |
- type: map_at_100 | |
value: 45.804 | |
- type: map_at_1000 | |
value: 45.876 | |
- type: map_at_3 | |
value: 41.385 | |
- type: map_at_5 | |
value: 43.375 | |
- type: mrr_at_1 | |
value: 38.997 | |
- type: mrr_at_10 | |
value: 48.782 | |
- type: mrr_at_100 | |
value: 49.534 | |
- type: mrr_at_1000 | |
value: 49.57 | |
- type: mrr_at_3 | |
value: 46.134 | |
- type: mrr_at_5 | |
value: 47.814 | |
- type: ndcg_at_1 | |
value: 38.997 | |
- type: ndcg_at_10 | |
value: 50.707 | |
- type: ndcg_at_100 | |
value: 55.358 | |
- type: ndcg_at_1000 | |
value: 56.818999999999996 | |
- type: ndcg_at_3 | |
value: 45.098 | |
- type: ndcg_at_5 | |
value: 48.065999999999995 | |
- type: precision_at_1 | |
value: 38.997 | |
- type: precision_at_10 | |
value: 8.414000000000001 | |
- type: precision_at_100 | |
value: 1.159 | |
- type: precision_at_1000 | |
value: 0.135 | |
- type: precision_at_3 | |
value: 20.564 | |
- type: precision_at_5 | |
value: 14.445 | |
- type: recall_at_1 | |
value: 33.168 | |
- type: recall_at_10 | |
value: 64.595 | |
- type: recall_at_100 | |
value: 85.167 | |
- type: recall_at_1000 | |
value: 95.485 | |
- type: recall_at_3 | |
value: 49.555 | |
- type: recall_at_5 | |
value: 56.871 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/gis | |
name: MTEB CQADupstackGisRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 17.254 | |
- type: map_at_10 | |
value: 23.925 | |
- type: map_at_100 | |
value: 25.008000000000003 | |
- type: map_at_1000 | |
value: 25.123 | |
- type: map_at_3 | |
value: 21.676000000000002 | |
- type: map_at_5 | |
value: 23.042 | |
- type: mrr_at_1 | |
value: 18.756999999999998 | |
- type: mrr_at_10 | |
value: 25.578 | |
- type: mrr_at_100 | |
value: 26.594 | |
- type: mrr_at_1000 | |
value: 26.680999999999997 | |
- type: mrr_at_3 | |
value: 23.371 | |
- type: mrr_at_5 | |
value: 24.721 | |
- type: ndcg_at_1 | |
value: 18.756999999999998 | |
- type: ndcg_at_10 | |
value: 27.878999999999998 | |
- type: ndcg_at_100 | |
value: 33.285 | |
- type: ndcg_at_1000 | |
value: 36.333 | |
- type: ndcg_at_3 | |
value: 23.461000000000002 | |
- type: ndcg_at_5 | |
value: 25.836 | |
- type: precision_at_1 | |
value: 18.756999999999998 | |
- type: precision_at_10 | |
value: 4.429 | |
- type: precision_at_100 | |
value: 0.754 | |
- type: precision_at_1000 | |
value: 0.106 | |
- type: precision_at_3 | |
value: 9.981 | |
- type: precision_at_5 | |
value: 7.412000000000001 | |
- type: recall_at_1 | |
value: 17.254 | |
- type: recall_at_10 | |
value: 38.42 | |
- type: recall_at_100 | |
value: 63.50900000000001 | |
- type: recall_at_1000 | |
value: 86.787 | |
- type: recall_at_3 | |
value: 26.840999999999998 | |
- type: recall_at_5 | |
value: 32.4 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/mathematica | |
name: MTEB CQADupstackMathematicaRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 10.495000000000001 | |
- type: map_at_10 | |
value: 16.505 | |
- type: map_at_100 | |
value: 17.59 | |
- type: map_at_1000 | |
value: 17.709 | |
- type: map_at_3 | |
value: 13.974 | |
- type: map_at_5 | |
value: 15.466 | |
- type: mrr_at_1 | |
value: 14.179 | |
- type: mrr_at_10 | |
value: 20.396 | |
- type: mrr_at_100 | |
value: 21.384 | |
- type: mrr_at_1000 | |
value: 21.47 | |
- type: mrr_at_3 | |
value: 17.828 | |
- type: mrr_at_5 | |
value: 19.387999999999998 | |
- type: ndcg_at_1 | |
value: 14.179 | |
- type: ndcg_at_10 | |
value: 20.852 | |
- type: ndcg_at_100 | |
value: 26.44 | |
- type: ndcg_at_1000 | |
value: 29.448999999999998 | |
- type: ndcg_at_3 | |
value: 16.181 | |
- type: ndcg_at_5 | |
value: 18.594 | |
- type: precision_at_1 | |
value: 14.179 | |
- type: precision_at_10 | |
value: 4.229 | |
- type: precision_at_100 | |
value: 0.8170000000000001 | |
- type: precision_at_1000 | |
value: 0.12 | |
- type: precision_at_3 | |
value: 8.126 | |
- type: precision_at_5 | |
value: 6.493 | |
- type: recall_at_1 | |
value: 10.495000000000001 | |
- type: recall_at_10 | |
value: 30.531000000000002 | |
- type: recall_at_100 | |
value: 55.535999999999994 | |
- type: recall_at_1000 | |
value: 77.095 | |
- type: recall_at_3 | |
value: 17.805 | |
- type: recall_at_5 | |
value: 24.041 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/physics | |
name: MTEB CQADupstackPhysicsRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 24.826999999999998 | |
- type: map_at_10 | |
value: 34.957 | |
- type: map_at_100 | |
value: 36.314 | |
- type: map_at_1000 | |
value: 36.437999999999995 | |
- type: map_at_3 | |
value: 31.328 | |
- type: map_at_5 | |
value: 33.254 | |
- type: mrr_at_1 | |
value: 31.375999999999998 | |
- type: mrr_at_10 | |
value: 40.493 | |
- type: mrr_at_100 | |
value: 41.410000000000004 | |
- type: mrr_at_1000 | |
value: 41.46 | |
- type: mrr_at_3 | |
value: 37.504 | |
- type: mrr_at_5 | |
value: 39.212 | |
- type: ndcg_at_1 | |
value: 31.375999999999998 | |
- type: ndcg_at_10 | |
value: 41.285 | |
- type: ndcg_at_100 | |
value: 46.996 | |
- type: ndcg_at_1000 | |
value: 49.207 | |
- type: ndcg_at_3 | |
value: 35.297 | |
- type: ndcg_at_5 | |
value: 37.999 | |
- type: precision_at_1 | |
value: 31.375999999999998 | |
- type: precision_at_10 | |
value: 7.960000000000001 | |
- type: precision_at_100 | |
value: 1.277 | |
- type: precision_at_1000 | |
value: 0.165 | |
- type: precision_at_3 | |
value: 17.132 | |
- type: precision_at_5 | |
value: 12.57 | |
- type: recall_at_1 | |
value: 24.826999999999998 | |
- type: recall_at_10 | |
value: 54.678000000000004 | |
- type: recall_at_100 | |
value: 78.849 | |
- type: recall_at_1000 | |
value: 93.36 | |
- type: recall_at_3 | |
value: 37.775 | |
- type: recall_at_5 | |
value: 44.993 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/programmers | |
name: MTEB CQADupstackProgrammersRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 21.195 | |
- type: map_at_10 | |
value: 29.003 | |
- type: map_at_100 | |
value: 30.379 | |
- type: map_at_1000 | |
value: 30.508000000000003 | |
- type: map_at_3 | |
value: 25.927 | |
- type: map_at_5 | |
value: 27.784 | |
- type: mrr_at_1 | |
value: 26.941 | |
- type: mrr_at_10 | |
value: 34.305 | |
- type: mrr_at_100 | |
value: 35.32 | |
- type: mrr_at_1000 | |
value: 35.386 | |
- type: mrr_at_3 | |
value: 31.735000000000003 | |
- type: mrr_at_5 | |
value: 33.213 | |
- type: ndcg_at_1 | |
value: 26.941 | |
- type: ndcg_at_10 | |
value: 34.31 | |
- type: ndcg_at_100 | |
value: 40.242 | |
- type: ndcg_at_1000 | |
value: 42.9 | |
- type: ndcg_at_3 | |
value: 29.198 | |
- type: ndcg_at_5 | |
value: 31.739 | |
- type: precision_at_1 | |
value: 26.941 | |
- type: precision_at_10 | |
value: 6.507000000000001 | |
- type: precision_at_100 | |
value: 1.124 | |
- type: precision_at_1000 | |
value: 0.154 | |
- type: precision_at_3 | |
value: 13.850999999999999 | |
- type: precision_at_5 | |
value: 10.411 | |
- type: recall_at_1 | |
value: 21.195 | |
- type: recall_at_10 | |
value: 45.083 | |
- type: recall_at_100 | |
value: 70.14200000000001 | |
- type: recall_at_1000 | |
value: 88.34100000000001 | |
- type: recall_at_3 | |
value: 31.175000000000004 | |
- type: recall_at_5 | |
value: 37.625 | |
- task: | |
type: Retrieval | |
dataset: | |
type: mteb/cqadupstack | |
name: MTEB CQADupstackRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 20.293916666666664 | |
- type: map_at_10 | |
value: 28.353666666666665 | |
- type: map_at_100 | |
value: 29.524333333333335 | |
- type: map_at_1000 | |
value: 29.652583333333332 | |
- type: map_at_3 | |
value: 25.727916666666665 | |
- type: map_at_5 | |
value: 27.170833333333334 | |
- type: mrr_at_1 | |
value: 25.21375 | |
- type: mrr_at_10 | |
value: 32.67591666666667 | |
- type: mrr_at_100 | |
value: 33.56233333333334 | |
- type: mrr_at_1000 | |
value: 33.63283333333334 | |
- type: mrr_at_3 | |
value: 30.415333333333333 | |
- type: mrr_at_5 | |
value: 31.679583333333333 | |
- type: ndcg_at_1 | |
value: 25.21375 | |
- type: ndcg_at_10 | |
value: 33.37108333333333 | |
- type: ndcg_at_100 | |
value: 38.57725 | |
- type: ndcg_at_1000 | |
value: 41.246833333333335 | |
- type: ndcg_at_3 | |
value: 28.98183333333334 | |
- type: ndcg_at_5 | |
value: 30.986083333333337 | |
- type: precision_at_1 | |
value: 25.21375 | |
- type: precision_at_10 | |
value: 6.200583333333333 | |
- type: precision_at_100 | |
value: 1.0527499999999999 | |
- type: precision_at_1000 | |
value: 0.14675000000000002 | |
- type: precision_at_3 | |
value: 13.808333333333334 | |
- type: precision_at_5 | |
value: 9.976416666666669 | |
- type: recall_at_1 | |
value: 20.293916666666664 | |
- type: recall_at_10 | |
value: 43.90833333333333 | |
- type: recall_at_100 | |
value: 67.26575 | |
- type: recall_at_1000 | |
value: 86.18591666666666 | |
- type: recall_at_3 | |
value: 31.387416666666667 | |
- type: recall_at_5 | |
value: 36.73883333333333 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/stats | |
name: MTEB CQADupstackStatsRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 15.043000000000001 | |
- type: map_at_10 | |
value: 22.203 | |
- type: map_at_100 | |
value: 23.254 | |
- type: map_at_1000 | |
value: 23.362 | |
- type: map_at_3 | |
value: 20.157 | |
- type: map_at_5 | |
value: 21.201999999999998 | |
- type: mrr_at_1 | |
value: 17.485 | |
- type: mrr_at_10 | |
value: 24.729 | |
- type: mrr_at_100 | |
value: 25.715 | |
- type: mrr_at_1000 | |
value: 25.796999999999997 | |
- type: mrr_at_3 | |
value: 22.725 | |
- type: mrr_at_5 | |
value: 23.829 | |
- type: ndcg_at_1 | |
value: 17.485 | |
- type: ndcg_at_10 | |
value: 26.31 | |
- type: ndcg_at_100 | |
value: 31.722 | |
- type: ndcg_at_1000 | |
value: 34.621 | |
- type: ndcg_at_3 | |
value: 22.414 | |
- type: ndcg_at_5 | |
value: 24.125 | |
- type: precision_at_1 | |
value: 17.485 | |
- type: precision_at_10 | |
value: 4.601 | |
- type: precision_at_100 | |
value: 0.7849999999999999 | |
- type: precision_at_1000 | |
value: 0.11100000000000002 | |
- type: precision_at_3 | |
value: 10.327 | |
- type: precision_at_5 | |
value: 7.331 | |
- type: recall_at_1 | |
value: 15.043000000000001 | |
- type: recall_at_10 | |
value: 36.361 | |
- type: recall_at_100 | |
value: 61.63999999999999 | |
- type: recall_at_1000 | |
value: 83.443 | |
- type: recall_at_3 | |
value: 25.591 | |
- type: recall_at_5 | |
value: 29.808 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/tex | |
name: MTEB CQADupstackTexRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 11.018 | |
- type: map_at_10 | |
value: 15.886 | |
- type: map_at_100 | |
value: 16.830000000000002 | |
- type: map_at_1000 | |
value: 16.956 | |
- type: map_at_3 | |
value: 14.222000000000001 | |
- type: map_at_5 | |
value: 15.110999999999999 | |
- type: mrr_at_1 | |
value: 14.625 | |
- type: mrr_at_10 | |
value: 19.677 | |
- type: mrr_at_100 | |
value: 20.532 | |
- type: mrr_at_1000 | |
value: 20.622 | |
- type: mrr_at_3 | |
value: 17.992 | |
- type: mrr_at_5 | |
value: 18.909000000000002 | |
- type: ndcg_at_1 | |
value: 14.625 | |
- type: ndcg_at_10 | |
value: 19.414 | |
- type: ndcg_at_100 | |
value: 24.152 | |
- type: ndcg_at_1000 | |
value: 27.433000000000003 | |
- type: ndcg_at_3 | |
value: 16.495 | |
- type: ndcg_at_5 | |
value: 17.742 | |
- type: precision_at_1 | |
value: 14.625 | |
- type: precision_at_10 | |
value: 3.833 | |
- type: precision_at_100 | |
value: 0.744 | |
- type: precision_at_1000 | |
value: 0.11900000000000001 | |
- type: precision_at_3 | |
value: 8.213 | |
- type: precision_at_5 | |
value: 6.036 | |
- type: recall_at_1 | |
value: 11.018 | |
- type: recall_at_10 | |
value: 26.346000000000004 | |
- type: recall_at_100 | |
value: 47.99 | |
- type: recall_at_1000 | |
value: 72.002 | |
- type: recall_at_3 | |
value: 17.762 | |
- type: recall_at_5 | |
value: 21.249000000000002 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/unix | |
name: MTEB CQADupstackUnixRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 20.053 | |
- type: map_at_10 | |
value: 27.950000000000003 | |
- type: map_at_100 | |
value: 29.207 | |
- type: map_at_1000 | |
value: 29.309 | |
- type: map_at_3 | |
value: 25.612000000000002 | |
- type: map_at_5 | |
value: 26.793 | |
- type: mrr_at_1 | |
value: 24.813 | |
- type: mrr_at_10 | |
value: 32.297 | |
- type: mrr_at_100 | |
value: 33.312999999999995 | |
- type: mrr_at_1000 | |
value: 33.379999999999995 | |
- type: mrr_at_3 | |
value: 30.239 | |
- type: mrr_at_5 | |
value: 31.368000000000002 | |
- type: ndcg_at_1 | |
value: 24.813 | |
- type: ndcg_at_10 | |
value: 32.722 | |
- type: ndcg_at_100 | |
value: 38.603 | |
- type: ndcg_at_1000 | |
value: 41.11 | |
- type: ndcg_at_3 | |
value: 28.74 | |
- type: ndcg_at_5 | |
value: 30.341 | |
- type: precision_at_1 | |
value: 24.813 | |
- type: precision_at_10 | |
value: 5.83 | |
- type: precision_at_100 | |
value: 0.9860000000000001 | |
- type: precision_at_1000 | |
value: 0.13 | |
- type: precision_at_3 | |
value: 13.433 | |
- type: precision_at_5 | |
value: 9.384 | |
- type: recall_at_1 | |
value: 20.053 | |
- type: recall_at_10 | |
value: 42.867 | |
- type: recall_at_100 | |
value: 68.90899999999999 | |
- type: recall_at_1000 | |
value: 87.031 | |
- type: recall_at_3 | |
value: 31.606 | |
- type: recall_at_5 | |
value: 35.988 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/webmasters | |
name: MTEB CQADupstackWebmastersRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 20.696 | |
- type: map_at_10 | |
value: 29.741 | |
- type: map_at_100 | |
value: 30.958999999999996 | |
- type: map_at_1000 | |
value: 31.22 | |
- type: map_at_3 | |
value: 26.679000000000002 | |
- type: map_at_5 | |
value: 28.244999999999997 | |
- type: mrr_at_1 | |
value: 27.272999999999996 | |
- type: mrr_at_10 | |
value: 35.101 | |
- type: mrr_at_100 | |
value: 35.91 | |
- type: mrr_at_1000 | |
value: 35.987 | |
- type: mrr_at_3 | |
value: 32.378 | |
- type: mrr_at_5 | |
value: 33.732 | |
- type: ndcg_at_1 | |
value: 27.272999999999996 | |
- type: ndcg_at_10 | |
value: 36.136 | |
- type: ndcg_at_100 | |
value: 40.9 | |
- type: ndcg_at_1000 | |
value: 44.184 | |
- type: ndcg_at_3 | |
value: 31.123 | |
- type: ndcg_at_5 | |
value: 33.182 | |
- type: precision_at_1 | |
value: 27.272999999999996 | |
- type: precision_at_10 | |
value: 7.489999999999999 | |
- type: precision_at_100 | |
value: 1.506 | |
- type: precision_at_1000 | |
value: 0.24 | |
- type: precision_at_3 | |
value: 15.348999999999998 | |
- type: precision_at_5 | |
value: 11.344 | |
- type: recall_at_1 | |
value: 20.696 | |
- type: recall_at_10 | |
value: 48.041 | |
- type: recall_at_100 | |
value: 71.316 | |
- type: recall_at_1000 | |
value: 92.794 | |
- type: recall_at_3 | |
value: 32.983000000000004 | |
- type: recall_at_5 | |
value: 38.627 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/wordpress | |
name: MTEB CQADupstackWordpressRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 13.567000000000002 | |
- type: map_at_10 | |
value: 19.326 | |
- type: map_at_100 | |
value: 20.164 | |
- type: map_at_1000 | |
value: 20.283 | |
- type: map_at_3 | |
value: 17.613 | |
- type: map_at_5 | |
value: 18.519 | |
- type: mrr_at_1 | |
value: 15.157000000000002 | |
- type: mrr_at_10 | |
value: 21.38 | |
- type: mrr_at_100 | |
value: 22.163 | |
- type: mrr_at_1000 | |
value: 22.261 | |
- type: mrr_at_3 | |
value: 19.624 | |
- type: mrr_at_5 | |
value: 20.548 | |
- type: ndcg_at_1 | |
value: 15.157000000000002 | |
- type: ndcg_at_10 | |
value: 23.044999999999998 | |
- type: ndcg_at_100 | |
value: 27.586 | |
- type: ndcg_at_1000 | |
value: 30.848 | |
- type: ndcg_at_3 | |
value: 19.506999999999998 | |
- type: ndcg_at_5 | |
value: 21.101 | |
- type: precision_at_1 | |
value: 15.157000000000002 | |
- type: precision_at_10 | |
value: 3.7150000000000003 | |
- type: precision_at_100 | |
value: 0.651 | |
- type: precision_at_1000 | |
value: 0.1 | |
- type: precision_at_3 | |
value: 8.626000000000001 | |
- type: precision_at_5 | |
value: 6.026 | |
- type: recall_at_1 | |
value: 13.567000000000002 | |
- type: recall_at_10 | |
value: 32.646 | |
- type: recall_at_100 | |
value: 54.225 | |
- type: recall_at_1000 | |
value: 79.12700000000001 | |
- type: recall_at_3 | |
value: 22.994 | |
- type: recall_at_5 | |
value: 26.912999999999997 | |
- task: | |
type: Retrieval | |
dataset: | |
type: climate-fever | |
name: MTEB ClimateFEVER | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 7.26 | |
- type: map_at_10 | |
value: 15.109 | |
- type: map_at_100 | |
value: 17.155 | |
- type: map_at_1000 | |
value: 17.354 | |
- type: map_at_3 | |
value: 11.772 | |
- type: map_at_5 | |
value: 13.542000000000002 | |
- type: mrr_at_1 | |
value: 16.678 | |
- type: mrr_at_10 | |
value: 29.470000000000002 | |
- type: mrr_at_100 | |
value: 30.676 | |
- type: mrr_at_1000 | |
value: 30.714999999999996 | |
- type: mrr_at_3 | |
value: 25.44 | |
- type: mrr_at_5 | |
value: 27.792 | |
- type: ndcg_at_1 | |
value: 16.678 | |
- type: ndcg_at_10 | |
value: 22.967000000000002 | |
- type: ndcg_at_100 | |
value: 31.253999999999998 | |
- type: ndcg_at_1000 | |
value: 34.748000000000005 | |
- type: ndcg_at_3 | |
value: 17.058 | |
- type: ndcg_at_5 | |
value: 19.43 | |
- type: precision_at_1 | |
value: 16.678 | |
- type: precision_at_10 | |
value: 7.974 | |
- type: precision_at_100 | |
value: 1.6740000000000002 | |
- type: precision_at_1000 | |
value: 0.232 | |
- type: precision_at_3 | |
value: 13.681 | |
- type: precision_at_5 | |
value: 11.322000000000001 | |
- type: recall_at_1 | |
value: 7.26 | |
- type: recall_at_10 | |
value: 30.407 | |
- type: recall_at_100 | |
value: 59.073 | |
- type: recall_at_1000 | |
value: 78.58800000000001 | |
- type: recall_at_3 | |
value: 16.493 | |
- type: recall_at_5 | |
value: 22.453 | |
- task: | |
type: Retrieval | |
dataset: | |
type: dbpedia-entity | |
name: MTEB DBPedia | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 5.176 | |
- type: map_at_10 | |
value: 11.951 | |
- type: map_at_100 | |
value: 16.208 | |
- type: map_at_1000 | |
value: 17.067 | |
- type: map_at_3 | |
value: 8.669 | |
- type: map_at_5 | |
value: 10.061 | |
- type: mrr_at_1 | |
value: 42.5 | |
- type: mrr_at_10 | |
value: 54.312000000000005 | |
- type: mrr_at_100 | |
value: 54.925999999999995 | |
- type: mrr_at_1000 | |
value: 54.959 | |
- type: mrr_at_3 | |
value: 52.292 | |
- type: mrr_at_5 | |
value: 53.554 | |
- type: ndcg_at_1 | |
value: 31.374999999999996 | |
- type: ndcg_at_10 | |
value: 25.480999999999998 | |
- type: ndcg_at_100 | |
value: 30.018 | |
- type: ndcg_at_1000 | |
value: 36.103 | |
- type: ndcg_at_3 | |
value: 27.712999999999997 | |
- type: ndcg_at_5 | |
value: 26.415 | |
- type: precision_at_1 | |
value: 42.5 | |
- type: precision_at_10 | |
value: 20.549999999999997 | |
- type: precision_at_100 | |
value: 6.387 | |
- type: precision_at_1000 | |
value: 1.204 | |
- type: precision_at_3 | |
value: 32.917 | |
- type: precision_at_5 | |
value: 27.400000000000002 | |
- type: recall_at_1 | |
value: 5.176 | |
- type: recall_at_10 | |
value: 18.335 | |
- type: recall_at_100 | |
value: 38.629999999999995 | |
- type: recall_at_1000 | |
value: 59.74699999999999 | |
- type: recall_at_3 | |
value: 10.36 | |
- type: recall_at_5 | |
value: 13.413 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/emotion | |
name: MTEB EmotionClassification | |
config: default | |
split: test | |
revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 | |
metrics: | |
- type: accuracy | |
value: 48.885 | |
- type: f1 | |
value: 44.330258440550644 | |
- task: | |
type: Retrieval | |
dataset: | |
type: fever | |
name: MTEB FEVER | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 25.211 | |
- type: map_at_10 | |
value: 37.946999999999996 | |
- type: map_at_100 | |
value: 38.852 | |
- type: map_at_1000 | |
value: 38.896 | |
- type: map_at_3 | |
value: 34.445 | |
- type: map_at_5 | |
value: 36.451 | |
- type: mrr_at_1 | |
value: 27.453 | |
- type: mrr_at_10 | |
value: 40.505 | |
- type: mrr_at_100 | |
value: 41.342 | |
- type: mrr_at_1000 | |
value: 41.377 | |
- type: mrr_at_3 | |
value: 36.971 | |
- type: mrr_at_5 | |
value: 39.013999999999996 | |
- type: ndcg_at_1 | |
value: 27.453 | |
- type: ndcg_at_10 | |
value: 45.106 | |
- type: ndcg_at_100 | |
value: 49.357 | |
- type: ndcg_at_1000 | |
value: 50.546 | |
- type: ndcg_at_3 | |
value: 38.063 | |
- type: ndcg_at_5 | |
value: 41.603 | |
- type: precision_at_1 | |
value: 27.453 | |
- type: precision_at_10 | |
value: 7.136000000000001 | |
- type: precision_at_100 | |
value: 0.9390000000000001 | |
- type: precision_at_1000 | |
value: 0.106 | |
- type: precision_at_3 | |
value: 16.677 | |
- type: precision_at_5 | |
value: 11.899 | |
- type: recall_at_1 | |
value: 25.211 | |
- type: recall_at_10 | |
value: 64.964 | |
- type: recall_at_100 | |
value: 84.23 | |
- type: recall_at_1000 | |
value: 93.307 | |
- type: recall_at_3 | |
value: 45.936 | |
- type: recall_at_5 | |
value: 54.489 | |
- task: | |
type: Retrieval | |
dataset: | |
type: fiqa | |
name: MTEB FiQA2018 | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 11.434 | |
- type: map_at_10 | |
value: 20.325 | |
- type: map_at_100 | |
value: 22.267 | |
- type: map_at_1000 | |
value: 22.46 | |
- type: map_at_3 | |
value: 16.864 | |
- type: map_at_5 | |
value: 18.584999999999997 | |
- type: mrr_at_1 | |
value: 24.074 | |
- type: mrr_at_10 | |
value: 32.487 | |
- type: mrr_at_100 | |
value: 33.595000000000006 | |
- type: mrr_at_1000 | |
value: 33.649 | |
- type: mrr_at_3 | |
value: 29.578 | |
- type: mrr_at_5 | |
value: 31.044 | |
- type: ndcg_at_1 | |
value: 24.074 | |
- type: ndcg_at_10 | |
value: 27.244 | |
- type: ndcg_at_100 | |
value: 35.244 | |
- type: ndcg_at_1000 | |
value: 38.964999999999996 | |
- type: ndcg_at_3 | |
value: 22.709 | |
- type: ndcg_at_5 | |
value: 24.114 | |
- type: precision_at_1 | |
value: 24.074 | |
- type: precision_at_10 | |
value: 8.21 | |
- type: precision_at_100 | |
value: 1.627 | |
- type: precision_at_1000 | |
value: 0.22999999999999998 | |
- type: precision_at_3 | |
value: 15.741 | |
- type: precision_at_5 | |
value: 12.037 | |
- type: recall_at_1 | |
value: 11.434 | |
- type: recall_at_10 | |
value: 35.423 | |
- type: recall_at_100 | |
value: 66.056 | |
- type: recall_at_1000 | |
value: 88.63799999999999 | |
- type: recall_at_3 | |
value: 20.968 | |
- type: recall_at_5 | |
value: 26.540999999999997 | |
- task: | |
type: Retrieval | |
dataset: | |
type: hotpotqa | |
name: MTEB HotpotQA | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 30.506 | |
- type: map_at_10 | |
value: 44.864 | |
- type: map_at_100 | |
value: 46.016 | |
- type: map_at_1000 | |
value: 46.1 | |
- type: map_at_3 | |
value: 41.518 | |
- type: map_at_5 | |
value: 43.461 | |
- type: mrr_at_1 | |
value: 61.013 | |
- type: mrr_at_10 | |
value: 69.918 | |
- type: mrr_at_100 | |
value: 70.327 | |
- type: mrr_at_1000 | |
value: 70.342 | |
- type: mrr_at_3 | |
value: 68.226 | |
- type: mrr_at_5 | |
value: 69.273 | |
- type: ndcg_at_1 | |
value: 61.013 | |
- type: ndcg_at_10 | |
value: 54.539 | |
- type: ndcg_at_100 | |
value: 58.819 | |
- type: ndcg_at_1000 | |
value: 60.473 | |
- type: ndcg_at_3 | |
value: 49.27 | |
- type: ndcg_at_5 | |
value: 51.993 | |
- type: precision_at_1 | |
value: 61.013 | |
- type: precision_at_10 | |
value: 11.757 | |
- type: precision_at_100 | |
value: 1.5110000000000001 | |
- type: precision_at_1000 | |
value: 0.173 | |
- type: precision_at_3 | |
value: 31.339 | |
- type: precision_at_5 | |
value: 20.959 | |
- type: recall_at_1 | |
value: 30.506 | |
- type: recall_at_10 | |
value: 58.785 | |
- type: recall_at_100 | |
value: 75.55 | |
- type: recall_at_1000 | |
value: 86.455 | |
- type: recall_at_3 | |
value: 47.009 | |
- type: recall_at_5 | |
value: 52.397000000000006 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/imdb | |
name: MTEB ImdbClassification | |
config: default | |
split: test | |
revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 | |
metrics: | |
- type: accuracy | |
value: 77.954 | |
- type: ap | |
value: 73.06067313842448 | |
- type: f1 | |
value: 77.6469083443121 | |
- task: | |
type: Retrieval | |
dataset: | |
type: msmarco | |
name: MTEB MSMARCO | |
config: default | |
split: dev | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 7.7170000000000005 | |
- type: map_at_10 | |
value: 14.696000000000002 | |
- type: map_at_100 | |
value: 15.973 | |
- type: map_at_1000 | |
value: 16.079 | |
- type: map_at_3 | |
value: 12.059000000000001 | |
- type: map_at_5 | |
value: 13.478000000000002 | |
- type: mrr_at_1 | |
value: 7.9079999999999995 | |
- type: mrr_at_10 | |
value: 14.972 | |
- type: mrr_at_100 | |
value: 16.235 | |
- type: mrr_at_1000 | |
value: 16.337 | |
- type: mrr_at_3 | |
value: 12.323 | |
- type: mrr_at_5 | |
value: 13.751 | |
- type: ndcg_at_1 | |
value: 7.9079999999999995 | |
- type: ndcg_at_10 | |
value: 19.131 | |
- type: ndcg_at_100 | |
value: 25.868000000000002 | |
- type: ndcg_at_1000 | |
value: 28.823999999999998 | |
- type: ndcg_at_3 | |
value: 13.611 | |
- type: ndcg_at_5 | |
value: 16.178 | |
- type: precision_at_1 | |
value: 7.9079999999999995 | |
- type: precision_at_10 | |
value: 3.4259999999999997 | |
- type: precision_at_100 | |
value: 0.687 | |
- type: precision_at_1000 | |
value: 0.094 | |
- type: precision_at_3 | |
value: 6.103 | |
- type: precision_at_5 | |
value: 4.951 | |
- type: recall_at_1 | |
value: 7.7170000000000005 | |
- type: recall_at_10 | |
value: 33.147999999999996 | |
- type: recall_at_100 | |
value: 65.55199999999999 | |
- type: recall_at_1000 | |
value: 88.748 | |
- type: recall_at_3 | |
value: 17.863 | |
- type: recall_at_5 | |
value: 24.083 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/mtop_domain | |
name: MTEB MTOPDomainClassification (en) | |
config: en | |
split: test | |
revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf | |
metrics: | |
- type: accuracy | |
value: 95.48335613315093 | |
- type: f1 | |
value: 95.18813547597892 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/mtop_intent | |
name: MTEB MTOPIntentClassification (en) | |
config: en | |
split: test | |
revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba | |
metrics: | |
- type: accuracy | |
value: 82.83857729138167 | |
- type: f1 | |
value: 63.61922697275075 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/amazon_massive_intent | |
name: MTEB MassiveIntentClassification (en) | |
config: en | |
split: test | |
revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | |
metrics: | |
- type: accuracy | |
value: 76.65433759246805 | |
- type: f1 | |
value: 73.24385243140212 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/amazon_massive_scenario | |
name: MTEB MassiveScenarioClassification (en) | |
config: en | |
split: test | |
revision: 7d571f92784cd94a019292a1f45445077d0ef634 | |
metrics: | |
- type: accuracy | |
value: 79.98655010087425 | |
- type: f1 | |
value: 79.3880305174127 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/medrxiv-clustering-p2p | |
name: MTEB MedrxivClusteringP2P | |
config: default | |
split: test | |
revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 | |
metrics: | |
- type: v_measure | |
value: 30.109152457220606 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/medrxiv-clustering-s2s | |
name: MTEB MedrxivClusteringS2S | |
config: default | |
split: test | |
revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 | |
metrics: | |
- type: v_measure | |
value: 26.928355856501696 | |
- task: | |
type: Reranking | |
dataset: | |
type: mteb/mind_small | |
name: MTEB MindSmallReranking | |
config: default | |
split: test | |
revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 | |
metrics: | |
- type: map | |
value: 29.73337424086118 | |
- type: mrr | |
value: 30.753319352871074 | |
- task: | |
type: Retrieval | |
dataset: | |
type: nfcorpus | |
name: MTEB NFCorpus | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 4.303 | |
- type: map_at_10 | |
value: 9.653 | |
- type: map_at_100 | |
value: 11.952 | |
- type: map_at_1000 | |
value: 13.126999999999999 | |
- type: map_at_3 | |
value: 6.976 | |
- type: map_at_5 | |
value: 8.292 | |
- type: mrr_at_1 | |
value: 35.913000000000004 | |
- type: mrr_at_10 | |
value: 45.827 | |
- type: mrr_at_100 | |
value: 46.587 | |
- type: mrr_at_1000 | |
value: 46.635 | |
- type: mrr_at_3 | |
value: 43.344 | |
- type: mrr_at_5 | |
value: 44.876 | |
- type: ndcg_at_1 | |
value: 34.056 | |
- type: ndcg_at_10 | |
value: 27.161 | |
- type: ndcg_at_100 | |
value: 25.552999999999997 | |
- type: ndcg_at_1000 | |
value: 34.671 | |
- type: ndcg_at_3 | |
value: 31.267 | |
- type: ndcg_at_5 | |
value: 29.896 | |
- type: precision_at_1 | |
value: 35.604 | |
- type: precision_at_10 | |
value: 19.969 | |
- type: precision_at_100 | |
value: 6.115 | |
- type: precision_at_1000 | |
value: 1.892 | |
- type: precision_at_3 | |
value: 29.825000000000003 | |
- type: precision_at_5 | |
value: 26.253999999999998 | |
- type: recall_at_1 | |
value: 4.303 | |
- type: recall_at_10 | |
value: 14.033999999999999 | |
- type: recall_at_100 | |
value: 28.250999999999998 | |
- type: recall_at_1000 | |
value: 58.751 | |
- type: recall_at_3 | |
value: 8.257 | |
- type: recall_at_5 | |
value: 10.761999999999999 | |
- task: | |
type: Retrieval | |
dataset: | |
type: nq | |
name: MTEB NQ | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 14.668000000000001 | |
- type: map_at_10 | |
value: 26.593 | |
- type: map_at_100 | |
value: 28.094 | |
- type: map_at_1000 | |
value: 28.155 | |
- type: map_at_3 | |
value: 22.054000000000002 | |
- type: map_at_5 | |
value: 24.583 | |
- type: mrr_at_1 | |
value: 17.063 | |
- type: mrr_at_10 | |
value: 29.061999999999998 | |
- type: mrr_at_100 | |
value: 30.281000000000002 | |
- type: mrr_at_1000 | |
value: 30.325000000000003 | |
- type: mrr_at_3 | |
value: 24.754 | |
- type: mrr_at_5 | |
value: 27.281 | |
- type: ndcg_at_1 | |
value: 17.034 | |
- type: ndcg_at_10 | |
value: 34.157 | |
- type: ndcg_at_100 | |
value: 40.988 | |
- type: ndcg_at_1000 | |
value: 42.382999999999996 | |
- type: ndcg_at_3 | |
value: 25.076999999999998 | |
- type: ndcg_at_5 | |
value: 29.572 | |
- type: precision_at_1 | |
value: 17.034 | |
- type: precision_at_10 | |
value: 6.561 | |
- type: precision_at_100 | |
value: 1.04 | |
- type: precision_at_1000 | |
value: 0.117 | |
- type: precision_at_3 | |
value: 12.167 | |
- type: precision_at_5 | |
value: 9.809 | |
- type: recall_at_1 | |
value: 14.668000000000001 | |
- type: recall_at_10 | |
value: 55.291999999999994 | |
- type: recall_at_100 | |
value: 85.82 | |
- type: recall_at_1000 | |
value: 96.164 | |
- type: recall_at_3 | |
value: 31.208999999999996 | |
- type: recall_at_5 | |
value: 41.766 | |
- task: | |
type: Retrieval | |
dataset: | |
type: quora | |
name: MTEB QuoraRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 66.20899999999999 | |
- type: map_at_10 | |
value: 80.024 | |
- type: map_at_100 | |
value: 80.73 | |
- type: map_at_1000 | |
value: 80.753 | |
- type: map_at_3 | |
value: 76.82900000000001 | |
- type: map_at_5 | |
value: 78.866 | |
- type: mrr_at_1 | |
value: 76.25 | |
- type: mrr_at_10 | |
value: 83.382 | |
- type: mrr_at_100 | |
value: 83.535 | |
- type: mrr_at_1000 | |
value: 83.538 | |
- type: mrr_at_3 | |
value: 82.013 | |
- type: mrr_at_5 | |
value: 82.931 | |
- type: ndcg_at_1 | |
value: 76.25999999999999 | |
- type: ndcg_at_10 | |
value: 84.397 | |
- type: ndcg_at_100 | |
value: 85.988 | |
- type: ndcg_at_1000 | |
value: 86.18299999999999 | |
- type: ndcg_at_3 | |
value: 80.778 | |
- type: ndcg_at_5 | |
value: 82.801 | |
- type: precision_at_1 | |
value: 76.25999999999999 | |
- type: precision_at_10 | |
value: 12.952 | |
- type: precision_at_100 | |
value: 1.509 | |
- type: precision_at_1000 | |
value: 0.156 | |
- type: precision_at_3 | |
value: 35.323 | |
- type: precision_at_5 | |
value: 23.524 | |
- type: recall_at_1 | |
value: 66.20899999999999 | |
- type: recall_at_10 | |
value: 93.108 | |
- type: recall_at_100 | |
value: 98.817 | |
- type: recall_at_1000 | |
value: 99.857 | |
- type: recall_at_3 | |
value: 83.031 | |
- type: recall_at_5 | |
value: 88.441 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/reddit-clustering | |
name: MTEB RedditClustering | |
config: default | |
split: test | |
revision: 24640382cdbf8abc73003fb0fa6d111a705499eb | |
metrics: | |
- type: v_measure | |
value: 41.82535503883439 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/reddit-clustering-p2p | |
name: MTEB RedditClusteringP2P | |
config: default | |
split: test | |
revision: 282350215ef01743dc01b456c7f5241fa8937f16 | |
metrics: | |
- type: v_measure | |
value: 62.077510084458055 | |
- task: | |
type: Retrieval | |
dataset: | |
type: scidocs | |
name: MTEB SCIDOCS | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 3.383 | |
- type: map_at_10 | |
value: 8.839 | |
- type: map_at_100 | |
value: 10.876 | |
- type: map_at_1000 | |
value: 11.201 | |
- type: map_at_3 | |
value: 6.361 | |
- type: map_at_5 | |
value: 7.536 | |
- type: mrr_at_1 | |
value: 16.6 | |
- type: mrr_at_10 | |
value: 26.003999999999998 | |
- type: mrr_at_100 | |
value: 27.271 | |
- type: mrr_at_1000 | |
value: 27.354 | |
- type: mrr_at_3 | |
value: 22.900000000000002 | |
- type: mrr_at_5 | |
value: 24.58 | |
- type: ndcg_at_1 | |
value: 16.6 | |
- type: ndcg_at_10 | |
value: 15.345 | |
- type: ndcg_at_100 | |
value: 23.659 | |
- type: ndcg_at_1000 | |
value: 29.537000000000003 | |
- type: ndcg_at_3 | |
value: 14.283999999999999 | |
- type: ndcg_at_5 | |
value: 12.509999999999998 | |
- type: precision_at_1 | |
value: 16.6 | |
- type: precision_at_10 | |
value: 8.17 | |
- type: precision_at_100 | |
value: 2.028 | |
- type: precision_at_1000 | |
value: 0.34299999999999997 | |
- type: precision_at_3 | |
value: 13.633000000000001 | |
- type: precision_at_5 | |
value: 11.16 | |
- type: recall_at_1 | |
value: 3.383 | |
- type: recall_at_10 | |
value: 16.557 | |
- type: recall_at_100 | |
value: 41.123 | |
- type: recall_at_1000 | |
value: 69.67999999999999 | |
- type: recall_at_3 | |
value: 8.298 | |
- type: recall_at_5 | |
value: 11.322000000000001 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sickr-sts | |
name: MTEB SICK-R | |
config: default | |
split: test | |
revision: a6ea5a8cab320b040a23452cc28066d9beae2cee | |
metrics: | |
- type: cos_sim_spearman | |
value: 75.55405115197729 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts12-sts | |
name: MTEB STS12 | |
config: default | |
split: test | |
revision: a0d554a64d88156834ff5ae9920b964011b16384 | |
metrics: | |
- type: cos_sim_spearman | |
value: 67.65074099726466 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts13-sts | |
name: MTEB STS13 | |
config: default | |
split: test | |
revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca | |
metrics: | |
- type: cos_sim_spearman | |
value: 83.89765011154986 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts14-sts | |
name: MTEB STS14 | |
config: default | |
split: test | |
revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 | |
metrics: | |
- type: cos_sim_spearman | |
value: 76.97256789216159 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts15-sts | |
name: MTEB STS15 | |
config: default | |
split: test | |
revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 | |
metrics: | |
- type: cos_sim_spearman | |
value: 83.80216382863031 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts16-sts | |
name: MTEB STS16 | |
config: default | |
split: test | |
revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 | |
metrics: | |
- type: cos_sim_spearman | |
value: 81.90574806413879 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts17-crosslingual-sts | |
name: MTEB STS17 (en-en) | |
config: en-en | |
split: test | |
revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d | |
metrics: | |
- type: cos_sim_spearman | |
value: 85.58485422591949 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts22-crosslingual-sts | |
name: MTEB STS22 (en) | |
config: en | |
split: test | |
revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 | |
metrics: | |
- type: cos_sim_spearman | |
value: 65.92967262944444 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/stsbenchmark-sts | |
name: MTEB STSBenchmark | |
config: default | |
split: test | |
revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 | |
metrics: | |
- type: cos_sim_spearman | |
value: 80.41509666334721 | |
- task: | |
type: Reranking | |
dataset: | |
type: mteb/scidocs-reranking | |
name: MTEB SciDocsRR | |
config: default | |
split: test | |
revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab | |
metrics: | |
- type: map | |
value: 77.81287769479543 | |
- type: mrr | |
value: 94.13409665860645 | |
- task: | |
type: Retrieval | |
dataset: | |
type: scifact | |
name: MTEB SciFact | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 52.093999999999994 | |
- type: map_at_10 | |
value: 63.641999999999996 | |
- type: map_at_100 | |
value: 64.402 | |
- type: map_at_1000 | |
value: 64.416 | |
- type: map_at_3 | |
value: 60.878 | |
- type: map_at_5 | |
value: 62.778 | |
- type: mrr_at_1 | |
value: 55.333 | |
- type: mrr_at_10 | |
value: 65.139 | |
- type: mrr_at_100 | |
value: 65.75999999999999 | |
- type: mrr_at_1000 | |
value: 65.77199999999999 | |
- type: mrr_at_3 | |
value: 62.944 | |
- type: mrr_at_5 | |
value: 64.511 | |
- type: ndcg_at_1 | |
value: 55.333 | |
- type: ndcg_at_10 | |
value: 68.675 | |
- type: ndcg_at_100 | |
value: 71.794 | |
- type: ndcg_at_1000 | |
value: 72.18299999999999 | |
- type: ndcg_at_3 | |
value: 63.977 | |
- type: ndcg_at_5 | |
value: 66.866 | |
- type: precision_at_1 | |
value: 55.333 | |
- type: precision_at_10 | |
value: 9.232999999999999 | |
- type: precision_at_100 | |
value: 1.087 | |
- type: precision_at_1000 | |
value: 0.11199999999999999 | |
- type: precision_at_3 | |
value: 25.667 | |
- type: precision_at_5 | |
value: 17.0 | |
- type: recall_at_1 | |
value: 52.093999999999994 | |
- type: recall_at_10 | |
value: 82.506 | |
- type: recall_at_100 | |
value: 95.933 | |
- type: recall_at_1000 | |
value: 99.0 | |
- type: recall_at_3 | |
value: 70.078 | |
- type: recall_at_5 | |
value: 77.35600000000001 | |
- task: | |
type: PairClassification | |
dataset: | |
type: mteb/sprintduplicatequestions-pairclassification | |
name: MTEB SprintDuplicateQuestions | |
config: default | |
split: test | |
revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 | |
metrics: | |
- type: cos_sim_accuracy | |
value: 99.7128712871287 | |
- type: cos_sim_ap | |
value: 91.30057039245253 | |
- type: cos_sim_f1 | |
value: 85.35480624056368 | |
- type: cos_sim_precision | |
value: 85.91691995947315 | |
- type: cos_sim_recall | |
value: 84.8 | |
- type: dot_accuracy | |
value: 99.35346534653465 | |
- type: dot_ap | |
value: 67.929309733355 | |
- type: dot_f1 | |
value: 63.94205897568547 | |
- type: dot_precision | |
value: 66.2379421221865 | |
- type: dot_recall | |
value: 61.8 | |
- type: euclidean_accuracy | |
value: 99.69009900990099 | |
- type: euclidean_ap | |
value: 89.62179420600057 | |
- type: euclidean_f1 | |
value: 83.93039918116682 | |
- type: euclidean_precision | |
value: 85.9538784067086 | |
- type: euclidean_recall | |
value: 82.0 | |
- type: manhattan_accuracy | |
value: 99.70990099009902 | |
- type: manhattan_ap | |
value: 90.29611631593602 | |
- type: manhattan_f1 | |
value: 84.81729284611424 | |
- type: manhattan_precision | |
value: 87.38069989395547 | |
- type: manhattan_recall | |
value: 82.39999999999999 | |
- type: max_accuracy | |
value: 99.7128712871287 | |
- type: max_ap | |
value: 91.30057039245253 | |
- type: max_f1 | |
value: 85.35480624056368 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/stackexchange-clustering | |
name: MTEB StackExchangeClustering | |
config: default | |
split: test | |
revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 | |
metrics: | |
- type: v_measure | |
value: 67.33611278831218 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/stackexchange-clustering-p2p | |
name: MTEB StackExchangeClusteringP2P | |
config: default | |
split: test | |
revision: 815ca46b2622cec33ccafc3735d572c266efdb44 | |
metrics: | |
- type: v_measure | |
value: 34.504437768624214 | |
- task: | |
type: Reranking | |
dataset: | |
type: mteb/stackoverflowdupquestions-reranking | |
name: MTEB StackOverflowDupQuestions | |
config: default | |
split: test | |
revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 | |
metrics: | |
- type: map | |
value: 49.80014786474266 | |
- type: mrr | |
value: 50.468909154570916 | |
- task: | |
type: Summarization | |
dataset: | |
type: mteb/summeval | |
name: MTEB SummEval | |
config: default | |
split: test | |
revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c | |
metrics: | |
- type: cos_sim_pearson | |
value: 30.677648147466808 | |
- type: cos_sim_spearman | |
value: 30.191761045901888 | |
- type: dot_pearson | |
value: 23.16759191245942 | |
- type: dot_spearman | |
value: 23.186942570638486 | |
- task: | |
type: Retrieval | |
dataset: | |
type: trec-covid | |
name: MTEB TRECCOVID | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 0.214 | |
- type: map_at_10 | |
value: 1.2309999999999999 | |
- type: map_at_100 | |
value: 5.867 | |
- type: map_at_1000 | |
value: 14.671999999999999 | |
- type: map_at_3 | |
value: 0.519 | |
- type: map_at_5 | |
value: 0.764 | |
- type: mrr_at_1 | |
value: 82.0 | |
- type: mrr_at_10 | |
value: 87.519 | |
- type: mrr_at_100 | |
value: 87.519 | |
- type: mrr_at_1000 | |
value: 87.536 | |
- type: mrr_at_3 | |
value: 86.333 | |
- type: mrr_at_5 | |
value: 87.233 | |
- type: ndcg_at_1 | |
value: 77.0 | |
- type: ndcg_at_10 | |
value: 55.665 | |
- type: ndcg_at_100 | |
value: 39.410000000000004 | |
- type: ndcg_at_1000 | |
value: 37.21 | |
- type: ndcg_at_3 | |
value: 65.263 | |
- type: ndcg_at_5 | |
value: 61.424 | |
- type: precision_at_1 | |
value: 82.0 | |
- type: precision_at_10 | |
value: 55.400000000000006 | |
- type: precision_at_100 | |
value: 39.04 | |
- type: precision_at_1000 | |
value: 16.788 | |
- type: precision_at_3 | |
value: 67.333 | |
- type: precision_at_5 | |
value: 62.8 | |
- type: recall_at_1 | |
value: 0.214 | |
- type: recall_at_10 | |
value: 1.4200000000000002 | |
- type: recall_at_100 | |
value: 9.231 | |
- type: recall_at_1000 | |
value: 35.136 | |
- type: recall_at_3 | |
value: 0.544 | |
- type: recall_at_5 | |
value: 0.832 | |
- task: | |
type: Retrieval | |
dataset: | |
type: webis-touche2020 | |
name: MTEB Touche2020 | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 0.41000000000000003 | |
- type: map_at_10 | |
value: 2.32 | |
- type: map_at_100 | |
value: 4.077 | |
- type: map_at_1000 | |
value: 4.9430000000000005 | |
- type: map_at_3 | |
value: 1.087 | |
- type: map_at_5 | |
value: 1.466 | |
- type: mrr_at_1 | |
value: 6.122 | |
- type: mrr_at_10 | |
value: 13.999 | |
- type: mrr_at_100 | |
value: 16.524 | |
- type: mrr_at_1000 | |
value: 16.567999999999998 | |
- type: mrr_at_3 | |
value: 11.224 | |
- type: mrr_at_5 | |
value: 13.163 | |
- type: ndcg_at_1 | |
value: 5.102 | |
- type: ndcg_at_10 | |
value: 6.542000000000001 | |
- type: ndcg_at_100 | |
value: 14.127 | |
- type: ndcg_at_1000 | |
value: 24.396 | |
- type: ndcg_at_3 | |
value: 5.653 | |
- type: ndcg_at_5 | |
value: 5.5649999999999995 | |
- type: precision_at_1 | |
value: 6.122 | |
- type: precision_at_10 | |
value: 7.142999999999999 | |
- type: precision_at_100 | |
value: 3.51 | |
- type: precision_at_1000 | |
value: 0.9860000000000001 | |
- type: precision_at_3 | |
value: 6.802999999999999 | |
- type: precision_at_5 | |
value: 6.938999999999999 | |
- type: recall_at_1 | |
value: 0.41000000000000003 | |
- type: recall_at_10 | |
value: 5.627 | |
- type: recall_at_100 | |
value: 23.121 | |
- type: recall_at_1000 | |
value: 54.626 | |
- type: recall_at_3 | |
value: 1.763 | |
- type: recall_at_5 | |
value: 3.013 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/toxic_conversations_50k | |
name: MTEB ToxicConversationsClassification | |
config: default | |
split: test | |
revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c | |
metrics: | |
- type: accuracy | |
value: 70.71119999999999 | |
- type: ap | |
value: 15.1342268718371 | |
- type: f1 | |
value: 55.043262693594855 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/tweet_sentiment_extraction | |
name: MTEB TweetSentimentExtractionClassification | |
config: default | |
split: test | |
revision: d604517c81ca91fe16a244d1248fc021f9ecee7a | |
metrics: | |
- type: accuracy | |
value: 60.89983022071308 | |
- type: f1 | |
value: 61.13086468149106 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/twentynewsgroups-clustering | |
name: MTEB TwentyNewsgroupsClustering | |
config: default | |
split: test | |
revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 | |
metrics: | |
- type: v_measure | |
value: 30.264802332456515 | |
- task: | |
type: PairClassification | |
dataset: | |
type: mteb/twittersemeval2015-pairclassification | |
name: MTEB TwitterSemEval2015 | |
config: default | |
split: test | |
revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 | |
metrics: | |
- type: cos_sim_accuracy | |
value: 84.46086904690947 | |
- type: cos_sim_ap | |
value: 68.76039123104324 | |
- type: cos_sim_f1 | |
value: 63.002224839680665 | |
- type: cos_sim_precision | |
value: 62.503245910153204 | |
- type: cos_sim_recall | |
value: 63.50923482849604 | |
- type: dot_accuracy | |
value: 80.07391071109257 | |
- type: dot_ap | |
value: 53.43322643579626 | |
- type: dot_f1 | |
value: 52.6850065983149 | |
- type: dot_precision | |
value: 42.81471704339218 | |
- type: dot_recall | |
value: 68.46965699208444 | |
- type: euclidean_accuracy | |
value: 84.2701317279609 | |
- type: euclidean_ap | |
value: 67.55078414631596 | |
- type: euclidean_f1 | |
value: 62.90723537877797 | |
- type: euclidean_precision | |
value: 62.392940565792884 | |
- type: euclidean_recall | |
value: 63.43007915567283 | |
- type: manhattan_accuracy | |
value: 84.22244739822375 | |
- type: manhattan_ap | |
value: 67.92488847948273 | |
- type: manhattan_f1 | |
value: 62.99132210311383 | |
- type: manhattan_precision | |
value: 60.99851705388038 | |
- type: manhattan_recall | |
value: 65.11873350923483 | |
- type: max_accuracy | |
value: 84.46086904690947 | |
- type: max_ap | |
value: 68.76039123104324 | |
- type: max_f1 | |
value: 63.002224839680665 | |
- task: | |
type: PairClassification | |
dataset: | |
type: mteb/twitterurlcorpus-pairclassification | |
name: MTEB TwitterURLCorpus | |
config: default | |
split: test | |
revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf | |
metrics: | |
- type: cos_sim_accuracy | |
value: 87.71296619707377 | |
- type: cos_sim_ap | |
value: 82.76174215711472 | |
- type: cos_sim_f1 | |
value: 75.73585592141168 | |
- type: cos_sim_precision | |
value: 71.79416430985721 | |
- type: cos_sim_recall | |
value: 80.1355097012627 | |
- type: dot_accuracy | |
value: 85.62502425583111 | |
- type: dot_ap | |
value: 77.50549495030725 | |
- type: dot_f1 | |
value: 71.47900863425035 | |
- type: dot_precision | |
value: 65.4587361546834 | |
- type: dot_recall | |
value: 78.71881736987989 | |
- type: euclidean_accuracy | |
value: 87.12694531765437 | |
- type: euclidean_ap | |
value: 81.63583409712018 | |
- type: euclidean_f1 | |
value: 74.50966015324268 | |
- type: euclidean_precision | |
value: 71.11764294212331 | |
- type: euclidean_recall | |
value: 78.24145364952264 | |
- type: manhattan_accuracy | |
value: 87.35009896379088 | |
- type: manhattan_ap | |
value: 82.20417545366242 | |
- type: manhattan_f1 | |
value: 74.84157622550805 | |
- type: manhattan_precision | |
value: 71.00898410504493 | |
- type: manhattan_recall | |
value: 79.11148752694795 | |
- type: max_accuracy | |
value: 87.71296619707377 | |
- type: max_ap | |
value: 82.76174215711472 | |
- type: max_f1 | |
value: 75.73585592141168 | |
# LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

> LLM2Vec is a simple recipe to convert decoder-only LLMs into text encoders. It consists of 3 simple steps: 1) enabling bidirectional attention, 2) masked next token prediction, and 3) unsupervised contrastive learning. The model can be further fine-tuned to achieve state-of-the-art performance.

- **Repository:** https://github.com/McGill-NLP/llm2vec
- **Paper:** https://arxiv.org/abs/2404.05961
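
To make the first step of the recipe concrete, the sketch below contrasts a causal attention mask (each token attends only to itself and earlier tokens) with a bidirectional one (every token attends to every token). It is a toy illustration in plain Python, not the code used by the `llm2vec` library; the function names `causal_mask` and `bidirectional_mask` are hypothetical.

```python
# Toy illustration only: these helpers are hypothetical and are not part of llm2vec.
# A mask entry of 1 means "position i may attend to position j".

def causal_mask(n):
    """Lower-triangular mask, as used by a decoder-only LLM."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """All-ones mask: every position may attend to every other position."""
    return [[1] * n for _ in range(n)]


for row in causal_mask(4):
    print(row)
print()
for row in bidirectional_mask(4):
    print(row)
```

With a bidirectional mask, the representation of each token can draw on the whole input, which is what makes mean pooling over tokens a reasonable sentence embedding.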
## Installation

```bash
pip install llm2vec
```

## Usage

```python
from llm2vec import LLM2Vec

import torch
from transformers import AutoTokenizer, AutoModel, AutoConfig
from peft import PeftModel

# Load the base Mistral model, along with custom code that enables bidirectional
# connections in decoder-only LLMs. The MNTP LoRA weights are merged into the base model.
tokenizer = AutoTokenizer.from_pretrained(
    "McGill-NLP/LLM2Vec-Mistral-7B-Instruct-v2-mntp"
)
config = AutoConfig.from_pretrained(
    "McGill-NLP/LLM2Vec-Mistral-7B-Instruct-v2-mntp", trust_remote_code=True
)
model = AutoModel.from_pretrained(
    "McGill-NLP/LLM2Vec-Mistral-7B-Instruct-v2-mntp",
    trust_remote_code=True,
    config=config,
    torch_dtype=torch.bfloat16,
    device_map="cuda" if torch.cuda.is_available() else "cpu",
)
model = PeftModel.from_pretrained(
    model,
    "McGill-NLP/LLM2Vec-Mistral-7B-Instruct-v2-mntp",
)
model = model.merge_and_unload()  # This can take several minutes on CPU

# Load the unsupervised SimCSE model. This loads the trained LoRA weights on top of
# the MNTP model, so the final weights are: base model + MNTP (LoRA) + SimCSE (LoRA).
model = PeftModel.from_pretrained(
    model, "McGill-NLP/LLM2Vec-Mistral-7B-Instruct-v2-mntp-unsup-simcse"
)

# Wrapper for encoding and pooling operations
l2v = LLM2Vec(model, tokenizer, pooling_mode="mean", max_length=512)

# Encode queries, each paired with an instruction
instruction = (
    "Given a web search query, retrieve relevant passages that answer the query:"
)
queries = [
    [instruction, "how much protein should a female eat"],
    [instruction, "summit define"],
]
q_reps = l2v.encode(queries)

# Encode documents. Instructions are not required for documents.
documents = [
    "As a general guideline, the CDC's average requirement of protein for women ages 19 to 70 is 46 grams per day. But, as you can see from this chart, you'll need to increase that if you're expecting or training for a marathon. Check out the chart below to see how much protein you should be eating each day.",
    "Definition of summit for English Language Learners. : 1 the highest point of a mountain : the top of a mountain. : 2 the highest level. : 3 a meeting or series of meetings between the leaders of two or more governments.",
]
d_reps = l2v.encode(documents)

# Compute cosine similarity between every query and every document
q_reps_norm = torch.nn.functional.normalize(q_reps, p=2, dim=1)
d_reps_norm = torch.nn.functional.normalize(d_reps, p=2, dim=1)
cos_sim = torch.mm(q_reps_norm, d_reps_norm.transpose(0, 1))

print(cos_sim)
"""
tensor([[0.6175, 0.2535],
        [0.2298, 0.5792]])
"""
```
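
In a retrieval setting, the similarity matrix is usually turned into a ranking of documents for each query. The sketch below shows one way to do that in plain Python, using the values printed in the example above (for a real tensor, `cos_sim.tolist()` gives the same nested-list form). The helper name `rank_documents` is an assumption made for illustration and is not part of `llm2vec`.

```python
# Hypothetical helper, not part of llm2vec: rank documents for one query
# by descending cosine similarity.
def rank_documents(sim_row):
    return sorted(range(len(sim_row)), key=lambda j: sim_row[j], reverse=True)


# Similarity values copied from the printed output of the usage example.
# Rows correspond to queries, columns to documents.
cos_sim = [
    [0.6175, 0.2535],
    [0.2298, 0.5792],
]

for q_idx, row in enumerate(cos_sim):
    ranking = rank_documents(row)
    print(f"query {q_idx}: best document = {ranking[0]}, ranking = {ranking}")
# query 0: best document = 0, ranking = [0, 1]
# query 1: best document = 1, ranking = [1, 0]
```

Here each query is matched to its own document, since the diagonal entries of the matrix are the largest in their rows.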

## Questions

If you have any questions about the code, feel free to email Parishad (`parishad.behnamghader@mila.quebec`) and Vaibhav (`vaibhav.adlakha@mila.quebec`).