library_name: peft
license: mit
language:
- en
pipeline_tag: sentence-similarity
tags:
- text-embedding
- embeddings
- information-retrieval
- beir
- text-classification
- language-model
- text-clustering
- text-semantic-similarity
- text-evaluation
- text-reranking
- feature-extraction
- sentence-similarity
- Sentence Similarity
- natural_questions
- ms_marco
- fever
- hotpot_qa
- mteb
model-index:
- name: LLM2Vec-Sheared-LLaMA-unsupervised
  results:
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_counterfactual
      name: MTEB AmazonCounterfactualClassification (en)
      config: en
      split: test
      revision: e8379541af4e31359cca9fbcf4b00f2671dba205
    metrics:
    - type: accuracy
      value: 72.92537313432835
    - type: ap
      value: 36.6875749512053
    - type: f1
      value: 67.36274146169845
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_polarity
      name: MTEB AmazonPolarityClassification
      config: default
      split: test
      revision: e2d317d38cd51312af73b3d32a06d1a08b442046
    metrics:
    - type: accuracy
      value: 74.282675
    - type: ap
      value: 69.15441866642587
    - type: f1
      value: 74.13028166370813
  - task:
      type: Classification
    dataset:
      type: mteb/amazon_reviews_multi
      name: MTEB AmazonReviewsClassification (en)
      config: en
      split: test
      revision: 1399c76144fd37290681b995c656ef9b2e06e26d
    metrics:
    - type: accuracy
      value: 36.136
    - type: f1
      value: 35.840498320506235
  - task:
      type: Retrieval
    dataset:
      type: arguana
      name: MTEB ArguAna
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 21.407999999999998
    - type: map_at_10
      value: 35.474
    - type: map_at_100
      value: 36.653999999999996
    - type: map_at_1000
      value: 36.68
    - type: map_at_3
      value: 30.974
    - type: map_at_5
      value: 33.265
    - type: mrr_at_1
      value: 22.119
    - type: mrr_at_10
      value: 35.714
    - type: mrr_at_100
      value: 36.895
    - type: mrr_at_1000
      value: 36.921
    - type: mrr_at_3
      value: 31.2
    - type: mrr_at_5
      value: 33.518
    - type: ndcg_at_1
      value: 21.407999999999998
    - type: ndcg_at_10
      value: 43.644
    - type: ndcg_at_100
      value: 49.035000000000004
    - type: ndcg_at_1000
      value: 49.685
    - type: ndcg_at_3
      value: 34.174
    - type: ndcg_at_5
      value: 38.288
    - type: precision_at_1
      value: 21.407999999999998
    - type: precision_at_10
      value: 6.999
    - type: precision_at_100
      value: 0.9440000000000001
    - type: precision_at_1000
      value: 0.099
    - type: precision_at_3
      value: 14.485999999999999
    - type: precision_at_5
      value: 10.683
    - type: recall_at_1
      value: 21.407999999999998
    - type: recall_at_10
      value: 69.986
    - type: recall_at_100
      value: 94.381
    - type: recall_at_1000
      value: 99.431
    - type: recall_at_3
      value: 43.457
    - type: recall_at_5
      value: 53.413999999999994
  - task:
      type: Clustering
    dataset:
      type: mteb/arxiv-clustering-p2p
      name: MTEB ArxivClusteringP2P
      config: default
      split: test
      revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d
    metrics:
    - type: v_measure
      value: 42.915010245699904
  - task:
      type: Clustering
    dataset:
      type: mteb/arxiv-clustering-s2s
      name: MTEB ArxivClusteringS2S
      config: default
      split: test
      revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53
    metrics:
    - type: v_measure
      value: 35.19568272188972
  - task:
      type: Reranking
    dataset:
      type: mteb/askubuntudupquestions-reranking
      name: MTEB AskUbuntuDupQuestions
      config: default
      split: test
      revision: 2000358ca161889fa9c082cb41daa8dcfb161a54
    metrics:
    - type: map
      value: 52.696972763822615
    - type: mrr
      value: 65.87136701402629
  - task:
      type: STS
    dataset:
      type: mteb/biosses-sts
      name: MTEB BIOSSES
      config: default
      split: test
      revision: d3fb88f8f02e40887cd149695127462bbcf29b4a
    metrics:
    - type: cos_sim_spearman
      value: 75.12038636775851
  - task:
      type: Classification
    dataset:
      type: mteb/banking77
      name: MTEB Banking77Classification
      config: default
      split: test
      revision: 0fd18e25b25c072e09e0d92ab615fda904d66300
    metrics:
    - type: accuracy
      value: 78.99675324675324
    - type: f1
      value: 78.90527329824852
  - task:
      type: Clustering
    dataset:
      type: mteb/biorxiv-clustering-p2p
      name: MTEB BiorxivClusteringP2P
      config: default
      split: test
      revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40
    metrics:
    - type: v_measure
      value: 35.02170435970243
  - task:
      type: Clustering
    dataset:
      type: mteb/biorxiv-clustering-s2s
      name: MTEB BiorxivClusteringS2S
      config: default
      split: test
      revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908
    metrics:
    - type: v_measure
      value: 27.208216971540782
  - task:
      type: Retrieval
    dataset:
      type: cqadupstack/android
      name: MTEB CQADupstackAndroidRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 16.432
    - type: map_at_10
      value: 23.769000000000002
    - type: map_at_100
      value: 25.038
    - type: map_at_1000
      value: 25.208000000000002
    - type: map_at_3
      value: 21.532999999999998
    - type: map_at_5
      value: 22.668
    - type: mrr_at_1
      value: 21.316
    - type: mrr_at_10
      value: 28.89
    - type: mrr_at_100
      value: 29.799999999999997
    - type: mrr_at_1000
      value: 29.887999999999998
    - type: mrr_at_3
      value: 26.705000000000002
    - type: mrr_at_5
      value: 27.864
    - type: ndcg_at_1
      value: 21.316
    - type: ndcg_at_10
      value: 28.656
    - type: ndcg_at_100
      value: 34.405
    - type: ndcg_at_1000
      value: 37.771
    - type: ndcg_at_3
      value: 24.98
    - type: ndcg_at_5
      value: 26.384999999999998
    - type: precision_at_1
      value: 21.316
    - type: precision_at_10
      value: 5.8229999999999995
    - type: precision_at_100
      value: 1.157
    - type: precision_at_1000
      value: 0.181
    - type: precision_at_3
      value: 12.446
    - type: precision_at_5
      value: 8.984
    - type: recall_at_1
      value: 16.432
    - type: recall_at_10
      value: 37.696000000000005
    - type: recall_at_100
      value: 63.198
    - type: recall_at_1000
      value: 86.651
    - type: recall_at_3
      value: 26.651000000000003
    - type: recall_at_5
      value: 30.901
  - task:
      type: Retrieval
    dataset:
      type: cqadupstack/english
      name: MTEB CQADupstackEnglishRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 16.106
    - type: map_at_10
      value: 21.770999999999997
    - type: map_at_100
      value: 22.538
    - type: map_at_1000
      value: 22.656000000000002
    - type: map_at_3
      value: 19.918
    - type: map_at_5
      value: 20.957
    - type: mrr_at_1
      value: 21.083
    - type: mrr_at_10
      value: 26.502
    - type: mrr_at_100
      value: 27.161
    - type: mrr_at_1000
      value: 27.234
    - type: mrr_at_3
      value: 24.735
    - type: mrr_at_5
      value: 25.753999999999998
    - type: ndcg_at_1
      value: 21.083
    - type: ndcg_at_10
      value: 25.625999999999998
    - type: ndcg_at_100
      value: 29.152
    - type: ndcg_at_1000
      value: 32.025
    - type: ndcg_at_3
      value: 22.721
    - type: ndcg_at_5
      value: 24.029
    - type: precision_at_1
      value: 21.083
    - type: precision_at_10
      value: 4.8919999999999995
    - type: precision_at_100
      value: 0.844
    - type: precision_at_1000
      value: 0.13699999999999998
    - type: precision_at_3
      value: 11.104
    - type: precision_at_5
      value: 7.987
    - type: recall_at_1
      value: 16.106
    - type: recall_at_10
      value: 32.385999999999996
    - type: recall_at_100
      value: 47.961999999999996
    - type: recall_at_1000
      value: 67.63900000000001
    - type: recall_at_3
      value: 23.568
    - type: recall_at_5
      value: 27.326
  - task:
      type: Retrieval
    dataset:
      type: cqadupstack/gaming
      name: MTEB CQADupstackGamingRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 22.517
    - type: map_at_10
      value: 29.593999999999998
    - type: map_at_100
      value: 30.695
    - type: map_at_1000
      value: 30.803000000000004
    - type: map_at_3
      value: 27.592
    - type: map_at_5
      value: 28.768
    - type: mrr_at_1
      value: 26.27
    - type: mrr_at_10
      value: 33.076
    - type: mrr_at_100
      value: 33.998
    - type: mrr_at_1000
      value: 34.073
    - type: mrr_at_3
      value: 31.223
    - type: mrr_at_5
      value: 32.257000000000005
    - type: ndcg_at_1
      value: 26.27
    - type: ndcg_at_10
      value: 33.726
    - type: ndcg_at_100
      value: 39.079
    - type: ndcg_at_1000
      value: 41.762
    - type: ndcg_at_3
      value: 30.064
    - type: ndcg_at_5
      value: 31.858999999999998
    - type: precision_at_1
      value: 26.27
    - type: precision_at_10
      value: 5.448
    - type: precision_at_100
      value: 0.898
    - type: precision_at_1000
      value: 0.121
    - type: precision_at_3
      value: 13.417000000000002
    - type: precision_at_5
      value: 9.317
    - type: recall_at_1
      value: 22.517
    - type: recall_at_10
      value: 42.814
    - type: recall_at_100
      value: 67.037
    - type: recall_at_1000
      value: 86.89099999999999
    - type: recall_at_3
      value: 33.041
    - type: recall_at_5
      value: 37.389
  - task:
      type: Retrieval
    dataset:
      type: cqadupstack/gis
      name: MTEB CQADupstackGisRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 7.681
    - type: map_at_10
      value: 10.655000000000001
    - type: map_at_100
      value: 11.274000000000001
    - type: map_at_1000
      value: 11.381
    - type: map_at_3
      value: 9.793000000000001
    - type: map_at_5
      value: 10.202
    - type: mrr_at_1
      value: 8.248999999999999
    - type: mrr_at_10
      value: 11.453000000000001
    - type: mrr_at_100
      value: 12.074
    - type: mrr_at_1000
      value: 12.174
    - type: mrr_at_3
      value: 10.452
    - type: mrr_at_5
      value: 10.989
    - type: ndcg_at_1
      value: 8.248999999999999
    - type: ndcg_at_10
      value: 12.467
    - type: ndcg_at_100
      value: 15.942
    - type: ndcg_at_1000
      value: 19.378999999999998
    - type: ndcg_at_3
      value: 10.631
    - type: ndcg_at_5
      value: 11.411
    - type: precision_at_1
      value: 8.248999999999999
    - type: precision_at_10
      value: 1.966
    - type: precision_at_100
      value: 0.40099999999999997
    - type: precision_at_1000
      value: 0.075
    - type: precision_at_3
      value: 4.444
    - type: precision_at_5
      value: 3.186
    - type: recall_at_1
      value: 7.681
    - type: recall_at_10
      value: 17.302
    - type: recall_at_100
      value: 34.014
    - type: recall_at_1000
      value: 61.207
    - type: recall_at_3
      value: 12.389
    - type: recall_at_5
      value: 14.158999999999999
  - task:
      type: Retrieval
    dataset:
      type: cqadupstack/mathematica
      name: MTEB CQADupstackMathematicaRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 3.868
    - type: map_at_10
      value: 6.281000000000001
    - type: map_at_100
      value: 6.903
    - type: map_at_1000
      value: 7.038
    - type: map_at_3
      value: 5.234
    - type: map_at_5
      value: 5.685
    - type: mrr_at_1
      value: 5.1
    - type: mrr_at_10
      value: 8.148
    - type: mrr_at_100
      value: 8.846
    - type: mrr_at_1000
      value: 8.963000000000001
    - type: mrr_at_3
      value: 6.944
    - type: mrr_at_5
      value: 7.498
    - type: ndcg_at_1
      value: 5.1
    - type: ndcg_at_10
      value: 8.405999999999999
    - type: ndcg_at_100
      value: 12.014
    - type: ndcg_at_1000
      value: 15.956999999999999
    - type: ndcg_at_3
      value: 6.22
    - type: ndcg_at_5
      value: 6.962
    - type: precision_at_1
      value: 5.1
    - type: precision_at_10
      value: 1.8159999999999998
    - type: precision_at_100
      value: 0.437
    - type: precision_at_1000
      value: 0.09
    - type: precision_at_3
      value: 3.1510000000000002
    - type: precision_at_5
      value: 2.463
    - type: recall_at_1
      value: 3.868
    - type: recall_at_10
      value: 13.319
    - type: recall_at_100
      value: 29.985
    - type: recall_at_1000
      value: 59.245999999999995
    - type: recall_at_3
      value: 7.0809999999999995
    - type: recall_at_5
      value: 8.914
  - task:
      type: Retrieval
    dataset:
      type: cqadupstack/physics
      name: MTEB CQADupstackPhysicsRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 13.091
    - type: map_at_10
      value: 18.701999999999998
    - type: map_at_100
      value: 19.897000000000002
    - type: map_at_1000
      value: 20.044
    - type: map_at_3
      value: 17.041999999999998
    - type: map_at_5
      value: 17.943
    - type: mrr_at_1
      value: 16.939
    - type: mrr_at_10
      value: 23.038
    - type: mrr_at_100
      value: 24.029
    - type: mrr_at_1000
      value: 24.12
    - type: mrr_at_3
      value: 21.221999999999998
    - type: mrr_at_5
      value: 22.198999999999998
    - type: ndcg_at_1
      value: 16.939
    - type: ndcg_at_10
      value: 22.566
    - type: ndcg_at_100
      value: 28.364
    - type: ndcg_at_1000
      value: 31.646
    - type: ndcg_at_3
      value: 19.646
    - type: ndcg_at_5
      value: 20.915
    - type: precision_at_1
      value: 16.939
    - type: precision_at_10
      value: 4.340999999999999
    - type: precision_at_100
      value: 0.882
    - type: precision_at_1000
      value: 0.13799999999999998
    - type: precision_at_3
      value: 9.785
    - type: precision_at_5
      value: 6.93
    - type: recall_at_1
      value: 13.091
    - type: recall_at_10
      value: 30.022
    - type: recall_at_100
      value: 55.579
    - type: recall_at_1000
      value: 78.14
    - type: recall_at_3
      value: 21.4
    - type: recall_at_5
      value: 25.020999999999997
  - task:
      type: Retrieval
    dataset:
      type: cqadupstack/programmers
      name: MTEB CQADupstackProgrammersRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 11.315999999999999
    - type: map_at_10
      value: 16.191
    - type: map_at_100
      value: 17.116
    - type: map_at_1000
      value: 17.262
    - type: map_at_3
      value: 14.302999999999999
    - type: map_at_5
      value: 15.278
    - type: mrr_at_1
      value: 14.269000000000002
    - type: mrr_at_10
      value: 19.409000000000002
    - type: mrr_at_100
      value: 20.298
    - type: mrr_at_1000
      value: 20.393
    - type: mrr_at_3
      value: 17.504
    - type: mrr_at_5
      value: 18.423000000000002
    - type: ndcg_at_1
      value: 14.269000000000002
    - type: ndcg_at_10
      value: 19.735
    - type: ndcg_at_100
      value: 24.582
    - type: ndcg_at_1000
      value: 28.337
    - type: ndcg_at_3
      value: 16.220000000000002
    - type: ndcg_at_5
      value: 17.644000000000002
    - type: precision_at_1
      value: 14.269000000000002
    - type: precision_at_10
      value: 3.721
    - type: precision_at_100
      value: 0.752
    - type: precision_at_1000
      value: 0.129
    - type: precision_at_3
      value: 7.800999999999999
    - type: precision_at_5
      value: 5.753
    - type: recall_at_1
      value: 11.315999999999999
    - type: recall_at_10
      value: 27.693
    - type: recall_at_100
      value: 49.265
    - type: recall_at_1000
      value: 76.291
    - type: recall_at_3
      value: 17.593
    - type: recall_at_5
      value: 21.368000000000002
  - task:
      type: Retrieval
    dataset:
      type: mteb/cqadupstack
      name: MTEB CQADupstackRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 11.131583333333332
    - type: map_at_10
      value: 15.4605
    - type: map_at_100
      value: 16.3075
    - type: map_at_1000
      value: 16.4375
    - type: map_at_3
      value: 13.995833333333332
    - type: map_at_5
      value: 14.783666666666667
    - type: mrr_at_1
      value: 13.805833333333334
    - type: mrr_at_10
      value: 18.405749999999998
    - type: mrr_at_100
      value: 19.17516666666667
    - type: mrr_at_1000
      value: 19.265833333333333
    - type: mrr_at_3
      value: 16.892416666666666
    - type: mrr_at_5
      value: 17.71058333333333
    - type: ndcg_at_1
      value: 13.805833333333334
    - type: ndcg_at_10
      value: 18.500666666666664
    - type: ndcg_at_100
      value: 22.78191666666667
    - type: ndcg_at_1000
      value: 26.095583333333334
    - type: ndcg_at_3
      value: 15.846916666666663
    - type: ndcg_at_5
      value: 17.004250000000003
    - type: precision_at_1
      value: 13.805833333333334
    - type: precision_at_10
      value: 3.4233333333333325
    - type: precision_at_100
      value: 0.6828333333333333
    - type: precision_at_1000
      value: 0.11641666666666667
    - type: precision_at_3
      value: 7.511749999999999
    - type: precision_at_5
      value: 5.440916666666666
    - type: recall_at_1
      value: 11.131583333333332
    - type: recall_at_10
      value: 24.794166666666666
    - type: recall_at_100
      value: 44.356
    - type: recall_at_1000
      value: 68.71899999999998
    - type: recall_at_3
      value: 17.145583333333335
    - type: recall_at_5
      value: 20.229083333333335
  - task:
      type: Retrieval
    dataset:
      type: cqadupstack/stats
      name: MTEB CQADupstackStatsRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 7.5520000000000005
    - type: map_at_10
      value: 10.355
    - type: map_at_100
      value: 10.875
    - type: map_at_1000
      value: 10.972999999999999
    - type: map_at_3
      value: 9.341000000000001
    - type: map_at_5
      value: 9.969
    - type: mrr_at_1
      value: 9.049
    - type: mrr_at_10
      value: 12.002
    - type: mrr_at_100
      value: 12.55
    - type: mrr_at_1000
      value: 12.635
    - type: mrr_at_3
      value: 11.12
    - type: mrr_at_5
      value: 11.626
    - type: ndcg_at_1
      value: 9.049
    - type: ndcg_at_10
      value: 12.241
    - type: ndcg_at_100
      value: 15.231
    - type: ndcg_at_1000
      value: 18.265
    - type: ndcg_at_3
      value: 10.424999999999999
    - type: ndcg_at_5
      value: 11.360000000000001
    - type: precision_at_1
      value: 9.049
    - type: precision_at_10
      value: 2.147
    - type: precision_at_100
      value: 0.411
    - type: precision_at_1000
      value: 0.073
    - type: precision_at_3
      value: 4.755
    - type: precision_at_5
      value: 3.558
    - type: recall_at_1
      value: 7.5520000000000005
    - type: recall_at_10
      value: 16.448999999999998
    - type: recall_at_100
      value: 30.505
    - type: recall_at_1000
      value: 54.435
    - type: recall_at_3
      value: 11.366
    - type: recall_at_5
      value: 13.758999999999999
  - task:
      type: Retrieval
    dataset:
      type: cqadupstack/tex
      name: MTEB CQADupstackTexRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 5.954000000000001
    - type: map_at_10
      value: 8.229000000000001
    - type: map_at_100
      value: 8.694
    - type: map_at_1000
      value: 8.788
    - type: map_at_3
      value: 7.5
    - type: map_at_5
      value: 7.856000000000001
    - type: mrr_at_1
      value: 7.983
    - type: mrr_at_10
      value: 10.833
    - type: mrr_at_100
      value: 11.324
    - type: mrr_at_1000
      value: 11.404
    - type: mrr_at_3
      value: 9.911
    - type: mrr_at_5
      value: 10.401
    - type: ndcg_at_1
      value: 7.983
    - type: ndcg_at_10
      value: 10.126
    - type: ndcg_at_100
      value: 12.702
    - type: ndcg_at_1000
      value: 15.581999999999999
    - type: ndcg_at_3
      value: 8.779
    - type: ndcg_at_5
      value: 9.279
    - type: precision_at_1
      value: 7.983
    - type: precision_at_10
      value: 1.955
    - type: precision_at_100
      value: 0.392
    - type: precision_at_1000
      value: 0.076
    - type: precision_at_3
      value: 4.382
    - type: precision_at_5
      value: 3.09
    - type: recall_at_1
      value: 5.954000000000001
    - type: recall_at_10
      value: 13.472000000000001
    - type: recall_at_100
      value: 25.407999999999998
    - type: recall_at_1000
      value: 47.028
    - type: recall_at_3
      value: 9.367
    - type: recall_at_5
      value: 10.867
  - task:
      type: Retrieval
    dataset:
      type: cqadupstack/unix
      name: MTEB CQADupstackUnixRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 8.894
    - type: map_at_10
      value: 12.758
    - type: map_at_100
      value: 13.639999999999999
    - type: map_at_1000
      value: 13.76
    - type: map_at_3
      value: 11.447000000000001
    - type: map_at_5
      value: 12.205
    - type: mrr_at_1
      value: 10.914
    - type: mrr_at_10
      value: 15.739
    - type: mrr_at_100
      value: 16.589000000000002
    - type: mrr_at_1000
      value: 16.679
    - type: mrr_at_3
      value: 14.179
    - type: mrr_at_5
      value: 15.162999999999998
    - type: ndcg_at_1
      value: 10.914
    - type: ndcg_at_10
      value: 15.629000000000001
    - type: ndcg_at_100
      value: 20.261000000000003
    - type: ndcg_at_1000
      value: 23.781
    - type: ndcg_at_3
      value: 13.102
    - type: ndcg_at_5
      value: 14.338000000000001
    - type: precision_at_1
      value: 10.914
    - type: precision_at_10
      value: 2.91
    - type: precision_at_100
      value: 0.601
    - type: precision_at_1000
      value: 0.10200000000000001
    - type: precision_at_3
      value: 6.311999999999999
    - type: precision_at_5
      value: 4.683
    - type: recall_at_1
      value: 8.894
    - type: recall_at_10
      value: 21.45
    - type: recall_at_100
      value: 42.617
    - type: recall_at_1000
      value: 69.233
    - type: recall_at_3
      value: 14.52
    - type: recall_at_5
      value: 17.681
  - task:
      type: Retrieval
    dataset:
      type: cqadupstack/webmasters
      name: MTEB CQADupstackWebmastersRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 12.158
    - type: map_at_10
      value: 16.332
    - type: map_at_100
      value: 17.458000000000002
    - type: map_at_1000
      value: 17.687
    - type: map_at_3
      value: 14.529
    - type: map_at_5
      value: 15.515
    - type: mrr_at_1
      value: 15.809999999999999
    - type: mrr_at_10
      value: 19.917
    - type: mrr_at_100
      value: 20.875
    - type: mrr_at_1000
      value: 20.985
    - type: mrr_at_3
      value: 18.116
    - type: mrr_at_5
      value: 19.025
    - type: ndcg_at_1
      value: 15.809999999999999
    - type: ndcg_at_10
      value: 19.869999999999997
    - type: ndcg_at_100
      value: 24.907
    - type: ndcg_at_1000
      value: 29.076999999999998
    - type: ndcg_at_3
      value: 16.899
    - type: ndcg_at_5
      value: 18.23
    - type: precision_at_1
      value: 15.809999999999999
    - type: precision_at_10
      value: 3.972
    - type: precision_at_100
      value: 0.9860000000000001
    - type: precision_at_1000
      value: 0.203
    - type: precision_at_3
      value: 8.169
    - type: precision_at_5
      value: 6.087
    - type: recall_at_1
      value: 12.158
    - type: recall_at_10
      value: 26.338
    - type: recall_at_100
      value: 49.845
    - type: recall_at_1000
      value: 78.82000000000001
    - type: recall_at_3
      value: 16.997
    - type: recall_at_5
      value: 20.848
  - task:
      type: Retrieval
    dataset:
      type: cqadupstack/wordpress
      name: MTEB CQADupstackWordpressRetrieval
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 8.01
    - type: map_at_10
      value: 10.889
    - type: map_at_100
      value: 11.562
    - type: map_at_1000
      value: 11.65
    - type: map_at_3
      value: 9.718
    - type: map_at_5
      value: 10.358
    - type: mrr_at_1
      value: 8.688
    - type: mrr_at_10
      value: 11.862
    - type: mrr_at_100
      value: 12.558
    - type: mrr_at_1000
      value: 12.642000000000001
    - type: mrr_at_3
      value: 10.598
    - type: mrr_at_5
      value: 11.328000000000001
    - type: ndcg_at_1
      value: 8.688
    - type: ndcg_at_10
      value: 12.959999999999999
    - type: ndcg_at_100
      value: 16.744
    - type: ndcg_at_1000
      value: 19.564999999999998
    - type: ndcg_at_3
      value: 10.476
    - type: ndcg_at_5
      value: 11.639
    - type: precision_at_1
      value: 8.688
    - type: precision_at_10
      value: 2.089
    - type: precision_at_100
      value: 0.43299999999999994
    - type: precision_at_1000
      value: 0.07200000000000001
    - type: precision_at_3
      value: 4.375
    - type: precision_at_5
      value: 3.253
    - type: recall_at_1
      value: 8.01
    - type: recall_at_10
      value: 18.589
    - type: recall_at_100
      value: 36.857
    - type: recall_at_1000
      value: 59.047000000000004
    - type: recall_at_3
      value: 11.774
    - type: recall_at_5
      value: 14.516000000000002
  - task:
      type: Retrieval
    dataset:
      type: climate-fever
      name: MTEB ClimateFEVER
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 6.4719999999999995
    - type: map_at_10
      value: 12.322
    - type: map_at_100
      value: 14.122000000000002
    - type: map_at_1000
      value: 14.35
    - type: map_at_3
      value: 9.667
    - type: map_at_5
      value: 10.931000000000001
    - type: mrr_at_1
      value: 15.179
    - type: mrr_at_10
      value: 24.864
    - type: mrr_at_100
      value: 26.144000000000002
    - type: mrr_at_1000
      value: 26.198
    - type: mrr_at_3
      value: 20.999000000000002
    - type: mrr_at_5
      value: 23.097
    - type: ndcg_at_1
      value: 15.179
    - type: ndcg_at_10
      value: 18.951999999999998
    - type: ndcg_at_100
      value: 26.924
    - type: ndcg_at_1000
      value: 30.991999999999997
    - type: ndcg_at_3
      value: 13.778000000000002
    - type: ndcg_at_5
      value: 15.549
    - type: precision_at_1
      value: 15.179
    - type: precision_at_10
      value: 6.625
    - type: precision_at_100
      value: 1.516
    - type: precision_at_1000
      value: 0.22599999999999998
    - type: precision_at_3
      value: 10.51
    - type: precision_at_5
      value: 8.847
    - type: recall_at_1
      value: 6.4719999999999995
    - type: recall_at_10
      value: 25.191999999999997
    - type: recall_at_100
      value: 53.315
    - type: recall_at_1000
      value: 76.163
    - type: recall_at_3
      value: 12.834999999999999
    - type: recall_at_5
      value: 17.388
  - task:
      type: Retrieval
    dataset:
      type: dbpedia-entity
      name: MTEB DBPedia
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 1.947
    - type: map_at_10
      value: 4.858
    - type: map_at_100
      value: 7.185999999999999
    - type: map_at_1000
      value: 7.931000000000001
    - type: map_at_3
      value: 3.2939999999999996
    - type: map_at_5
      value: 3.914
    - type: mrr_at_1
      value: 23.25
    - type: mrr_at_10
      value: 33.035
    - type: mrr_at_100
      value: 33.721000000000004
    - type: mrr_at_1000
      value: 33.789
    - type: mrr_at_3
      value: 29.75
    - type: mrr_at_5
      value: 31.738
    - type: ndcg_at_1
      value: 15.625
    - type: ndcg_at_10
      value: 13.211999999999998
    - type: ndcg_at_100
      value: 16.422
    - type: ndcg_at_1000
      value: 23.058999999999997
    - type: ndcg_at_3
      value: 14.573
    - type: ndcg_at_5
      value: 13.733999999999998
    - type: precision_at_1
      value: 23.25
    - type: precision_at_10
      value: 12.45
    - type: precision_at_100
      value: 4.192
    - type: precision_at_1000
      value: 1.083
    - type: precision_at_3
      value: 18.667
    - type: precision_at_5
      value: 15.950000000000001
    - type: recall_at_1
      value: 1.947
    - type: recall_at_10
      value: 9.317
    - type: recall_at_100
      value: 23.066
    - type: recall_at_1000
      value: 45.704
    - type: recall_at_3
      value: 4.12
    - type: recall_at_5
      value: 5.591
  - task:
      type: Classification
    dataset:
      type: mteb/emotion
      name: MTEB EmotionClassification
      config: default
      split: test
      revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37
    metrics:
    - type: accuracy
      value: 42.855
    - type: f1
      value: 39.029787102377576
  - task:
      type: Retrieval
    dataset:
      type: fever
      name: MTEB FEVER
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 8.461
    - type: map_at_10
      value: 13.655999999999999
    - type: map_at_100
      value: 14.499
    - type: map_at_1000
      value: 14.585999999999999
    - type: map_at_3
      value: 11.848
    - type: map_at_5
      value: 12.842999999999998
    - type: mrr_at_1
      value: 9.136
    - type: mrr_at_10
      value: 14.587
    - type: mrr_at_100
      value: 15.436
    - type: mrr_at_1000
      value: 15.518
    - type: mrr_at_3
      value: 12.690999999999999
    - type: mrr_at_5
      value: 13.747000000000002
    - type: ndcg_at_1
      value: 9.136
    - type: ndcg_at_10
      value: 16.958000000000002
    - type: ndcg_at_100
      value: 21.43
    - type: ndcg_at_1000
      value: 24.031
    - type: ndcg_at_3
      value: 13.191
    - type: ndcg_at_5
      value: 14.987
    - type: precision_at_1
      value: 9.136
    - type: precision_at_10
      value: 2.897
    - type: precision_at_100
      value: 0.532
    - type: precision_at_1000
      value: 0.077
    - type: precision_at_3
      value: 5.8709999999999996
    - type: precision_at_5
      value: 4.47
    - type: recall_at_1
      value: 8.461
    - type: recall_at_10
      value: 26.509
    - type: recall_at_100
      value: 47.776
    - type: recall_at_1000
      value: 68.26299999999999
    - type: recall_at_3
      value: 16.203
    - type: recall_at_5
      value: 20.505000000000003
  - task:
      type: Retrieval
    dataset:
      type: fiqa
      name: MTEB FiQA2018
      config: default
      split: test
      revision: None
    metrics:
    - type: map_at_1
      value: 7.396
    - type: map_at_10
      value: 12.393
    - type: map_at_100
      value: 13.857
    - type: map_at_1000
      value: 14.086000000000002
    - type: map_at_3
      value: 10.545
    - type: map_at_5
      value: 11.505
    - type: mrr_at_1
      value: 15.432000000000002
    - type: mrr_at_10
      value: 21.615000000000002
    - type: mrr_at_100
      value: 22.833000000000002
    - type: mrr_at_1000
      value: 22.931
    - type: mrr_at_3
      value: 19.522000000000002
    - type: mrr_at_5
      value: 20.663999999999998
    - type: ndcg_at_1
      value: 15.432000000000002
    - type: ndcg_at_10
      value: 16.986
    - type: ndcg_at_100
      value: 23.880000000000003
    - type: ndcg_at_1000
      value: 28.762999999999998
    - type: ndcg_at_3
      value: 14.482999999999999
    - type: ndcg_at_5
      value: 15.334999999999999
    - type: precision_at_1
      value: 15.432000000000002
    - type: precision_at_10
      value: 4.984999999999999
    - type: precision_at_100
      value: 1.167
    - type: precision_at_1000
      value: 0.2
    - type: precision_at_3
      value: 9.825000000000001
    - type: precision_at_5
      value: 7.469
    - type: recall_at_1
      value: 7.396
    - type: recall_at_10
      value: 21.389
    - type: recall_at_100
      value: 48.107
    - type: recall_at_1000
      value: 78.366
    - type: recall_at_3
      value: 13.181000000000001
    - type: recall_at_5
      value: 16.611
- task: | |
type: Retrieval | |
dataset: | |
type: hotpotqa | |
name: MTEB HotpotQA | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 11.884 | |
- type: map_at_10 | |
value: 17.09 | |
- type: map_at_100 | |
value: 17.96 | |
- type: map_at_1000 | |
value: 18.081 | |
- type: map_at_3 | |
value: 15.296000000000001 | |
- type: map_at_5 | |
value: 16.289 | |
- type: mrr_at_1 | |
value: 23.768 | |
- type: mrr_at_10 | |
value: 29.991 | |
- type: mrr_at_100 | |
value: 30.862000000000002 | |
- type: mrr_at_1000 | |
value: 30.935000000000002 | |
- type: mrr_at_3 | |
value: 27.986 | |
- type: mrr_at_5 | |
value: 29.078 | |
- type: ndcg_at_1 | |
value: 23.768 | |
- type: ndcg_at_10 | |
value: 22.634999999999998 | |
- type: ndcg_at_100 | |
value: 27.059 | |
- type: ndcg_at_1000 | |
value: 30.145 | |
- type: ndcg_at_3 | |
value: 19.058 | |
- type: ndcg_at_5 | |
value: 20.762 | |
- type: precision_at_1 | |
value: 23.768 | |
- type: precision_at_10 | |
value: 5.2490000000000006 | |
- type: precision_at_100 | |
value: 0.8829999999999999 | |
- type: precision_at_1000 | |
value: 0.13 | |
- type: precision_at_3 | |
value: 12.091000000000001 | |
- type: precision_at_5 | |
value: 8.605 | |
- type: recall_at_1 | |
value: 11.884 | |
- type: recall_at_10 | |
value: 26.246000000000002 | |
- type: recall_at_100 | |
value: 44.153 | |
- type: recall_at_1000 | |
value: 64.889 | |
- type: recall_at_3 | |
value: 18.136 | |
- type: recall_at_5 | |
value: 21.512 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/imdb | |
name: MTEB ImdbClassification | |
config: default | |
split: test | |
revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 | |
metrics: | |
- type: accuracy | |
value: 71.9232 | |
- type: ap | |
value: 66.56619827391917 | |
- type: f1 | |
value: 71.60536244284128 | |
- task: | |
type: Retrieval | |
dataset: | |
type: msmarco | |
name: MTEB MSMARCO | |
config: default | |
split: dev | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 3.037 | |
- type: map_at_10 | |
value: 5.414 | |
- type: map_at_100 | |
value: 6.072 | |
- type: map_at_1000 | |
value: 6.172 | |
- type: map_at_3 | |
value: 4.437 | |
- type: map_at_5 | |
value: 4.939 | |
- type: mrr_at_1 | |
value: 3.123 | |
- type: mrr_at_10 | |
value: 5.572 | |
- type: mrr_at_100 | |
value: 6.235 | |
- type: mrr_at_1000 | |
value: 6.334 | |
- type: mrr_at_3 | |
value: 4.563 | |
- type: mrr_at_5 | |
value: 5.09 | |
- type: ndcg_at_1 | |
value: 3.123 | |
- type: ndcg_at_10 | |
value: 7.027 | |
- type: ndcg_at_100 | |
value: 10.776 | |
- type: ndcg_at_1000 | |
value: 13.904 | |
- type: ndcg_at_3 | |
value: 4.95 | |
- type: ndcg_at_5 | |
value: 5.865 | |
- type: precision_at_1 | |
value: 3.123 | |
- type: precision_at_10 | |
value: 1.252 | |
- type: precision_at_100 | |
value: 0.32299999999999995 | |
- type: precision_at_1000 | |
value: 0.059000000000000004 | |
- type: precision_at_3 | |
value: 2.168 | |
- type: precision_at_5 | |
value: 1.7680000000000002 | |
- type: recall_at_1 | |
value: 3.037 | |
- type: recall_at_10 | |
value: 12.11 | |
- type: recall_at_100 | |
value: 30.714999999999996 | |
- type: recall_at_1000 | |
value: 56.006 | |
- type: recall_at_3 | |
value: 6.3229999999999995 | |
- type: recall_at_5 | |
value: 8.518 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/mtop_domain | |
name: MTEB MTOPDomainClassification (en) | |
config: en | |
split: test | |
revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf | |
metrics: | |
- type: accuracy | |
value: 91.24259005927954 | |
- type: f1 | |
value: 90.7594022786747 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/mtop_intent | |
name: MTEB MTOPIntentClassification (en) | |
config: en | |
split: test | |
revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba | |
metrics: | |
- type: accuracy | |
value: 74.08344733242134 | |
- type: f1 | |
value: 52.377556461789055 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/amazon_massive_intent | |
name: MTEB MassiveIntentClassification (en) | |
config: en | |
split: test | |
revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | |
metrics: | |
- type: accuracy | |
value: 69.99327505043712 | |
- type: f1 | |
value: 66.15141376479805 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/amazon_massive_scenario | |
name: MTEB MassiveScenarioClassification (en) | |
config: en | |
split: test | |
revision: 7d571f92784cd94a019292a1f45445077d0ef634 | |
metrics: | |
- type: accuracy | |
value: 75.1546738399462 | |
- type: f1 | |
value: 74.83013584700711 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/medrxiv-clustering-p2p | |
name: MTEB MedrxivClusteringP2P | |
config: default | |
split: test | |
revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 | |
metrics: | |
- type: v_measure | |
value: 30.146364191412356 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/medrxiv-clustering-s2s | |
name: MTEB MedrxivClusteringS2S | |
config: default | |
split: test | |
revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 | |
metrics: | |
- type: v_measure | |
value: 26.96347584990607 | |
- task: | |
type: Reranking | |
dataset: | |
type: mteb/mind_small | |
name: MTEB MindSmallReranking | |
config: default | |
split: test | |
revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 | |
metrics: | |
- type: map | |
value: 29.520993847103533 | |
- type: mrr | |
value: 30.402007095845374 | |
- task: | |
type: Retrieval | |
dataset: | |
type: nfcorpus | |
name: MTEB NFCorpus | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 1.72 | |
- type: map_at_10 | |
value: 4.041 | |
- type: map_at_100 | |
value: 5.356000000000001 | |
- type: map_at_1000 | |
value: 6.413 | |
- type: map_at_3 | |
value: 2.9770000000000003 | |
- type: map_at_5 | |
value: 3.3689999999999998 | |
- type: mrr_at_1 | |
value: 21.981 | |
- type: mrr_at_10 | |
value: 30.286 | |
- type: mrr_at_100 | |
value: 31.272 | |
- type: mrr_at_1000 | |
value: 31.347 | |
- type: mrr_at_3 | |
value: 27.193 | |
- type: mrr_at_5 | |
value: 28.694999999999997 | |
- type: ndcg_at_1 | |
value: 19.814 | |
- type: ndcg_at_10 | |
value: 15.732 | |
- type: ndcg_at_100 | |
value: 16.033 | |
- type: ndcg_at_1000 | |
value: 25.865 | |
- type: ndcg_at_3 | |
value: 17.944 | |
- type: ndcg_at_5 | |
value: 16.634 | |
- type: precision_at_1 | |
value: 21.981 | |
- type: precision_at_10 | |
value: 12.786 | |
- type: precision_at_100 | |
value: 4.83 | |
- type: precision_at_1000 | |
value: 1.765 | |
- type: precision_at_3 | |
value: 17.75 | |
- type: precision_at_5 | |
value: 15.232000000000001 | |
- type: recall_at_1 | |
value: 1.72 | |
- type: recall_at_10 | |
value: 7.436 | |
- type: recall_at_100 | |
value: 20.275000000000002 | |
- type: recall_at_1000 | |
value: 54.19500000000001 | |
- type: recall_at_3 | |
value: 3.787 | |
- type: recall_at_5 | |
value: 4.829 | |
- task: | |
type: Retrieval | |
dataset: | |
type: nq | |
name: MTEB NQ | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 7.964 | |
- type: map_at_10 | |
value: 14.025000000000002 | |
- type: map_at_100 | |
value: 15.222 | |
- type: map_at_1000 | |
value: 15.32 | |
- type: map_at_3 | |
value: 11.886 | |
- type: map_at_5 | |
value: 13.056999999999999 | |
- type: mrr_at_1 | |
value: 9.183 | |
- type: mrr_at_10 | |
value: 15.651000000000002 | |
- type: mrr_at_100 | |
value: 16.753999999999998 | |
- type: mrr_at_1000 | |
value: 16.833000000000002 | |
- type: mrr_at_3 | |
value: 13.437 | |
- type: mrr_at_5 | |
value: 14.69 | |
- type: ndcg_at_1 | |
value: 9.183 | |
- type: ndcg_at_10 | |
value: 17.96 | |
- type: ndcg_at_100 | |
value: 23.823 | |
- type: ndcg_at_1000 | |
value: 26.461000000000002 | |
- type: ndcg_at_3 | |
value: 13.536999999999999 | |
- type: ndcg_at_5 | |
value: 15.642 | |
- type: precision_at_1 | |
value: 9.183 | |
- type: precision_at_10 | |
value: 3.366 | |
- type: precision_at_100 | |
value: 0.67 | |
- type: precision_at_1000 | |
value: 0.092 | |
- type: precision_at_3 | |
value: 6.547 | |
- type: precision_at_5 | |
value: 5.098 | |
- type: recall_at_1 | |
value: 7.964 | |
- type: recall_at_10 | |
value: 28.599000000000004 | |
- type: recall_at_100 | |
value: 55.381 | |
- type: recall_at_1000 | |
value: 75.63 | |
- type: recall_at_3 | |
value: 16.77 | |
- type: recall_at_5 | |
value: 21.671000000000003 | |
- task: | |
type: Retrieval | |
dataset: | |
type: quora | |
name: MTEB QuoraRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 59.846999999999994 | |
- type: map_at_10 | |
value: 73.18599999999999 | |
- type: map_at_100 | |
value: 74.055 | |
- type: map_at_1000 | |
value: 74.09 | |
- type: map_at_3 | |
value: 69.95700000000001 | |
- type: map_at_5 | |
value: 71.925 | |
- type: mrr_at_1 | |
value: 69.0 | |
- type: mrr_at_10 | |
value: 77.23299999999999 | |
- type: mrr_at_100 | |
value: 77.52 | |
- type: mrr_at_1000 | |
value: 77.526 | |
- type: mrr_at_3 | |
value: 75.59 | |
- type: mrr_at_5 | |
value: 76.63799999999999 | |
- type: ndcg_at_1 | |
value: 69.02000000000001 | |
- type: ndcg_at_10 | |
value: 78.226 | |
- type: ndcg_at_100 | |
value: 80.60199999999999 | |
- type: ndcg_at_1000 | |
value: 80.971 | |
- type: ndcg_at_3 | |
value: 74.124 | |
- type: ndcg_at_5 | |
value: 76.265 | |
- type: precision_at_1 | |
value: 69.02000000000001 | |
- type: precision_at_10 | |
value: 12.102 | |
- type: precision_at_100 | |
value: 1.468 | |
- type: precision_at_1000 | |
value: 0.155 | |
- type: precision_at_3 | |
value: 32.5 | |
- type: precision_at_5 | |
value: 21.7 | |
- type: recall_at_1 | |
value: 59.846999999999994 | |
- type: recall_at_10 | |
value: 88.485 | |
- type: recall_at_100 | |
value: 97.425 | |
- type: recall_at_1000 | |
value: 99.523 | |
- type: recall_at_3 | |
value: 77.051 | |
- type: recall_at_5 | |
value: 82.762 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/reddit-clustering | |
name: MTEB RedditClustering | |
config: default | |
split: test | |
revision: 24640382cdbf8abc73003fb0fa6d111a705499eb | |
metrics: | |
- type: v_measure | |
value: 38.67296729610079 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/reddit-clustering-p2p | |
name: MTEB RedditClusteringP2P | |
config: default | |
split: test | |
revision: 282350215ef01743dc01b456c7f5241fa8937f16 | |
metrics: | |
- type: v_measure | |
value: 53.42017351823769 | |
- task: | |
type: Retrieval | |
dataset: | |
type: scidocs | |
name: MTEB SCIDOCS | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 0.893 | |
- type: map_at_10 | |
value: 2.804 | |
- type: map_at_100 | |
value: 3.6740000000000004 | |
- type: map_at_1000 | |
value: 3.94 | |
- type: map_at_3 | |
value: 1.926 | |
- type: map_at_5 | |
value: 2.363 | |
- type: mrr_at_1 | |
value: 4.3 | |
- type: mrr_at_10 | |
value: 9.520000000000001 | |
- type: mrr_at_100 | |
value: 10.692 | |
- type: mrr_at_1000 | |
value: 10.841000000000001 | |
- type: mrr_at_3 | |
value: 7.6 | |
- type: mrr_at_5 | |
value: 8.63 | |
- type: ndcg_at_1 | |
value: 4.3 | |
- type: ndcg_at_10 | |
value: 5.531 | |
- type: ndcg_at_100 | |
value: 10.512 | |
- type: ndcg_at_1000 | |
value: 16.683 | |
- type: ndcg_at_3 | |
value: 4.632 | |
- type: ndcg_at_5 | |
value: 4.3229999999999995 | |
- type: precision_at_1 | |
value: 4.3 | |
- type: precision_at_10 | |
value: 3.16 | |
- type: precision_at_100 | |
value: 1.065 | |
- type: precision_at_1000 | |
value: 0.256 | |
- type: precision_at_3 | |
value: 4.667000000000001 | |
- type: precision_at_5 | |
value: 4.1000000000000005 | |
- type: recall_at_1 | |
value: 0.893 | |
- type: recall_at_10 | |
value: 6.428000000000001 | |
- type: recall_at_100 | |
value: 21.662 | |
- type: recall_at_1000 | |
value: 52.162 | |
- type: recall_at_3 | |
value: 2.868 | |
- type: recall_at_5 | |
value: 4.188 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sickr-sts | |
name: MTEB SICK-R | |
config: default | |
split: test | |
revision: a6ea5a8cab320b040a23452cc28066d9beae2cee | |
metrics: | |
- type: cos_sim_spearman | |
value: 69.34396953516386 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts12-sts | |
name: MTEB STS12 | |
config: default | |
split: test | |
revision: a0d554a64d88156834ff5ae9920b964011b16384 | |
metrics: | |
- type: cos_sim_spearman | |
value: 60.094374065360746 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts13-sts | |
name: MTEB STS13 | |
config: default | |
split: test | |
revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca | |
metrics: | |
- type: cos_sim_spearman | |
value: 72.51503781013379 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts14-sts | |
name: MTEB STS14 | |
config: default | |
split: test | |
revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 | |
metrics: | |
- type: cos_sim_spearman | |
value: 66.6954698644186 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts15-sts | |
name: MTEB STS15 | |
config: default | |
split: test | |
revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 | |
metrics: | |
- type: cos_sim_spearman | |
value: 77.69462578028768 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts16-sts | |
name: MTEB STS16 | |
config: default | |
split: test | |
revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 | |
metrics: | |
- type: cos_sim_spearman | |
value: 75.9397626457859 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts17-crosslingual-sts | |
name: MTEB STS17 (en-en) | |
config: en-en | |
split: test | |
revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d | |
metrics: | |
- type: cos_sim_spearman | |
value: 81.67242768943406 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts22-crosslingual-sts | |
name: MTEB STS22 (en) | |
config: en | |
split: test | |
revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 | |
metrics: | |
- type: cos_sim_spearman | |
value: 63.7027324700292 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/stsbenchmark-sts | |
name: MTEB STSBenchmark | |
config: default | |
split: test | |
revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 | |
metrics: | |
- type: cos_sim_spearman | |
value: 73.36074244064153 | |
- task: | |
type: Reranking | |
dataset: | |
type: mteb/scidocs-reranking | |
name: MTEB SciDocsRR | |
config: default | |
split: test | |
revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab | |
metrics: | |
- type: map | |
value: 67.75984402370518 | |
- type: mrr | |
value: 86.9951798383171 | |
- task: | |
type: Retrieval | |
dataset: | |
type: scifact | |
name: MTEB SciFact | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 24.583 | |
- type: map_at_10 | |
value: 33.125 | |
- type: map_at_100 | |
value: 34.14 | |
- type: map_at_1000 | |
value: 34.22 | |
- type: map_at_3 | |
value: 29.616 | |
- type: map_at_5 | |
value: 31.896 | |
- type: mrr_at_1 | |
value: 26.333000000000002 | |
- type: mrr_at_10 | |
value: 34.437 | |
- type: mrr_at_100 | |
value: 35.363 | |
- type: mrr_at_1000 | |
value: 35.433 | |
- type: mrr_at_3 | |
value: 31.333 | |
- type: mrr_at_5 | |
value: 33.267 | |
- type: ndcg_at_1 | |
value: 26.333000000000002 | |
- type: ndcg_at_10 | |
value: 38.311 | |
- type: ndcg_at_100 | |
value: 43.923 | |
- type: ndcg_at_1000 | |
value: 45.923 | |
- type: ndcg_at_3 | |
value: 31.596000000000004 | |
- type: ndcg_at_5 | |
value: 35.448 | |
- type: precision_at_1 | |
value: 26.333000000000002 | |
- type: precision_at_10 | |
value: 5.933 | |
- type: precision_at_100 | |
value: 0.91 | |
- type: precision_at_1000 | |
value: 0.109 | |
- type: precision_at_3 | |
value: 13.0 | |
- type: precision_at_5 | |
value: 9.933 | |
- type: recall_at_1 | |
value: 24.583 | |
- type: recall_at_10 | |
value: 53.417 | |
- type: recall_at_100 | |
value: 80.989 | |
- type: recall_at_1000 | |
value: 96.322 | |
- type: recall_at_3 | |
value: 35.611 | |
- type: recall_at_5 | |
value: 44.833 | |
- task: | |
type: PairClassification | |
dataset: | |
type: mteb/sprintduplicatequestions-pairclassification | |
name: MTEB SprintDuplicateQuestions | |
config: default | |
split: test | |
revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 | |
metrics: | |
- type: cos_sim_accuracy | |
value: 99.48514851485149 | |
- type: cos_sim_ap | |
value: 77.36426466374054 | |
- type: cos_sim_f1 | |
value: 72.0702116675271 | |
- type: cos_sim_precision | |
value: 74.49306296691569 | |
- type: cos_sim_recall | |
value: 69.8 | |
- type: dot_accuracy | |
value: 99.15049504950495 | |
- type: dot_ap | |
value: 46.792474140260715 | |
- type: dot_f1 | |
value: 48.76476906552094 | |
- type: dot_precision | |
value: 52.66821345707656 | |
- type: dot_recall | |
value: 45.4 | |
- type: euclidean_accuracy | |
value: 99.46534653465346 | |
- type: euclidean_ap | |
value: 74.1978837990589 | |
- type: euclidean_f1 | |
value: 69.47256259989345 | |
- type: euclidean_precision | |
value: 74.34435575826683 | |
- type: euclidean_recall | |
value: 65.2 | |
- type: manhattan_accuracy | |
value: 99.47128712871287 | |
- type: manhattan_ap | |
value: 75.31910551743364 | |
- type: manhattan_f1 | |
value: 70.1582105837425 | |
- type: manhattan_precision | |
value: 77.19087635054022 | |
- type: manhattan_recall | |
value: 64.3 | |
- type: max_accuracy | |
value: 99.48514851485149 | |
- type: max_ap | |
value: 77.36426466374054 | |
- type: max_f1 | |
value: 72.0702116675271 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/stackexchange-clustering | |
name: MTEB StackExchangeClustering | |
config: default | |
split: test | |
revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 | |
metrics: | |
- type: v_measure | |
value: 59.353792480720436 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/stackexchange-clustering-p2p | |
name: MTEB StackExchangeClusteringP2P | |
config: default | |
split: test | |
revision: 815ca46b2622cec33ccafc3735d572c266efdb44 | |
metrics: | |
- type: v_measure | |
value: 31.474896484744836 | |
- task: | |
type: Reranking | |
dataset: | |
type: mteb/stackoverflowdupquestions-reranking | |
name: MTEB StackOverflowDupQuestions | |
config: default | |
split: test | |
revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 | |
metrics: | |
- type: map | |
value: 40.82378653430986 | |
- type: mrr | |
value: 41.13905600118835 | |
- task: | |
type: Summarization | |
dataset: | |
type: mteb/summeval | |
name: MTEB SummEval | |
config: default | |
split: test | |
revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c | |
metrics: | |
- type: cos_sim_pearson | |
value: 31.08154836998798 | |
- type: cos_sim_spearman | |
value: 31.232033308845907 | |
- type: dot_pearson | |
value: 23.767593496465828 | |
- type: dot_spearman | |
value: 25.6201612766572 | |
- task: | |
type: Retrieval | |
dataset: | |
type: trec-covid | |
name: MTEB TRECCOVID | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 0.186 | |
- type: map_at_10 | |
value: 1.1809999999999998 | |
- type: map_at_100 | |
value: 5.21 | |
- type: map_at_1000 | |
value: 12.447999999999999 | |
- type: map_at_3 | |
value: 0.44200000000000006 | |
- type: map_at_5 | |
value: 0.673 | |
- type: mrr_at_1 | |
value: 72.0 | |
- type: mrr_at_10 | |
value: 80.01899999999999 | |
- type: mrr_at_100 | |
value: 80.42099999999999 | |
- type: mrr_at_1000 | |
value: 80.42099999999999 | |
- type: mrr_at_3 | |
value: 78.0 | |
- type: mrr_at_5 | |
value: 79.4 | |
- type: ndcg_at_1 | |
value: 66.0 | |
- type: ndcg_at_10 | |
value: 56.041 | |
- type: ndcg_at_100 | |
value: 37.987 | |
- type: ndcg_at_1000 | |
value: 34.198 | |
- type: ndcg_at_3 | |
value: 60.23500000000001 | |
- type: ndcg_at_5 | |
value: 58.025999999999996 | |
- type: precision_at_1 | |
value: 72.0 | |
- type: precision_at_10 | |
value: 60.4 | |
- type: precision_at_100 | |
value: 38.940000000000005 | |
- type: precision_at_1000 | |
value: 16.106 | |
- type: precision_at_3 | |
value: 63.333 | |
- type: precision_at_5 | |
value: 61.6 | |
- type: recall_at_1 | |
value: 0.186 | |
- type: recall_at_10 | |
value: 1.458 | |
- type: recall_at_100 | |
value: 8.455 | |
- type: recall_at_1000 | |
value: 33.141999999999996 | |
- type: recall_at_3 | |
value: 0.461 | |
- type: recall_at_5 | |
value: 0.756 | |
- task: | |
type: Retrieval | |
dataset: | |
type: webis-touche2020 | |
name: MTEB Touche2020 | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 2.2849999999999997 | |
- type: map_at_10 | |
value: 6.909 | |
- type: map_at_100 | |
value: 11.231 | |
- type: map_at_1000 | |
value: 12.472 | |
- type: map_at_3 | |
value: 3.53 | |
- type: map_at_5 | |
value: 4.675 | |
- type: mrr_at_1 | |
value: 26.531 | |
- type: mrr_at_10 | |
value: 40.73 | |
- type: mrr_at_100 | |
value: 41.637 | |
- type: mrr_at_1000 | |
value: 41.647 | |
- type: mrr_at_3 | |
value: 34.354 | |
- type: mrr_at_5 | |
value: 38.741 | |
- type: ndcg_at_1 | |
value: 24.490000000000002 | |
- type: ndcg_at_10 | |
value: 19.17 | |
- type: ndcg_at_100 | |
value: 29.946 | |
- type: ndcg_at_1000 | |
value: 40.842 | |
- type: ndcg_at_3 | |
value: 19.088 | |
- type: ndcg_at_5 | |
value: 19.445999999999998 | |
- type: precision_at_1 | |
value: 26.531 | |
- type: precision_at_10 | |
value: 17.959 | |
- type: precision_at_100 | |
value: 6.468999999999999 | |
- type: precision_at_1000 | |
value: 1.351 | |
- type: precision_at_3 | |
value: 19.048000000000002 | |
- type: precision_at_5 | |
value: 19.592000000000002 | |
- type: recall_at_1 | |
value: 2.2849999999999997 | |
- type: recall_at_10 | |
value: 12.973 | |
- type: recall_at_100 | |
value: 40.239999999999995 | |
- type: recall_at_1000 | |
value: 73.247 | |
- type: recall_at_3 | |
value: 4.407 | |
- type: recall_at_5 | |
value: 6.908 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/toxic_conversations_50k | |
name: MTEB ToxicConversationsClassification | |
config: default | |
split: test | |
revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c | |
metrics: | |
- type: accuracy | |
value: 68.405 | |
- type: ap | |
value: 13.9913678628558 | |
- type: f1 | |
value: 53.209691917560285 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/tweet_sentiment_extraction | |
name: MTEB TweetSentimentExtractionClassification | |
config: default | |
split: test | |
revision: d604517c81ca91fe16a244d1248fc021f9ecee7a | |
metrics: | |
- type: accuracy | |
value: 56.080928126768534 | |
- type: f1 | |
value: 56.36329965117965 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/twentynewsgroups-clustering | |
name: MTEB TwentyNewsgroupsClustering | |
config: default | |
split: test | |
revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 | |
metrics: | |
- type: v_measure | |
value: 31.540976715818065 | |
- task: | |
type: PairClassification | |
dataset: | |
type: mteb/twittersemeval2015-pairclassification | |
name: MTEB TwitterSemEval2015 | |
config: default | |
split: test | |
revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 | |
metrics: | |
- type: cos_sim_accuracy | |
value: 82.90516778923526 | |
- type: cos_sim_ap | |
value: 61.5394989621502 | |
- type: cos_sim_f1 | |
value: 58.02297689685646 | |
- type: cos_sim_precision | |
value: 55.62817719680465 | |
- type: cos_sim_recall | |
value: 60.633245382585756 | |
- type: dot_accuracy | |
value: 78.95928950348691 | |
- type: dot_ap | |
value: 48.61088896690895 | |
- type: dot_f1 | |
value: 51.0104674059488 | |
- type: dot_precision | |
value: 42.00375490698071 | |
- type: dot_recall | |
value: 64.93403693931398 | |
- type: euclidean_accuracy | |
value: 82.476008821601 | |
- type: euclidean_ap | |
value: 59.59406971314053 | |
- type: euclidean_f1 | |
value: 56.424962447084525 | |
- type: euclidean_precision | |
value: 58.47721483158789 | |
- type: euclidean_recall | |
value: 54.51187335092348 | |
- type: manhattan_accuracy | |
value: 82.66078559933241 | |
- type: manhattan_ap | |
value: 60.414321716856925 | |
- type: manhattan_f1 | |
value: 56.88221089348002 | |
- type: manhattan_precision | |
value: 57.86026200873362 | |
- type: manhattan_recall | |
value: 55.93667546174142 | |
- type: max_accuracy | |
value: 82.90516778923526 | |
- type: max_ap | |
value: 61.5394989621502 | |
- type: max_f1 | |
value: 58.02297689685646 | |
- task: | |
type: PairClassification | |
dataset: | |
type: mteb/twitterurlcorpus-pairclassification | |
name: MTEB TwitterURLCorpus | |
config: default | |
split: test | |
revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf | |
metrics: | |
- type: cos_sim_accuracy | |
value: 85.71622618077386 | |
- type: cos_sim_ap | |
value: 77.72774861009667 | |
- type: cos_sim_f1 | |
value: 71.40275165062152 | |
- type: cos_sim_precision | |
value: 68.53359767754726 | |
- type: cos_sim_recall | |
value: 74.52263627964275 | |
- type: dot_accuracy | |
value: 83.97174680793262 | |
- type: dot_ap | |
value: 72.89480417427734 | |
- type: dot_f1 | |
value: 68.57803792366198 | |
- type: dot_precision | |
value: 62.94151708164447 | |
- type: dot_recall | |
value: 75.32337542346782 | |
- type: euclidean_accuracy | |
value: 84.88570652384834 | |
- type: euclidean_ap | |
value: 75.78371710915128 | |
- type: euclidean_f1 | |
value: 69.44268877569989 | |
- type: euclidean_precision | |
value: 67.1435761018046 | |
- type: euclidean_recall | |
value: 71.90483523252233 | |
- type: manhattan_accuracy | |
value: 85.6114409904141 | |
- type: manhattan_ap | |
value: 77.38579436755944 | |
- type: manhattan_f1 | |
value: 70.8608538430316 | |
- type: manhattan_precision | |
value: 68.03656203500319 | |
- type: manhattan_recall | |
value: 73.92978133661842 | |
- type: max_accuracy | |
value: 85.71622618077386 | |
- type: max_ap | |
value: 77.72774861009667 | |
- type: max_f1 | |
value: 71.40275165062152 | |
# LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

> LLM2Vec is a simple recipe to convert decoder-only LLMs into text encoders. It consists of 3 simple steps: 1) enabling bidirectional attention, 2) masked next token prediction, and 3) unsupervised contrastive learning. The model can be further fine-tuned to achieve state-of-the-art performance.

- **Repository:** https://github.com/McGill-NLP/llm2vec
- **Paper:** https://arxiv.org/abs/2404.05961
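
To illustrate the first step of the recipe quoted above, here is a minimal sketch of how a causal attention mask differs from a bidirectional one. It is a toy illustration in plain Python, not the library's implementation; the helper names `causal_mask` and `bidirectional_mask` are hypothetical.

```python
def causal_mask(n):
    """Toy causal mask: position i may attend only to positions 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Toy bidirectional mask: every position may attend to every position."""
    return [[1] * n for _ in range(n)]


# Print both masks for a 3-token sequence (1 = attention allowed).
for row in causal_mask(3):
    print(row)
for row in bidirectional_mask(3):
    print(row)
```

With the causal mask, each token's representation depends only on the tokens before it; with the bidirectional mask, it can draw on the whole sequence, which is the property a text encoder typically wants.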

## Installation

```bash
pip install llm2vec
```

## Usage

```python
from llm2vec import LLM2Vec

import torch
from transformers import AutoTokenizer, AutoModel, AutoConfig
from peft import PeftModel

# Loading base Sheared-LLaMA model, along with custom code that enables bidirectional connections in decoder-only LLMs. MNTP LoRA weights are merged into the base model.
tokenizer = AutoTokenizer.from_pretrained(
    "McGill-NLP/LLM2Vec-Sheared-LLaMA-mntp"
)
config = AutoConfig.from_pretrained(
    "McGill-NLP/LLM2Vec-Sheared-LLaMA-mntp", trust_remote_code=True
)
model = AutoModel.from_pretrained(
    "McGill-NLP/LLM2Vec-Sheared-LLaMA-mntp",
    trust_remote_code=True,
    config=config,
    torch_dtype=torch.bfloat16,
    device_map="cuda" if torch.cuda.is_available() else "cpu",
)
model = PeftModel.from_pretrained(
    model,
    "McGill-NLP/LLM2Vec-Sheared-LLaMA-mntp",
)
model = model.merge_and_unload()  # This can take several minutes on CPU

# Loading unsupervised SimCSE model. This loads the trained LoRA weights on top of the MNTP model. Hence the final weights are: Base model + MNTP (LoRA) + SimCSE (LoRA).
model = PeftModel.from_pretrained(
    model, "McGill-NLP/LLM2Vec-Sheared-LLaMA-mntp-unsup-simcse"
)

# Wrapper for encoding and pooling operations
l2v = LLM2Vec(model, tokenizer, pooling_mode="mean", max_length=512)

# Encoding queries using instructions
instruction = (
    "Given a web search query, retrieve relevant passages that answer the query:"
)
queries = [
    [instruction, "how much protein should a female eat"],
    [instruction, "summit define"],
]
q_reps = l2v.encode(queries)

# Encoding documents. Instructions are not required for documents
documents = [
    "As a general guideline, the CDC's average requirement of protein for women ages 19 to 70 is 46 grams per day. But, as you can see from this chart, you'll need to increase that if you're expecting or training for a marathon. Check out the chart below to see how much protein you should be eating each day.",
    "Definition of summit for English Language Learners. : 1 the highest point of a mountain : the top of a mountain. : 2 the highest level. : 3 a meeting or series of meetings between the leaders of two or more governments.",
]
d_reps = l2v.encode(documents)

# Compute cosine similarity
q_reps_norm = torch.nn.functional.normalize(q_reps, p=2, dim=1)
d_reps_norm = torch.nn.functional.normalize(d_reps, p=2, dim=1)
cos_sim = torch.mm(q_reps_norm, d_reps_norm.transpose(0, 1))

print(cos_sim)
"""
tensor([[0.5964, 0.1270],
        [0.0698, 0.2394]])
"""
```
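
The similarity matrix printed above has one row per query and one column per document. As a minimal sketch of turning those scores into a ranking, the example below hard-codes the printed values so it runs without downloading the model. The helper `rank_documents` is a hypothetical name, not part of `llm2vec`.

```python
# Scores copied from the printed output above:
# rows are queries, columns are documents.
cos_sim = [
    [0.5964, 0.1270],
    [0.0698, 0.2394],
]


def rank_documents(scores):
    """Return, for each query row, the document indices sorted by descending score."""
    return [
        sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        for row in scores
    ]


rankings = rank_documents(cos_sim)
print(rankings)  # [[0, 1], [1, 0]]
```

Each query ranks its own passage first: the protein question matches the first document and "summit define" matches the second. With the tensor from the usage snippet, `cos_sim.tolist()` gives the nested list this helper expects.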

## Questions

If you have any questions about the code, feel free to email Parishad (`parishad.behnamghader@mila.quebec`) and Vaibhav (`vaibhav.adlakha@mila.quebec`).