library_name: peft | |
license: mit | |
language: | |
- en | |
pipeline_tag: sentence-similarity | |
tags: | |
- text-embedding | |
- embeddings | |
- information-retrieval | |
- beir | |
- text-classification | |
- language-model | |
- text-clustering | |
- text-semantic-similarity | |
- text-evaluation | |
- text-reranking | |
- feature-extraction | |
- sentence-similarity | |
- Sentence Similarity | |
- natural_questions | |
- ms_marco | |
- fever | |
- hotpot_qa | |
- mteb | |
model-index: | |
- name: LLM2Vec-Llama-2-unsupervised | |
results: | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/amazon_counterfactual | |
name: MTEB AmazonCounterfactualClassification (en) | |
config: en | |
split: test | |
revision: e8379541af4e31359cca9fbcf4b00f2671dba205 | |
metrics: | |
- type: accuracy | |
value: 76.91044776119402 | |
- type: ap | |
value: 41.73039886859448 | |
- type: f1 | |
value: 71.49663106134554 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/amazon_polarity | |
name: MTEB AmazonPolarityClassification | |
config: default | |
split: test | |
revision: e2d317d38cd51312af73b3d32a06d1a08b442046 | |
metrics: | |
- type: accuracy | |
value: 79.0549 | |
- type: ap | |
value: 74.50419535911905 | |
- type: f1 | |
value: 78.87370110570745 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/amazon_reviews_multi | |
name: MTEB AmazonReviewsClassification (en) | |
config: en | |
split: test | |
revision: 1399c76144fd37290681b995c656ef9b2e06e26d | |
metrics: | |
- type: accuracy | |
value: 40.07999999999999 | |
- type: f1 | |
value: 39.74598250149754 | |
- task: | |
type: Retrieval | |
dataset: | |
type: arguana | |
name: MTEB ArguAna | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 22.973 | |
- type: map_at_10 | |
value: 38.217 | |
- type: map_at_100 | |
value: 39.247 | |
- type: map_at_1000 | |
value: 39.263 | |
- type: map_at_3 | |
value: 33.108 | |
- type: map_at_5 | |
value: 35.942 | |
- type: mrr_at_1 | |
value: 23.755000000000003 | |
- type: mrr_at_10 | |
value: 38.495000000000005 | |
- type: mrr_at_100 | |
value: 39.525 | |
- type: mrr_at_1000 | |
value: 39.541 | |
- type: mrr_at_3 | |
value: 33.333 | |
- type: mrr_at_5 | |
value: 36.221 | |
- type: ndcg_at_1 | |
value: 22.973 | |
- type: ndcg_at_10 | |
value: 47.093 | |
- type: ndcg_at_100 | |
value: 51.745 | |
- type: ndcg_at_1000 | |
value: 52.126 | |
- type: ndcg_at_3 | |
value: 36.473 | |
- type: ndcg_at_5 | |
value: 41.591 | |
- type: precision_at_1 | |
value: 22.973 | |
- type: precision_at_10 | |
value: 7.568 | |
- type: precision_at_100 | |
value: 0.966 | |
- type: precision_at_1000 | |
value: 0.1 | |
- type: precision_at_3 | |
value: 15.409999999999998 | |
- type: precision_at_5 | |
value: 11.735 | |
- type: recall_at_1 | |
value: 22.973 | |
- type: recall_at_10 | |
value: 75.676 | |
- type: recall_at_100 | |
value: 96.586 | |
- type: recall_at_1000 | |
value: 99.502 | |
- type: recall_at_3 | |
value: 46.23 | |
- type: recall_at_5 | |
value: 58.677 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/arxiv-clustering-p2p | |
name: MTEB ArxivClusteringP2P | |
config: default | |
split: test | |
revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d | |
metrics: | |
- type: v_measure | |
value: 47.808566636089296 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/arxiv-clustering-s2s | |
name: MTEB ArxivClusteringS2S | |
config: default | |
split: test | |
revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53 | |
metrics: | |
- type: v_measure | |
value: 40.53253525071289 | |
- task: | |
type: Reranking | |
dataset: | |
type: mteb/askubuntudupquestions-reranking | |
name: MTEB AskUbuntuDupQuestions | |
config: default | |
split: test | |
revision: 2000358ca161889fa9c082cb41daa8dcfb161a54 | |
metrics: | |
- type: map | |
value: 55.564312661366564 | |
- type: mrr | |
value: 69.24526227850326 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/biosses-sts | |
name: MTEB BIOSSES | |
config: default | |
split: test | |
revision: d3fb88f8f02e40887cd149695127462bbcf29b4a | |
metrics: | |
- type: cos_sim_spearman | |
value: 82.40790181633206 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/banking77 | |
name: MTEB Banking77Classification | |
config: default | |
split: test | |
revision: 0fd18e25b25c072e09e0d92ab615fda904d66300 | |
metrics: | |
- type: accuracy | |
value: 84.64935064935064 | |
- type: f1 | |
value: 84.59305945931867 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/biorxiv-clustering-p2p | |
name: MTEB BiorxivClusteringP2P | |
config: default | |
split: test | |
revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40 | |
metrics: | |
- type: v_measure | |
value: 38.11916694447953 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/biorxiv-clustering-s2s | |
name: MTEB BiorxivClusteringS2S | |
config: default | |
split: test | |
revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908 | |
metrics: | |
- type: v_measure | |
value: 31.248648913887024 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/android | |
name: MTEB CQADupstackAndroidRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 24.483 | |
- type: map_at_10 | |
value: 34.549 | |
- type: map_at_100 | |
value: 36.106 | |
- type: map_at_1000 | |
value: 36.253 | |
- type: map_at_3 | |
value: 31.313999999999997 | |
- type: map_at_5 | |
value: 32.987 | |
- type: mrr_at_1 | |
value: 32.046 | |
- type: mrr_at_10 | |
value: 41.217999999999996 | |
- type: mrr_at_100 | |
value: 42.068 | |
- type: mrr_at_1000 | |
value: 42.126999999999995 | |
- type: mrr_at_3 | |
value: 38.746 | |
- type: mrr_at_5 | |
value: 40.083 | |
- type: ndcg_at_1 | |
value: 32.046 | |
- type: ndcg_at_10 | |
value: 40.927 | |
- type: ndcg_at_100 | |
value: 46.5 | |
- type: ndcg_at_1000 | |
value: 49.043 | |
- type: ndcg_at_3 | |
value: 36.448 | |
- type: ndcg_at_5 | |
value: 38.199 | |
- type: precision_at_1 | |
value: 32.046 | |
- type: precision_at_10 | |
value: 8.484 | |
- type: precision_at_100 | |
value: 1.443 | |
- type: precision_at_1000 | |
value: 0.2 | |
- type: precision_at_3 | |
value: 18.407 | |
- type: precision_at_5 | |
value: 13.419 | |
- type: recall_at_1 | |
value: 24.483 | |
- type: recall_at_10 | |
value: 51.946999999999996 | |
- type: recall_at_100 | |
value: 75.842 | |
- type: recall_at_1000 | |
value: 93.368 | |
- type: recall_at_3 | |
value: 38.023 | |
- type: recall_at_5 | |
value: 43.356 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/english | |
name: MTEB CQADupstackEnglishRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 27.090999999999998 | |
- type: map_at_10 | |
value: 36.106 | |
- type: map_at_100 | |
value: 37.188 | |
- type: map_at_1000 | |
value: 37.32 | |
- type: map_at_3 | |
value: 33.293 | |
- type: map_at_5 | |
value: 34.755 | |
- type: mrr_at_1 | |
value: 35.86 | |
- type: mrr_at_10 | |
value: 42.979 | |
- type: mrr_at_100 | |
value: 43.619 | |
- type: mrr_at_1000 | |
value: 43.669999999999995 | |
- type: mrr_at_3 | |
value: 40.849000000000004 | |
- type: mrr_at_5 | |
value: 41.964 | |
- type: ndcg_at_1 | |
value: 35.86 | |
- type: ndcg_at_10 | |
value: 41.676 | |
- type: ndcg_at_100 | |
value: 45.678000000000004 | |
- type: ndcg_at_1000 | |
value: 47.99 | |
- type: ndcg_at_3 | |
value: 37.862 | |
- type: ndcg_at_5 | |
value: 39.342 | |
- type: precision_at_1 | |
value: 35.86 | |
- type: precision_at_10 | |
value: 8.178 | |
- type: precision_at_100 | |
value: 1.308 | |
- type: precision_at_1000 | |
value: 0.182 | |
- type: precision_at_3 | |
value: 18.662 | |
- type: precision_at_5 | |
value: 13.172 | |
- type: recall_at_1 | |
value: 27.090999999999998 | |
- type: recall_at_10 | |
value: 50.407999999999994 | |
- type: recall_at_100 | |
value: 68.27499999999999 | |
- type: recall_at_1000 | |
value: 83.155 | |
- type: recall_at_3 | |
value: 38.259 | |
- type: recall_at_5 | |
value: 43.096000000000004 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/gaming | |
name: MTEB CQADupstackGamingRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 32.01 | |
- type: map_at_10 | |
value: 42.915 | |
- type: map_at_100 | |
value: 44.096000000000004 | |
- type: map_at_1000 | |
value: 44.175 | |
- type: map_at_3 | |
value: 40.283 | |
- type: map_at_5 | |
value: 41.744 | |
- type: mrr_at_1 | |
value: 37.68 | |
- type: mrr_at_10 | |
value: 46.929 | |
- type: mrr_at_100 | |
value: 47.75 | |
- type: mrr_at_1000 | |
value: 47.795 | |
- type: mrr_at_3 | |
value: 44.713 | |
- type: mrr_at_5 | |
value: 45.885 | |
- type: ndcg_at_1 | |
value: 37.68 | |
- type: ndcg_at_10 | |
value: 48.453 | |
- type: ndcg_at_100 | |
value: 53.494 | |
- type: ndcg_at_1000 | |
value: 55.169000000000004 | |
- type: ndcg_at_3 | |
value: 43.834 | |
- type: ndcg_at_5 | |
value: 45.926 | |
- type: precision_at_1 | |
value: 37.68 | |
- type: precision_at_10 | |
value: 7.906000000000001 | |
- type: precision_at_100 | |
value: 1.135 | |
- type: precision_at_1000 | |
value: 0.134 | |
- type: precision_at_3 | |
value: 20.041999999999998 | |
- type: precision_at_5 | |
value: 13.58 | |
- type: recall_at_1 | |
value: 32.01 | |
- type: recall_at_10 | |
value: 61.049 | |
- type: recall_at_100 | |
value: 83.182 | |
- type: recall_at_1000 | |
value: 95.279 | |
- type: recall_at_3 | |
value: 48.407 | |
- type: recall_at_5 | |
value: 53.748 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/gis | |
name: MTEB CQADupstackGisRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 14.511 | |
- type: map_at_10 | |
value: 20.305999999999997 | |
- type: map_at_100 | |
value: 21.307000000000002 | |
- type: map_at_1000 | |
value: 21.419 | |
- type: map_at_3 | |
value: 18.376 | |
- type: map_at_5 | |
value: 19.421 | |
- type: mrr_at_1 | |
value: 16.045 | |
- type: mrr_at_10 | |
value: 22.002 | |
- type: mrr_at_100 | |
value: 22.986 | |
- type: mrr_at_1000 | |
value: 23.071 | |
- type: mrr_at_3 | |
value: 20.264 | |
- type: mrr_at_5 | |
value: 21.173000000000002 | |
- type: ndcg_at_1 | |
value: 16.045 | |
- type: ndcg_at_10 | |
value: 23.953 | |
- type: ndcg_at_100 | |
value: 29.201 | |
- type: ndcg_at_1000 | |
value: 32.366 | |
- type: ndcg_at_3 | |
value: 20.136000000000003 | |
- type: ndcg_at_5 | |
value: 21.859 | |
- type: precision_at_1 | |
value: 16.045 | |
- type: precision_at_10 | |
value: 3.8760000000000003 | |
- type: precision_at_100 | |
value: 0.696 | |
- type: precision_at_1000 | |
value: 0.101 | |
- type: precision_at_3 | |
value: 8.776 | |
- type: precision_at_5 | |
value: 6.282 | |
- type: recall_at_1 | |
value: 14.511 | |
- type: recall_at_10 | |
value: 33.707 | |
- type: recall_at_100 | |
value: 58.182 | |
- type: recall_at_1000 | |
value: 82.845 | |
- type: recall_at_3 | |
value: 23.206 | |
- type: recall_at_5 | |
value: 27.311999999999998 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/mathematica | |
name: MTEB CQADupstackMathematicaRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 9.762 | |
- type: map_at_10 | |
value: 15.495000000000001 | |
- type: map_at_100 | |
value: 16.637 | |
- type: map_at_1000 | |
value: 16.786 | |
- type: map_at_3 | |
value: 13.62 | |
- type: map_at_5 | |
value: 14.655999999999999 | |
- type: mrr_at_1 | |
value: 12.934999999999999 | |
- type: mrr_at_10 | |
value: 18.985 | |
- type: mrr_at_100 | |
value: 20.079 | |
- type: mrr_at_1000 | |
value: 20.177999999999997 | |
- type: mrr_at_3 | |
value: 16.977999999999998 | |
- type: mrr_at_5 | |
value: 18.197 | |
- type: ndcg_at_1 | |
value: 12.934999999999999 | |
- type: ndcg_at_10 | |
value: 19.444 | |
- type: ndcg_at_100 | |
value: 25.108999999999998 | |
- type: ndcg_at_1000 | |
value: 28.804999999999996 | |
- type: ndcg_at_3 | |
value: 15.93 | |
- type: ndcg_at_5 | |
value: 17.57 | |
- type: precision_at_1 | |
value: 12.934999999999999 | |
- type: precision_at_10 | |
value: 3.856 | |
- type: precision_at_100 | |
value: 0.765 | |
- type: precision_at_1000 | |
value: 0.124 | |
- type: precision_at_3 | |
value: 8.043 | |
- type: precision_at_5 | |
value: 6.095 | |
- type: recall_at_1 | |
value: 9.762 | |
- type: recall_at_10 | |
value: 28.216 | |
- type: recall_at_100 | |
value: 53.28000000000001 | |
- type: recall_at_1000 | |
value: 79.64099999999999 | |
- type: recall_at_3 | |
value: 18.335 | |
- type: recall_at_5 | |
value: 22.435 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/physics | |
name: MTEB CQADupstackPhysicsRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 21.770999999999997 | |
- type: map_at_10 | |
value: 30.837999999999997 | |
- type: map_at_100 | |
value: 32.327 | |
- type: map_at_1000 | |
value: 32.464999999999996 | |
- type: map_at_3 | |
value: 27.891 | |
- type: map_at_5 | |
value: 29.433 | |
- type: mrr_at_1 | |
value: 27.622999999999998 | |
- type: mrr_at_10 | |
value: 36.293 | |
- type: mrr_at_100 | |
value: 37.221 | |
- type: mrr_at_1000 | |
value: 37.288 | |
- type: mrr_at_3 | |
value: 33.574 | |
- type: mrr_at_5 | |
value: 35.085 | |
- type: ndcg_at_1 | |
value: 27.622999999999998 | |
- type: ndcg_at_10 | |
value: 36.784 | |
- type: ndcg_at_100 | |
value: 43.033 | |
- type: ndcg_at_1000 | |
value: 45.616 | |
- type: ndcg_at_3 | |
value: 31.694 | |
- type: ndcg_at_5 | |
value: 33.909 | |
- type: precision_at_1 | |
value: 27.622999999999998 | |
- type: precision_at_10 | |
value: 7.141 | |
- type: precision_at_100 | |
value: 1.24 | |
- type: precision_at_1000 | |
value: 0.165 | |
- type: precision_at_3 | |
value: 15.623999999999999 | |
- type: precision_at_5 | |
value: 11.338 | |
- type: recall_at_1 | |
value: 21.770999999999997 | |
- type: recall_at_10 | |
value: 49.318 | |
- type: recall_at_100 | |
value: 75.779 | |
- type: recall_at_1000 | |
value: 92.729 | |
- type: recall_at_3 | |
value: 34.685 | |
- type: recall_at_5 | |
value: 40.546 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/programmers | |
name: MTEB CQADupstackProgrammersRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 20.156 | |
- type: map_at_10 | |
value: 27.732 | |
- type: map_at_100 | |
value: 29.002 | |
- type: map_at_1000 | |
value: 29.149 | |
- type: map_at_3 | |
value: 25.044 | |
- type: map_at_5 | |
value: 26.586 | |
- type: mrr_at_1 | |
value: 25.457 | |
- type: mrr_at_10 | |
value: 32.799 | |
- type: mrr_at_100 | |
value: 33.756 | |
- type: mrr_at_1000 | |
value: 33.833 | |
- type: mrr_at_3 | |
value: 30.497999999999998 | |
- type: mrr_at_5 | |
value: 31.857000000000003 | |
- type: ndcg_at_1 | |
value: 25.457 | |
- type: ndcg_at_10 | |
value: 32.59 | |
- type: ndcg_at_100 | |
value: 38.336 | |
- type: ndcg_at_1000 | |
value: 41.475 | |
- type: ndcg_at_3 | |
value: 28.166000000000004 | |
- type: ndcg_at_5 | |
value: 30.281000000000002 | |
- type: precision_at_1 | |
value: 25.457 | |
- type: precision_at_10 | |
value: 6.062 | |
- type: precision_at_100 | |
value: 1.083 | |
- type: precision_at_1000 | |
value: 0.156 | |
- type: precision_at_3 | |
value: 13.661000000000001 | |
- type: precision_at_5 | |
value: 9.886000000000001 | |
- type: recall_at_1 | |
value: 20.156 | |
- type: recall_at_10 | |
value: 42.191 | |
- type: recall_at_100 | |
value: 66.953 | |
- type: recall_at_1000 | |
value: 88.91 | |
- type: recall_at_3 | |
value: 29.86 | |
- type: recall_at_5 | |
value: 35.553000000000004 | |
- task: | |
type: Retrieval | |
dataset: | |
type: mteb/cqadupstack | |
name: MTEB CQADupstackRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 18.901250000000005 | |
- type: map_at_10 | |
value: 26.13458333333333 | |
- type: map_at_100 | |
value: 27.282833333333333 | |
- type: map_at_1000 | |
value: 27.416749999999997 | |
- type: map_at_3 | |
value: 23.753500000000003 | |
- type: map_at_5 | |
value: 25.076833333333337 | |
- type: mrr_at_1 | |
value: 23.560500000000005 | |
- type: mrr_at_10 | |
value: 30.31466666666666 | |
- type: mrr_at_100 | |
value: 31.217249999999996 | |
- type: mrr_at_1000 | |
value: 31.29225 | |
- type: mrr_at_3 | |
value: 28.16208333333333 | |
- type: mrr_at_5 | |
value: 29.39025 | |
- type: ndcg_at_1 | |
value: 23.560500000000005 | |
- type: ndcg_at_10 | |
value: 30.780500000000004 | |
- type: ndcg_at_100 | |
value: 36.003083333333336 | |
- type: ndcg_at_1000 | |
value: 38.918166666666664 | |
- type: ndcg_at_3 | |
value: 26.735249999999994 | |
- type: ndcg_at_5 | |
value: 28.60558333333333 | |
- type: precision_at_1 | |
value: 23.560500000000005 | |
- type: precision_at_10 | |
value: 5.700583333333334 | |
- type: precision_at_100 | |
value: 1.0015 | |
- type: precision_at_1000 | |
value: 0.14475 | |
- type: precision_at_3 | |
value: 12.736749999999999 | |
- type: precision_at_5 | |
value: 9.230666666666666 | |
- type: recall_at_1 | |
value: 18.901250000000005 | |
- type: recall_at_10 | |
value: 40.4075 | |
- type: recall_at_100 | |
value: 63.96683333333333 | |
- type: recall_at_1000 | |
value: 84.86883333333333 | |
- type: recall_at_3 | |
value: 28.79183333333334 | |
- type: recall_at_5 | |
value: 33.7335 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/stats | |
name: MTEB CQADupstackStatsRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 15.977 | |
- type: map_at_10 | |
value: 21.612000000000002 | |
- type: map_at_100 | |
value: 22.519 | |
- type: map_at_1000 | |
value: 22.633 | |
- type: map_at_3 | |
value: 19.766000000000002 | |
- type: map_at_5 | |
value: 20.855999999999998 | |
- type: mrr_at_1 | |
value: 19.017999999999997 | |
- type: mrr_at_10 | |
value: 24.310000000000002 | |
- type: mrr_at_100 | |
value: 25.206 | |
- type: mrr_at_1000 | |
value: 25.295 | |
- type: mrr_at_3 | |
value: 22.52 | |
- type: mrr_at_5 | |
value: 23.586 | |
- type: ndcg_at_1 | |
value: 19.017999999999997 | |
- type: ndcg_at_10 | |
value: 25.024 | |
- type: ndcg_at_100 | |
value: 29.942999999999998 | |
- type: ndcg_at_1000 | |
value: 33.059 | |
- type: ndcg_at_3 | |
value: 21.654 | |
- type: ndcg_at_5 | |
value: 23.347 | |
- type: precision_at_1 | |
value: 19.017999999999997 | |
- type: precision_at_10 | |
value: 4.1259999999999994 | |
- type: precision_at_100 | |
value: 0.719 | |
- type: precision_at_1000 | |
value: 0.106 | |
- type: precision_at_3 | |
value: 9.714 | |
- type: precision_at_5 | |
value: 7.025 | |
- type: recall_at_1 | |
value: 15.977 | |
- type: recall_at_10 | |
value: 33.012 | |
- type: recall_at_100 | |
value: 56.201 | |
- type: recall_at_1000 | |
value: 79.837 | |
- type: recall_at_3 | |
value: 23.551 | |
- type: recall_at_5 | |
value: 27.733 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/tex | |
name: MTEB CQADupstackTexRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 10.26 | |
- type: map_at_10 | |
value: 14.248 | |
- type: map_at_100 | |
value: 15.095 | |
- type: map_at_1000 | |
value: 15.22 | |
- type: map_at_3 | |
value: 12.7 | |
- type: map_at_5 | |
value: 13.492999999999999 | |
- type: mrr_at_1 | |
value: 13.73 | |
- type: mrr_at_10 | |
value: 17.964 | |
- type: mrr_at_100 | |
value: 18.748 | |
- type: mrr_at_1000 | |
value: 18.842 | |
- type: mrr_at_3 | |
value: 16.34 | |
- type: mrr_at_5 | |
value: 17.205000000000002 | |
- type: ndcg_at_1 | |
value: 13.73 | |
- type: ndcg_at_10 | |
value: 17.429 | |
- type: ndcg_at_100 | |
value: 21.856 | |
- type: ndcg_at_1000 | |
value: 25.251 | |
- type: ndcg_at_3 | |
value: 14.667 | |
- type: ndcg_at_5 | |
value: 15.790000000000001 | |
- type: precision_at_1 | |
value: 13.73 | |
- type: precision_at_10 | |
value: 3.4099999999999997 | |
- type: precision_at_100 | |
value: 0.6839999999999999 | |
- type: precision_at_1000 | |
value: 0.11399999999999999 | |
- type: precision_at_3 | |
value: 7.202999999999999 | |
- type: precision_at_5 | |
value: 5.299 | |
- type: recall_at_1 | |
value: 10.26 | |
- type: recall_at_10 | |
value: 23.54 | |
- type: recall_at_100 | |
value: 44.085 | |
- type: recall_at_1000 | |
value: 69.233 | |
- type: recall_at_3 | |
value: 15.387999999999998 | |
- type: recall_at_5 | |
value: 18.467 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/unix | |
name: MTEB CQADupstackUnixRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 18.695 | |
- type: map_at_10 | |
value: 25.752000000000002 | |
- type: map_at_100 | |
value: 26.810000000000002 | |
- type: map_at_1000 | |
value: 26.931 | |
- type: map_at_3 | |
value: 23.205000000000002 | |
- type: map_at_5 | |
value: 24.792 | |
- type: mrr_at_1 | |
value: 23.134 | |
- type: mrr_at_10 | |
value: 30.176 | |
- type: mrr_at_100 | |
value: 31.087999999999997 | |
- type: mrr_at_1000 | |
value: 31.162 | |
- type: mrr_at_3 | |
value: 27.766999999999996 | |
- type: mrr_at_5 | |
value: 29.321 | |
- type: ndcg_at_1 | |
value: 23.134 | |
- type: ndcg_at_10 | |
value: 30.427 | |
- type: ndcg_at_100 | |
value: 35.839999999999996 | |
- type: ndcg_at_1000 | |
value: 38.675 | |
- type: ndcg_at_3 | |
value: 25.959 | |
- type: ndcg_at_5 | |
value: 28.364 | |
- type: precision_at_1 | |
value: 23.134 | |
- type: precision_at_10 | |
value: 5.466 | |
- type: precision_at_100 | |
value: 0.9259999999999999 | |
- type: precision_at_1000 | |
value: 0.128 | |
- type: precision_at_3 | |
value: 12.127 | |
- type: precision_at_5 | |
value: 8.993 | |
- type: recall_at_1 | |
value: 18.695 | |
- type: recall_at_10 | |
value: 40.345 | |
- type: recall_at_100 | |
value: 65.009 | |
- type: recall_at_1000 | |
value: 85.368 | |
- type: recall_at_3 | |
value: 28.016999999999996 | |
- type: recall_at_5 | |
value: 34.211999999999996 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/webmasters | |
name: MTEB CQADupstackWebmastersRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 19.955000000000002 | |
- type: map_at_10 | |
value: 26.924999999999997 | |
- type: map_at_100 | |
value: 28.260999999999996 | |
- type: map_at_1000 | |
value: 28.499999999999996 | |
- type: map_at_3 | |
value: 24.282 | |
- type: map_at_5 | |
value: 25.89 | |
- type: mrr_at_1 | |
value: 25.889 | |
- type: mrr_at_10 | |
value: 31.596999999999998 | |
- type: mrr_at_100 | |
value: 32.631 | |
- type: mrr_at_1000 | |
value: 32.702999999999996 | |
- type: mrr_at_3 | |
value: 29.182999999999996 | |
- type: mrr_at_5 | |
value: 30.705 | |
- type: ndcg_at_1 | |
value: 25.889 | |
- type: ndcg_at_10 | |
value: 32.094 | |
- type: ndcg_at_100 | |
value: 37.39 | |
- type: ndcg_at_1000 | |
value: 40.923 | |
- type: ndcg_at_3 | |
value: 27.815 | |
- type: ndcg_at_5 | |
value: 30.162 | |
- type: precision_at_1 | |
value: 25.889 | |
- type: precision_at_10 | |
value: 6.482 | |
- type: precision_at_100 | |
value: 1.374 | |
- type: precision_at_1000 | |
value: 0.231 | |
- type: precision_at_3 | |
value: 13.373 | |
- type: precision_at_5 | |
value: 10.356 | |
- type: recall_at_1 | |
value: 19.955000000000002 | |
- type: recall_at_10 | |
value: 41.157 | |
- type: recall_at_100 | |
value: 66.518 | |
- type: recall_at_1000 | |
value: 90.814 | |
- type: recall_at_3 | |
value: 28.319 | |
- type: recall_at_5 | |
value: 34.394999999999996 | |
- task: | |
type: Retrieval | |
dataset: | |
type: cqadupstack/wordpress | |
name: MTEB CQADupstackWordpressRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 12.144 | |
- type: map_at_10 | |
value: 17.137 | |
- type: map_at_100 | |
value: 18.046 | |
- type: map_at_1000 | |
value: 18.15 | |
- type: map_at_3 | |
value: 15.268 | |
- type: map_at_5 | |
value: 16.309 | |
- type: mrr_at_1 | |
value: 13.309000000000001 | |
- type: mrr_at_10 | |
value: 18.523999999999997 | |
- type: mrr_at_100 | |
value: 19.455 | |
- type: mrr_at_1000 | |
value: 19.543 | |
- type: mrr_at_3 | |
value: 16.512999999999998 | |
- type: mrr_at_5 | |
value: 17.622 | |
- type: ndcg_at_1 | |
value: 13.309000000000001 | |
- type: ndcg_at_10 | |
value: 20.565 | |
- type: ndcg_at_100 | |
value: 25.657000000000004 | |
- type: ndcg_at_1000 | |
value: 28.646 | |
- type: ndcg_at_3 | |
value: 16.658 | |
- type: ndcg_at_5 | |
value: 18.518 | |
- type: precision_at_1 | |
value: 13.309000000000001 | |
- type: precision_at_10 | |
value: 3.42 | |
- type: precision_at_100 | |
value: 0.645 | |
- type: precision_at_1000 | |
value: 0.096 | |
- type: precision_at_3 | |
value: 7.2090000000000005 | |
- type: precision_at_5 | |
value: 5.323 | |
- type: recall_at_1 | |
value: 12.144 | |
- type: recall_at_10 | |
value: 30.0 | |
- type: recall_at_100 | |
value: 54.296 | |
- type: recall_at_1000 | |
value: 77.247 | |
- type: recall_at_3 | |
value: 19.451999999999998 | |
- type: recall_at_5 | |
value: 23.949 | |
- task: | |
type: Retrieval | |
dataset: | |
type: climate-fever | |
name: MTEB ClimateFEVER | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 7.531000000000001 | |
- type: map_at_10 | |
value: 13.875000000000002 | |
- type: map_at_100 | |
value: 15.714 | |
- type: map_at_1000 | |
value: 15.934999999999999 | |
- type: map_at_3 | |
value: 11.204 | |
- type: map_at_5 | |
value: 12.373000000000001 | |
- type: mrr_at_1 | |
value: 16.547 | |
- type: mrr_at_10 | |
value: 26.889000000000003 | |
- type: mrr_at_100 | |
value: 28.194999999999997 | |
- type: mrr_at_1000 | |
value: 28.242 | |
- type: mrr_at_3 | |
value: 23.279 | |
- type: mrr_at_5 | |
value: 25.289 | |
- type: ndcg_at_1 | |
value: 16.547 | |
- type: ndcg_at_10 | |
value: 20.666999999999998 | |
- type: ndcg_at_100 | |
value: 28.896 | |
- type: ndcg_at_1000 | |
value: 32.843 | |
- type: ndcg_at_3 | |
value: 15.598999999999998 | |
- type: ndcg_at_5 | |
value: 17.238 | |
- type: precision_at_1 | |
value: 16.547 | |
- type: precision_at_10 | |
value: 6.958 | |
- type: precision_at_100 | |
value: 1.5810000000000002 | |
- type: precision_at_1000 | |
value: 0.231 | |
- type: precision_at_3 | |
value: 11.726 | |
- type: precision_at_5 | |
value: 9.472 | |
- type: recall_at_1 | |
value: 7.531000000000001 | |
- type: recall_at_10 | |
value: 26.726 | |
- type: recall_at_100 | |
value: 55.940999999999995 | |
- type: recall_at_1000 | |
value: 78.119 | |
- type: recall_at_3 | |
value: 14.815000000000001 | |
- type: recall_at_5 | |
value: 18.955 | |
- task: | |
type: Retrieval | |
dataset: | |
type: dbpedia-entity | |
name: MTEB DBPedia | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 4.741 | |
- type: map_at_10 | |
value: 11.743 | |
- type: map_at_100 | |
value: 16.723 | |
- type: map_at_1000 | |
value: 17.813000000000002 | |
- type: map_at_3 | |
value: 8.017000000000001 | |
- type: map_at_5 | |
value: 9.655 | |
- type: mrr_at_1 | |
value: 40.25 | |
- type: mrr_at_10 | |
value: 52.244 | |
- type: mrr_at_100 | |
value: 52.933 | |
- type: mrr_at_1000 | |
value: 52.957 | |
- type: mrr_at_3 | |
value: 49.791999999999994 | |
- type: mrr_at_5 | |
value: 51.629000000000005 | |
- type: ndcg_at_1 | |
value: 30.0 | |
- type: ndcg_at_10 | |
value: 25.813000000000002 | |
- type: ndcg_at_100 | |
value: 31.075999999999997 | |
- type: ndcg_at_1000 | |
value: 38.242 | |
- type: ndcg_at_3 | |
value: 27.394000000000002 | |
- type: ndcg_at_5 | |
value: 26.395999999999997 | |
- type: precision_at_1 | |
value: 40.25 | |
- type: precision_at_10 | |
value: 22.0 | |
- type: precision_at_100 | |
value: 7.077999999999999 | |
- type: precision_at_1000 | |
value: 1.492 | |
- type: precision_at_3 | |
value: 32.833 | |
- type: precision_at_5 | |
value: 28.15 | |
- type: recall_at_1 | |
value: 4.741 | |
- type: recall_at_10 | |
value: 18.11 | |
- type: recall_at_100 | |
value: 40.617999999999995 | |
- type: recall_at_1000 | |
value: 63.92 | |
- type: recall_at_3 | |
value: 9.724 | |
- type: recall_at_5 | |
value: 13.333 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/emotion | |
name: MTEB EmotionClassification | |
config: default | |
split: test | |
revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37 | |
metrics: | |
- type: accuracy | |
value: 46.575 | |
- type: f1 | |
value: 42.15253766150754 | |
- task: | |
type: Retrieval | |
dataset: | |
type: fever | |
name: MTEB FEVER | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 24.676000000000002 | |
- type: map_at_10 | |
value: 36.666 | |
- type: map_at_100 | |
value: 37.613 | |
- type: map_at_1000 | |
value: 37.663000000000004 | |
- type: map_at_3 | |
value: 33.269999999999996 | |
- type: map_at_5 | |
value: 35.21 | |
- type: mrr_at_1 | |
value: 26.733 | |
- type: mrr_at_10 | |
value: 39.007999999999996 | |
- type: mrr_at_100 | |
value: 39.904 | |
- type: mrr_at_1000 | |
value: 39.944 | |
- type: mrr_at_3 | |
value: 35.591 | |
- type: mrr_at_5 | |
value: 37.544 | |
- type: ndcg_at_1 | |
value: 26.733 | |
- type: ndcg_at_10 | |
value: 43.477 | |
- type: ndcg_at_100 | |
value: 47.906 | |
- type: ndcg_at_1000 | |
value: 49.144 | |
- type: ndcg_at_3 | |
value: 36.606 | |
- type: ndcg_at_5 | |
value: 40.009 | |
- type: precision_at_1 | |
value: 26.733 | |
- type: precision_at_10 | |
value: 6.842 | |
- type: precision_at_100 | |
value: 0.9209999999999999 | |
- type: precision_at_1000 | |
value: 0.104 | |
- type: precision_at_3 | |
value: 15.906999999999998 | |
- type: precision_at_5 | |
value: 11.356 | |
- type: recall_at_1 | |
value: 24.676000000000002 | |
- type: recall_at_10 | |
value: 62.556999999999995 | |
- type: recall_at_100 | |
value: 82.43 | |
- type: recall_at_1000 | |
value: 91.738 | |
- type: recall_at_3 | |
value: 43.885000000000005 | |
- type: recall_at_5 | |
value: 52.054 | |
- task: | |
type: Retrieval | |
dataset: | |
type: fiqa | |
name: MTEB FiQA2018 | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 11.101999999999999 | |
- type: map_at_10 | |
value: 18.490000000000002 | |
- type: map_at_100 | |
value: 20.404 | |
- type: map_at_1000 | |
value: 20.631 | |
- type: map_at_3 | |
value: 15.6 | |
- type: map_at_5 | |
value: 17.169 | |
- type: mrr_at_1 | |
value: 22.531000000000002 | |
- type: mrr_at_10 | |
value: 30.429000000000002 | |
- type: mrr_at_100 | |
value: 31.537 | |
- type: mrr_at_1000 | |
value: 31.606 | |
- type: mrr_at_3 | |
value: 27.546 | |
- type: mrr_at_5 | |
value: 29.159000000000002 | |
- type: ndcg_at_1 | |
value: 22.531000000000002 | |
- type: ndcg_at_10 | |
value: 24.624 | |
- type: ndcg_at_100 | |
value: 32.836 | |
- type: ndcg_at_1000 | |
value: 36.992000000000004 | |
- type: ndcg_at_3 | |
value: 20.806 | |
- type: ndcg_at_5 | |
value: 22.292 | |
- type: precision_at_1 | |
value: 22.531000000000002 | |
- type: precision_at_10 | |
value: 7.176 | |
- type: precision_at_100 | |
value: 1.546 | |
- type: precision_at_1000 | |
value: 0.22799999999999998 | |
- type: precision_at_3 | |
value: 14.198 | |
- type: precision_at_5 | |
value: 11.019 | |
- type: recall_at_1 | |
value: 11.101999999999999 | |
- type: recall_at_10 | |
value: 30.86 | |
- type: recall_at_100 | |
value: 62.564 | |
- type: recall_at_1000 | |
value: 87.627 | |
- type: recall_at_3 | |
value: 18.721 | |
- type: recall_at_5 | |
value: 23.830000000000002 | |
- task: | |
type: Retrieval | |
dataset: | |
type: hotpotqa | |
name: MTEB HotpotQA | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 27.474999999999998 | |
- type: map_at_10 | |
value: 39.342 | |
- type: map_at_100 | |
value: 40.458 | |
- type: map_at_1000 | |
value: 40.553 | |
- type: map_at_3 | |
value: 36.272999999999996 | |
- type: map_at_5 | |
value: 38.091 | |
- type: mrr_at_1 | |
value: 54.949000000000005 | |
- type: mrr_at_10 | |
value: 63.28 | |
- type: mrr_at_100 | |
value: 63.796 | |
- type: mrr_at_1000 | |
value: 63.821000000000005 | |
- type: mrr_at_3 | |
value: 61.41799999999999 | |
- type: mrr_at_5 | |
value: 62.522999999999996 | |
- type: ndcg_at_1 | |
value: 54.949000000000005 | |
- type: ndcg_at_10 | |
value: 48.461 | |
- type: ndcg_at_100 | |
value: 52.903999999999996 | |
- type: ndcg_at_1000 | |
value: 54.906 | |
- type: ndcg_at_3 | |
value: 43.428 | |
- type: ndcg_at_5 | |
value: 46.045 | |
- type: precision_at_1 | |
value: 54.949000000000005 | |
- type: precision_at_10 | |
value: 10.446 | |
- type: precision_at_100 | |
value: 1.397 | |
- type: precision_at_1000 | |
value: 0.166 | |
- type: precision_at_3 | |
value: 27.310000000000002 | |
- type: precision_at_5 | |
value: 18.458 | |
- type: recall_at_1 | |
value: 27.474999999999998 | |
- type: recall_at_10 | |
value: 52.227999999999994 | |
- type: recall_at_100 | |
value: 69.838 | |
- type: recall_at_1000 | |
value: 83.153 | |
- type: recall_at_3 | |
value: 40.966 | |
- type: recall_at_5 | |
value: 46.144 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/imdb | |
name: MTEB ImdbClassification | |
config: default | |
split: test | |
revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7 | |
metrics: | |
- type: accuracy | |
value: 75.6784 | |
- type: ap | |
value: 70.03950630113135 | |
- type: f1 | |
value: 75.38669491280882 | |
- task: | |
type: Retrieval | |
dataset: | |
type: msmarco | |
name: MTEB MSMARCO | |
config: default | |
split: dev | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 8.182 | |
- type: map_at_10 | |
value: 14.597999999999999 | |
- type: map_at_100 | |
value: 15.795 | |
- type: map_at_1000 | |
value: 15.901000000000002 | |
- type: map_at_3 | |
value: 12.001000000000001 | |
- type: map_at_5 | |
value: 13.377 | |
- type: mrr_at_1 | |
value: 8.395 | |
- type: mrr_at_10 | |
value: 14.883 | |
- type: mrr_at_100 | |
value: 16.073999999999998 | |
- type: mrr_at_1000 | |
value: 16.174 | |
- type: mrr_at_3 | |
value: 12.267999999999999 | |
- type: mrr_at_5 | |
value: 13.658000000000001 | |
- type: ndcg_at_1 | |
value: 8.395 | |
- type: ndcg_at_10 | |
value: 18.81 | |
- type: ndcg_at_100 | |
value: 25.144 | |
- type: ndcg_at_1000 | |
value: 28.094 | |
- type: ndcg_at_3 | |
value: 13.366 | |
- type: ndcg_at_5 | |
value: 15.856 | |
- type: precision_at_1 | |
value: 8.395 | |
- type: precision_at_10 | |
value: 3.328 | |
- type: precision_at_100 | |
value: 0.657 | |
- type: precision_at_1000 | |
value: 0.091 | |
- type: precision_at_3 | |
value: 5.84 | |
- type: precision_at_5 | |
value: 4.765 | |
- type: recall_at_1 | |
value: 8.182 | |
- type: recall_at_10 | |
value: 32.151 | |
- type: recall_at_100 | |
value: 62.633 | |
- type: recall_at_1000 | |
value: 85.88 | |
- type: recall_at_3 | |
value: 17.069000000000003 | |
- type: recall_at_5 | |
value: 23.092 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/mtop_domain | |
name: MTEB MTOPDomainClassification (en) | |
config: en | |
split: test | |
revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf | |
metrics: | |
- type: accuracy | |
value: 94.3296853625171 | |
- type: f1 | |
value: 94.02246426051437 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/mtop_intent | |
name: MTEB MTOPIntentClassification (en) | |
config: en | |
split: test | |
revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba | |
metrics: | |
- type: accuracy | |
value: 79.54172366621067 | |
- type: f1 | |
value: 60.47715992221304 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/amazon_massive_intent | |
name: MTEB MassiveIntentClassification (en) | |
config: en | |
split: test | |
revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | |
metrics: | |
- type: accuracy | |
value: 73.83994620040349 | |
- type: f1 | |
value: 70.84392062730345 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/amazon_massive_scenario | |
name: MTEB MassiveScenarioClassification (en) | |
config: en | |
split: test | |
revision: 7d571f92784cd94a019292a1f45445077d0ef634 | |
metrics: | |
- type: accuracy | |
value: 79.17283120376597 | |
- type: f1 | |
value: 78.83856078561683 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/medrxiv-clustering-p2p | |
name: MTEB MedrxivClusteringP2P | |
config: default | |
split: test | |
revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73 | |
metrics: | |
- type: v_measure | |
value: 30.939561146943344 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/medrxiv-clustering-s2s | |
name: MTEB MedrxivClusteringS2S | |
config: default | |
split: test | |
revision: 35191c8c0dca72d8ff3efcd72aa802307d469663 | |
metrics: | |
- type: v_measure | |
value: 28.0435406238161 | |
- task: | |
type: Reranking | |
dataset: | |
type: mteb/mind_small | |
name: MTEB MindSmallReranking | |
config: default | |
split: test | |
revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69 | |
metrics: | |
- type: map | |
value: 30.860539801824743 | |
- type: mrr | |
value: 31.993223906232455 | |
- task: | |
type: Retrieval | |
dataset: | |
type: nfcorpus | |
name: MTEB NFCorpus | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 2.6759999999999997 | |
- type: map_at_10 | |
value: 8.365 | |
- type: map_at_100 | |
value: 10.949 | |
- type: map_at_1000 | |
value: 12.248000000000001 | |
- type: map_at_3 | |
value: 5.836 | |
- type: map_at_5 | |
value: 7.094 | |
- type: mrr_at_1 | |
value: 32.507999999999996 | |
- type: mrr_at_10 | |
value: 43.336999999999996 | |
- type: mrr_at_100 | |
value: 44.092 | |
- type: mrr_at_1000 | |
value: 44.125 | |
- type: mrr_at_3 | |
value: 40.402 | |
- type: mrr_at_5 | |
value: 42.214 | |
- type: ndcg_at_1 | |
value: 30.186 | |
- type: ndcg_at_10 | |
value: 26.806 | |
- type: ndcg_at_100 | |
value: 25.446999999999996 | |
- type: ndcg_at_1000 | |
value: 34.33 | |
- type: ndcg_at_3 | |
value: 30.159999999999997 | |
- type: ndcg_at_5 | |
value: 28.671999999999997 | |
- type: precision_at_1 | |
value: 31.579 | |
- type: precision_at_10 | |
value: 20.96 | |
- type: precision_at_100 | |
value: 6.885 | |
- type: precision_at_1000 | |
value: 1.9560000000000002 | |
- type: precision_at_3 | |
value: 29.825000000000003 | |
- type: precision_at_5 | |
value: 25.944 | |
- type: recall_at_1 | |
value: 2.6759999999999997 | |
- type: recall_at_10 | |
value: 13.715 | |
- type: recall_at_100 | |
value: 29.246 | |
- type: recall_at_1000 | |
value: 59.878 | |
- type: recall_at_3 | |
value: 7.6850000000000005 | |
- type: recall_at_5 | |
value: 10.559000000000001 | |
- task: | |
type: Retrieval | |
dataset: | |
type: nq | |
name: MTEB NQ | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 14.999 | |
- type: map_at_10 | |
value: 26.229999999999997 | |
- type: map_at_100 | |
value: 27.77 | |
- type: map_at_1000 | |
value: 27.832 | |
- type: map_at_3 | |
value: 22.127 | |
- type: map_at_5 | |
value: 24.395 | |
- type: mrr_at_1 | |
value: 17.265 | |
- type: mrr_at_10 | |
value: 28.515 | |
- type: mrr_at_100 | |
value: 29.793999999999997 | |
- type: mrr_at_1000 | |
value: 29.837999999999997 | |
- type: mrr_at_3 | |
value: 24.609 | |
- type: mrr_at_5 | |
value: 26.790000000000003 | |
- type: ndcg_at_1 | |
value: 17.236 | |
- type: ndcg_at_10 | |
value: 33.207 | |
- type: ndcg_at_100 | |
value: 40.211000000000006 | |
- type: ndcg_at_1000 | |
value: 41.669 | |
- type: ndcg_at_3 | |
value: 25.013 | |
- type: ndcg_at_5 | |
value: 28.965999999999998 | |
- type: precision_at_1 | |
value: 17.236 | |
- type: precision_at_10 | |
value: 6.260000000000001 | |
- type: precision_at_100 | |
value: 1.015 | |
- type: precision_at_1000 | |
value: 0.11499999999999999 | |
- type: precision_at_3 | |
value: 12.032 | |
- type: precision_at_5 | |
value: 9.45 | |
- type: recall_at_1 | |
value: 14.999 | |
- type: recall_at_10 | |
value: 52.581 | |
- type: recall_at_100 | |
value: 83.918 | |
- type: recall_at_1000 | |
value: 94.735 | |
- type: recall_at_3 | |
value: 30.946 | |
- type: recall_at_5 | |
value: 40.136 | |
- task: | |
type: Retrieval | |
dataset: | |
type: quora | |
name: MTEB QuoraRetrieval | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 68.085 | |
- type: map_at_10 | |
value: 81.952 | |
- type: map_at_100 | |
value: 82.636 | |
- type: map_at_1000 | |
value: 82.65599999999999 | |
- type: map_at_3 | |
value: 78.83200000000001 | |
- type: map_at_5 | |
value: 80.793 | |
- type: mrr_at_1 | |
value: 78.45 | |
- type: mrr_at_10 | |
value: 85.35199999999999 | |
- type: mrr_at_100 | |
value: 85.483 | |
- type: mrr_at_1000 | |
value: 85.485 | |
- type: mrr_at_3 | |
value: 84.195 | |
- type: mrr_at_5 | |
value: 84.985 | |
- type: ndcg_at_1 | |
value: 78.46 | |
- type: ndcg_at_10 | |
value: 86.151 | |
- type: ndcg_at_100 | |
value: 87.589 | |
- type: ndcg_at_1000 | |
value: 87.737 | |
- type: ndcg_at_3 | |
value: 82.839 | |
- type: ndcg_at_5 | |
value: 84.67 | |
- type: precision_at_1 | |
value: 78.46 | |
- type: precision_at_10 | |
value: 13.114999999999998 | |
- type: precision_at_100 | |
value: 1.5190000000000001 | |
- type: precision_at_1000 | |
value: 0.156 | |
- type: precision_at_3 | |
value: 36.167 | |
- type: precision_at_5 | |
value: 23.921999999999997 | |
- type: recall_at_1 | |
value: 68.085 | |
- type: recall_at_10 | |
value: 94.28699999999999 | |
- type: recall_at_100 | |
value: 99.235 | |
- type: recall_at_1000 | |
value: 99.954 | |
- type: recall_at_3 | |
value: 84.941 | |
- type: recall_at_5 | |
value: 89.991 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/reddit-clustering | |
name: MTEB RedditClustering | |
config: default | |
split: test | |
revision: 24640382cdbf8abc73003fb0fa6d111a705499eb | |
metrics: | |
- type: v_measure | |
value: 42.84102304870842 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/reddit-clustering-p2p | |
name: MTEB RedditClusteringP2P | |
config: default | |
split: test | |
revision: 282350215ef01743dc01b456c7f5241fa8937f16 | |
metrics: | |
- type: v_measure | |
value: 60.096590952185046 | |
- task: | |
type: Retrieval | |
dataset: | |
type: scidocs | |
name: MTEB SCIDOCS | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 2.283 | |
- type: map_at_10 | |
value: 5.554 | |
- type: map_at_100 | |
value: 6.98 | |
- type: map_at_1000 | |
value: 7.324999999999999 | |
- type: map_at_3 | |
value: 3.9890000000000003 | |
- type: map_at_5 | |
value: 4.766 | |
- type: mrr_at_1 | |
value: 11.200000000000001 | |
- type: mrr_at_10 | |
value: 17.746000000000002 | |
- type: mrr_at_100 | |
value: 18.971 | |
- type: mrr_at_1000 | |
value: 19.1 | |
- type: mrr_at_3 | |
value: 15.15 | |
- type: mrr_at_5 | |
value: 16.619999999999997 | |
- type: ndcg_at_1 | |
value: 11.200000000000001 | |
- type: ndcg_at_10 | |
value: 10.001 | |
- type: ndcg_at_100 | |
value: 16.933 | |
- type: ndcg_at_1000 | |
value: 23.835 | |
- type: ndcg_at_3 | |
value: 9.005 | |
- type: ndcg_at_5 | |
value: 8.076 | |
- type: precision_at_1 | |
value: 11.200000000000001 | |
- type: precision_at_10 | |
value: 5.3 | |
- type: precision_at_100 | |
value: 1.5730000000000002 | |
- type: precision_at_1000 | |
value: 0.32299999999999995 | |
- type: precision_at_3 | |
value: 8.3 | |
- type: precision_at_5 | |
value: 7.12 | |
- type: recall_at_1 | |
value: 2.283 | |
- type: recall_at_10 | |
value: 10.775 | |
- type: recall_at_100 | |
value: 31.913000000000004 | |
- type: recall_at_1000 | |
value: 65.595 | |
- type: recall_at_3 | |
value: 5.0729999999999995 | |
- type: recall_at_5 | |
value: 7.228 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sickr-sts | |
name: MTEB SICK-R | |
config: default | |
split: test | |
revision: a6ea5a8cab320b040a23452cc28066d9beae2cee | |
metrics: | |
- type: cos_sim_spearman | |
value: 71.76588896280093 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts12-sts | |
name: MTEB STS12 | |
config: default | |
split: test | |
revision: a0d554a64d88156834ff5ae9920b964011b16384 | |
metrics: | |
- type: cos_sim_spearman | |
value: 65.3943089429597 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts13-sts | |
name: MTEB STS13 | |
config: default | |
split: test | |
revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca | |
metrics: | |
- type: cos_sim_spearman | |
value: 79.26435573752327 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts14-sts | |
name: MTEB STS14 | |
config: default | |
split: test | |
revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 | |
metrics: | |
- type: cos_sim_spearman | |
value: 72.98102120833857 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts15-sts | |
name: MTEB STS15 | |
config: default | |
split: test | |
revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 | |
metrics: | |
- type: cos_sim_spearman | |
value: 82.72040157931015 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts16-sts | |
name: MTEB STS16 | |
config: default | |
split: test | |
revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 | |
metrics: | |
- type: cos_sim_spearman | |
value: 81.020987615843 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts17-crosslingual-sts | |
name: MTEB STS17 (en-en) | |
config: en-en | |
split: test | |
revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d | |
metrics: | |
- type: cos_sim_spearman | |
value: 86.69902762920725 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/sts22-crosslingual-sts | |
name: MTEB STS22 (en) | |
config: en | |
split: test | |
revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 | |
metrics: | |
- type: cos_sim_spearman | |
value: 63.474026946359615 | |
- task: | |
type: STS | |
dataset: | |
type: mteb/stsbenchmark-sts | |
name: MTEB STSBenchmark | |
config: default | |
split: test | |
revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 | |
metrics: | |
- type: cos_sim_spearman | |
value: 78.32422438643496 | |
- task: | |
type: Reranking | |
dataset: | |
type: mteb/scidocs-reranking | |
name: MTEB SciDocsRR | |
config: default | |
split: test | |
revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab | |
metrics: | |
- type: map | |
value: 77.61818188370545 | |
- type: mrr | |
value: 93.57944887356652 | |
- task: | |
type: Retrieval | |
dataset: | |
type: scifact | |
name: MTEB SciFact | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 48.417 | |
- type: map_at_10 | |
value: 59.217 | |
- type: map_at_100 | |
value: 59.866 | |
- type: map_at_1000 | |
value: 59.91 | |
- type: map_at_3 | |
value: 56.302 | |
- type: map_at_5 | |
value: 58.252 | |
- type: mrr_at_1 | |
value: 51.0 | |
- type: mrr_at_10 | |
value: 60.368 | |
- type: mrr_at_100 | |
value: 60.901 | |
- type: mrr_at_1000 | |
value: 60.936 | |
- type: mrr_at_3 | |
value: 57.778 | |
- type: mrr_at_5 | |
value: 59.577999999999996 | |
- type: ndcg_at_1 | |
value: 51.0 | |
- type: ndcg_at_10 | |
value: 64.479 | |
- type: ndcg_at_100 | |
value: 67.37100000000001 | |
- type: ndcg_at_1000 | |
value: 68.367 | |
- type: ndcg_at_3 | |
value: 59.117 | |
- type: ndcg_at_5 | |
value: 62.283 | |
- type: precision_at_1 | |
value: 51.0 | |
- type: precision_at_10 | |
value: 8.833 | |
- type: precision_at_100 | |
value: 1.043 | |
- type: precision_at_1000 | |
value: 0.11299999999999999 | |
- type: precision_at_3 | |
value: 23.778 | |
- type: precision_at_5 | |
value: 16.067 | |
- type: recall_at_1 | |
value: 48.417 | |
- type: recall_at_10 | |
value: 79.567 | |
- type: recall_at_100 | |
value: 92.422 | |
- type: recall_at_1000 | |
value: 100.0 | |
- type: recall_at_3 | |
value: 65.011 | |
- type: recall_at_5 | |
value: 72.983 | |
- task: | |
type: PairClassification | |
dataset: | |
type: mteb/sprintduplicatequestions-pairclassification | |
name: MTEB SprintDuplicateQuestions | |
config: default | |
split: test | |
revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46 | |
metrics: | |
- type: cos_sim_accuracy | |
value: 99.63861386138613 | |
- type: cos_sim_ap | |
value: 87.57401607596607 | |
- type: cos_sim_f1 | |
value: 81.18006103763987 | |
- type: cos_sim_precision | |
value: 82.6086956521739 | |
- type: cos_sim_recall | |
value: 79.80000000000001 | |
- type: dot_accuracy | |
value: 99.36435643564356 | |
- type: dot_ap | |
value: 67.10054414762459 | |
- type: dot_f1 | |
value: 62.686567164179095 | |
- type: dot_precision | |
value: 70.08652657601978 | |
- type: dot_recall | |
value: 56.699999999999996 | |
- type: euclidean_accuracy | |
value: 99.6108910891089 | |
- type: euclidean_ap | |
value: 85.27455886915234 | |
- type: euclidean_f1 | |
value: 79.41330539549503 | |
- type: euclidean_precision | |
value: 83.3883388338834 | |
- type: euclidean_recall | |
value: 75.8 | |
- type: manhattan_accuracy | |
value: 99.62574257425743 | |
- type: manhattan_ap | |
value: 86.03781248244218 | |
- type: manhattan_f1 | |
value: 80.23012552301255 | |
- type: manhattan_precision | |
value: 84.10087719298247 | |
- type: manhattan_recall | |
value: 76.7 | |
- type: max_accuracy | |
value: 99.63861386138613 | |
- type: max_ap | |
value: 87.57401607596607 | |
- type: max_f1 | |
value: 81.18006103763987 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/stackexchange-clustering | |
name: MTEB StackExchangeClustering | |
config: default | |
split: test | |
revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259 | |
metrics: | |
- type: v_measure | |
value: 65.11651958999349 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/stackexchange-clustering-p2p | |
name: MTEB StackExchangeClusteringP2P | |
config: default | |
split: test | |
revision: 815ca46b2622cec33ccafc3735d572c266efdb44 | |
metrics: | |
- type: v_measure | |
value: 33.60581294647579 | |
- task: | |
type: Reranking | |
dataset: | |
type: mteb/stackoverflowdupquestions-reranking | |
name: MTEB StackOverflowDupQuestions | |
config: default | |
split: test | |
revision: e185fbe320c72810689fc5848eb6114e1ef5ec69 | |
metrics: | |
- type: map | |
value: 47.773753263238696 | |
- type: mrr | |
value: 48.39623917748917 | |
- task: | |
type: Summarization | |
dataset: | |
type: mteb/summeval | |
name: MTEB SummEval | |
config: default | |
split: test | |
revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c | |
metrics: | |
- type: cos_sim_pearson | |
value: 31.564097570977395 | |
- type: cos_sim_spearman | |
value: 31.380186846178056 | |
- type: dot_pearson | |
value: 18.77679329172303 | |
- type: dot_spearman | |
value: 20.468892673671043 | |
- task: | |
type: Retrieval | |
dataset: | |
type: trec-covid | |
name: MTEB TRECCOVID | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 0.191 | |
- type: map_at_10 | |
value: 1.307 | |
- type: map_at_100 | |
value: 6.458 | |
- type: map_at_1000 | |
value: 16.785 | |
- type: map_at_3 | |
value: 0.47600000000000003 | |
- type: map_at_5 | |
value: 0.751 | |
- type: mrr_at_1 | |
value: 72.0 | |
- type: mrr_at_10 | |
value: 81.175 | |
- type: mrr_at_100 | |
value: 81.229 | |
- type: mrr_at_1000 | |
value: 81.229 | |
- type: mrr_at_3 | |
value: 79.667 | |
- type: mrr_at_5 | |
value: 80.667 | |
- type: ndcg_at_1 | |
value: 68.0 | |
- type: ndcg_at_10 | |
value: 60.672000000000004 | |
- type: ndcg_at_100 | |
value: 43.114000000000004 | |
- type: ndcg_at_1000 | |
value: 40.459 | |
- type: ndcg_at_3 | |
value: 65.642 | |
- type: ndcg_at_5 | |
value: 64.033 | |
- type: precision_at_1 | |
value: 72.0 | |
- type: precision_at_10 | |
value: 63.0 | |
- type: precision_at_100 | |
value: 43.82 | |
- type: precision_at_1000 | |
value: 18.758 | |
- type: precision_at_3 | |
value: 68.0 | |
- type: precision_at_5 | |
value: 67.60000000000001 | |
- type: recall_at_1 | |
value: 0.191 | |
- type: recall_at_10 | |
value: 1.585 | |
- type: recall_at_100 | |
value: 10.113999999999999 | |
- type: recall_at_1000 | |
value: 38.83 | |
- type: recall_at_3 | |
value: 0.514 | |
- type: recall_at_5 | |
value: 0.853 | |
- task: | |
type: Retrieval | |
dataset: | |
type: webis-touche2020 | |
name: MTEB Touche2020 | |
config: default | |
split: test | |
revision: None | |
metrics: | |
- type: map_at_1 | |
value: 0.857 | |
- type: map_at_10 | |
value: 4.154 | |
- type: map_at_100 | |
value: 7.1819999999999995 | |
- type: map_at_1000 | |
value: 8.501 | |
- type: map_at_3 | |
value: 2.3369999999999997 | |
- type: map_at_5 | |
value: 2.573 | |
- type: mrr_at_1 | |
value: 8.163 | |
- type: mrr_at_10 | |
value: 20.305 | |
- type: mrr_at_100 | |
value: 22.334 | |
- type: mrr_at_1000 | |
value: 22.397 | |
- type: mrr_at_3 | |
value: 17.347 | |
- type: mrr_at_5 | |
value: 18.673000000000002 | |
- type: ndcg_at_1 | |
value: 6.122 | |
- type: ndcg_at_10 | |
value: 10.18 | |
- type: ndcg_at_100 | |
value: 20.735999999999997 | |
- type: ndcg_at_1000 | |
value: 32.897999999999996 | |
- type: ndcg_at_3 | |
value: 10.299999999999999 | |
- type: ndcg_at_5 | |
value: 8.981 | |
- type: precision_at_1 | |
value: 8.163 | |
- type: precision_at_10 | |
value: 10.204 | |
- type: precision_at_100 | |
value: 5.061 | |
- type: precision_at_1000 | |
value: 1.276 | |
- type: precision_at_3 | |
value: 14.285999999999998 | |
- type: precision_at_5 | |
value: 10.612 | |
- type: recall_at_1 | |
value: 0.857 | |
- type: recall_at_10 | |
value: 8.57 | |
- type: recall_at_100 | |
value: 33.215 | |
- type: recall_at_1000 | |
value: 70.488 | |
- type: recall_at_3 | |
value: 3.527 | |
- type: recall_at_5 | |
value: 4.194 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/toxic_conversations_50k | |
name: MTEB ToxicConversationsClassification | |
config: default | |
split: test | |
revision: d7c0de2777da35d6aae2200a62c6e0e5af397c4c | |
metrics: | |
- type: accuracy | |
value: 71.8126 | |
- type: ap | |
value: 15.399874831474428 | |
- type: f1 | |
value: 55.733319106134225 | |
- task: | |
type: Classification | |
dataset: | |
type: mteb/tweet_sentiment_extraction | |
name: MTEB TweetSentimentExtractionClassification | |
config: default | |
split: test | |
revision: d604517c81ca91fe16a244d1248fc021f9ecee7a | |
metrics: | |
- type: accuracy | |
value: 57.167515563101304 | |
- type: f1 | |
value: 57.493718365420854 | |
- task: | |
type: Clustering | |
dataset: | |
type: mteb/twentynewsgroups-clustering | |
name: MTEB TwentyNewsgroupsClustering | |
config: default | |
split: test | |
revision: 6125ec4e24fa026cec8a478383ee943acfbd5449 | |
metrics: | |
- type: v_measure | |
value: 30.761111606661984 | |
- task: | |
type: PairClassification | |
dataset: | |
type: mteb/twittersemeval2015-pairclassification | |
name: MTEB TwitterSemEval2015 | |
config: default | |
split: test | |
revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1 | |
metrics: | |
- type: cos_sim_accuracy | |
value: 83.90057817249806 | |
- type: cos_sim_ap | |
value: 65.13897428351787 | |
- type: cos_sim_f1 | |
value: 61.042677616025884 | |
- type: cos_sim_precision | |
value: 57.75841770661644 | |
- type: cos_sim_recall | |
value: 64.72295514511873 | |
- type: dot_accuracy | |
value: 80.60439887941826 | |
- type: dot_ap | |
value: 55.55250665214204 | |
- type: dot_f1 | |
value: 54.91251682368774 | |
- type: dot_precision | |
value: 47.75653531018338 | |
- type: dot_recall | |
value: 64.5910290237467 | |
- type: euclidean_accuracy | |
value: 83.30452405078381 | |
- type: euclidean_ap | |
value: 62.67995656680978 | |
- type: euclidean_f1 | |
value: 59.421025901472824 | |
- type: euclidean_precision | |
value: 57.268722466960355 | |
- type: euclidean_recall | |
value: 61.74142480211082 | |
- type: manhattan_accuracy | |
value: 83.39393216904095 | |
- type: manhattan_ap | |
value: 63.04154722022527 | |
- type: manhattan_f1 | |
value: 59.49575573292791 | |
- type: manhattan_precision | |
value: 57.226419692907626 | |
- type: manhattan_recall | |
value: 61.952506596306065 | |
- type: max_accuracy | |
value: 83.90057817249806 | |
- type: max_ap | |
value: 65.13897428351787 | |
- type: max_f1 | |
value: 61.042677616025884 | |
- task: | |
type: PairClassification | |
dataset: | |
type: mteb/twitterurlcorpus-pairclassification | |
name: MTEB TwitterURLCorpus | |
config: default | |
split: test | |
revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf | |
metrics: | |
- type: cos_sim_accuracy | |
value: 86.91349400395855 | |
- type: cos_sim_ap | |
value: 80.94267715916922 | |
- type: cos_sim_f1 | |
value: 73.80416854101064 | |
- type: cos_sim_precision | |
value: 71.91700759789596 | |
- type: cos_sim_recall | |
value: 75.79303972898059 | |
- type: dot_accuracy | |
value: 85.36694221290799 | |
- type: dot_ap | |
value: 76.58601958627575 | |
- type: dot_f1 | |
value: 71.08344449384913 | |
- type: dot_precision | |
value: 68.51428571428572 | |
- type: dot_recall | |
value: 73.85278718817369 | |
- type: euclidean_accuracy | |
value: 86.23627119959639 | |
- type: euclidean_ap | |
value: 79.39212423810176 | |
- type: euclidean_f1 | |
value: 72.54634884600833 | |
- type: euclidean_precision | |
value: 71.32123195952983 | |
- type: euclidean_recall | |
value: 73.81429011395134 | |
- type: manhattan_accuracy | |
value: 86.72720922109676 | |
- type: manhattan_ap | |
value: 80.52847011448226 | |
- type: manhattan_f1 | |
value: 73.27869471616877 | |
- type: manhattan_precision | |
value: 71.91785899621914 | |
- type: manhattan_recall | |
value: 74.69202340622113 | |
- type: max_accuracy | |
value: 86.91349400395855 | |
- type: max_ap | |
value: 80.94267715916922 | |
- type: max_f1 | |
value: 73.80416854101064 | |
# LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

> LLM2Vec is a simple recipe to convert decoder-only LLMs into text encoders. It consists of 3 simple steps: 1) enabling bidirectional attention, 2) masked next token prediction, and 3) unsupervised contrastive learning. The model can be further fine-tuned to achieve state-of-the-art performance.

- **Repository:** https://github.com/McGill-NLP/llm2vec
- **Paper:** https://arxiv.org/abs/2404.05961
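
As a rough illustration of step 1 of the recipe, the sketch below contrasts the causal attention mask that a decoder-only LLM normally uses with the all-ones mask that bidirectional attention amounts to. It is plain Python with hypothetical helper names (`causal_mask`, `bidirectional_mask`, `show`), it ignores padding, and it is not code from the `llm2vec` library.

```python
def causal_mask(seq_len):
    """mask[i][j] is True when token i may attend to token j (only j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def bidirectional_mask(seq_len):
    """Every token may attend to every other token, left or right."""
    return [[True] * seq_len for _ in range(seq_len)]


def show(mask):
    """Render a mask as rows of 1s (allowed) and 0s (blocked)."""
    return "\n".join(
        " ".join("1" if allowed else "0" for allowed in row) for row in mask
    )


print("causal:")
print(show(causal_mask(4)))
print("bidirectional:")
print(show(bidirectional_mask(4)))
```

With the causal mask, the first token sees only itself. With the bidirectional mask, every position can use the whole sequence, which is what a text encoder needs.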
## Installation
```bash
pip install llm2vec
```

## Usage
```python
from llm2vec import LLM2Vec

import torch
from transformers import AutoTokenizer, AutoModel, AutoConfig
from peft import PeftModel

# Load the base Llama-2 model, along with custom code that enables bidirectional
# connections in decoder-only LLMs. MNTP LoRA weights are merged into the base model.
tokenizer = AutoTokenizer.from_pretrained(
    "McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp"
)
config = AutoConfig.from_pretrained(
    "McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp", trust_remote_code=True
)
model = AutoModel.from_pretrained(
    "McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp",
    trust_remote_code=True,
    config=config,
    torch_dtype=torch.bfloat16,
    device_map="cuda" if torch.cuda.is_available() else "cpu",
)
model = PeftModel.from_pretrained(
    model,
    "McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp",
)
model = model.merge_and_unload()  # This can take several minutes on CPU

# Load the unsupervised SimCSE model. This loads the trained LoRA weights on top of
# the MNTP model, so the final weights are: base model + MNTP (LoRA) + SimCSE (LoRA).
model = PeftModel.from_pretrained(
    model, "McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp-unsup-simcse"
)

# Wrapper for encoding and pooling operations
l2v = LLM2Vec(model, tokenizer, pooling_mode="mean", max_length=512)

# Encode queries using instructions
instruction = (
    "Given a web search query, retrieve relevant passages that answer the query:"
)
queries = [
    [instruction, "how much protein should a female eat"],
    [instruction, "summit define"],
]
q_reps = l2v.encode(queries)

# Encode documents. Instructions are not required for documents.
documents = [
    "As a general guideline, the CDC's average requirement of protein for women ages 19 to 70 is 46 grams per day. But, as you can see from this chart, you'll need to increase that if you're expecting or training for a marathon. Check out the chart below to see how much protein you should be eating each day.",
    "Definition of summit for English Language Learners. : 1 the highest point of a mountain : the top of a mountain. : 2 the highest level. : 3 a meeting or series of meetings between the leaders of two or more governments.",
]
d_reps = l2v.encode(documents)

# Compute cosine similarity between every query and every document
q_reps_norm = torch.nn.functional.normalize(q_reps, p=2, dim=1)
d_reps_norm = torch.nn.functional.normalize(d_reps, p=2, dim=1)
cos_sim = torch.mm(q_reps_norm, d_reps_norm.transpose(0, 1))

print(cos_sim)
"""
tensor([[0.6231, 0.1744],
        [0.1670, 0.4732]])
"""
```
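
The similarity matrix printed above has one row per query and one column per document, so ranking documents for a query comes down to sorting its row. The following is a minimal sketch of that step in plain Python, using the printed values copied in as a nested list so that it runs without downloading the model. The helper name `rank_documents` is hypothetical and is not part of `llm2vec`.

```python
# Similarity values copied from the printed output of the usage example:
# rows are queries, columns are documents.
cos_sim = [
    [0.6231, 0.1744],
    [0.1670, 0.4732],
]

queries = ["how much protein should a female eat", "summit define"]


def rank_documents(scores):
    """Return document indices sorted from most to least similar."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


rankings = [rank_documents(row) for row in cos_sim]

for query, ranking in zip(queries, rankings):
    print(f"{query!r}: best document index = {ranking[0]}")
```

Here the protein query ranks document 0 first and the "summit define" query ranks document 1 first, matching the intended pairs. On the actual `cos_sim` tensor, `cos_sim.argsort(dim=1, descending=True)` should give the same ordering.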

## Questions
If you have any questions about the code, feel free to email Parishad (`parishad.behnamghader@mila.quebec`) and Vaibhav (`vaibhav.adlakha@mila.quebec`).