|
--- |
|
pipeline_tag: sentence-similarity |
|
tags: |
|
- sentence-transformers |
|
- feature-extraction |
|
- sentence-similarity |
|
- transformers |
|
- mteb |
|
model-index: |
|
- name: mmlw-roberta-base |
|
results: |
|
- task: |
|
type: Clustering |
|
dataset: |
|
type: PL-MTEB/8tags-clustering |
|
name: MTEB 8TagsClustering |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: v_measure |
|
value: 33.08463724780795 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: PL-MTEB/allegro-reviews |
|
name: MTEB AllegroReviews |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: accuracy |
|
value: 40.25844930417495 |
|
- type: f1 |
|
value: 35.59685265418916 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: arguana-pl |
|
name: MTEB ArguAna-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 33.073 |
|
- type: map_at_10 |
|
value: 50.223 |
|
- type: map_at_100 |
|
value: 50.942 |
|
- type: map_at_1000 |
|
value: 50.94499999999999 |
|
- type: map_at_3 |
|
value: 45.721000000000004 |
|
- type: map_at_5 |
|
value: 48.413000000000004 |
|
- type: mrr_at_1 |
|
value: 34.424 |
|
- type: mrr_at_10 |
|
value: 50.68899999999999 |
|
- type: mrr_at_100 |
|
value: 51.437999999999995 |
|
- type: mrr_at_1000 |
|
value: 51.441 |
|
- type: mrr_at_3 |
|
value: 46.219 |
|
- type: mrr_at_5 |
|
value: 48.921 |
|
- type: ndcg_at_1 |
|
value: 33.073 |
|
- type: ndcg_at_10 |
|
value: 59.021 |
|
- type: ndcg_at_100 |
|
value: 61.902 |
|
- type: ndcg_at_1000 |
|
value: 61.983999999999995 |
|
- type: ndcg_at_3 |
|
value: 49.818 |
|
- type: ndcg_at_5 |
|
value: 54.644999999999996 |
|
- type: precision_at_1 |
|
value: 33.073 |
|
- type: precision_at_10 |
|
value: 8.684 |
|
- type: precision_at_100 |
|
value: 0.9900000000000001 |
|
- type: precision_at_1000 |
|
value: 0.1 |
|
- type: precision_at_3 |
|
value: 20.555 |
|
- type: precision_at_5 |
|
value: 14.666 |
|
- type: recall_at_1 |
|
value: 33.073 |
|
- type: recall_at_10 |
|
value: 86.842 |
|
- type: recall_at_100 |
|
value: 99.004 |
|
- type: recall_at_1000 |
|
value: 99.644 |
|
- type: recall_at_3 |
|
value: 61.663999999999994 |
|
- type: recall_at_5 |
|
value: 73.329 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: PL-MTEB/cbd |
|
name: MTEB CBD |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: accuracy |
|
value: 68.11 |
|
- type: ap |
|
value: 20.916633959031266 |
|
- type: f1 |
|
value: 56.85804802205465 |
|
- task: |
|
type: PairClassification |
|
dataset: |
|
type: PL-MTEB/cdsce-pairclassification |
|
name: MTEB CDSC-E |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_accuracy |
|
value: 89.2 |
|
- type: cos_sim_ap |
|
value: 79.1041156765933 |
|
- type: cos_sim_f1 |
|
value: 70.0 |
|
- type: cos_sim_precision |
|
value: 74.11764705882354 |
|
- type: cos_sim_recall |
|
value: 66.3157894736842 |
|
- type: dot_accuracy |
|
value: 88.2 |
|
- type: dot_ap |
|
value: 72.57183688228149 |
|
- type: dot_f1 |
|
value: 67.16417910447761 |
|
- type: dot_precision |
|
value: 63.67924528301887 |
|
- type: dot_recall |
|
value: 71.05263157894737 |
|
- type: euclidean_accuracy |
|
value: 89.3 |
|
- type: euclidean_ap |
|
value: 79.01345533432428 |
|
- type: euclidean_f1 |
|
value: 70.19498607242339 |
|
- type: euclidean_precision |
|
value: 74.55621301775149 |
|
- type: euclidean_recall |
|
value: 66.3157894736842 |
|
- type: manhattan_accuracy |
|
value: 89.3 |
|
- type: manhattan_ap |
|
value: 79.01671381791259 |
|
- type: manhattan_f1 |
|
value: 70.0280112044818 |
|
- type: manhattan_precision |
|
value: 74.8502994011976 |
|
- type: manhattan_recall |
|
value: 65.78947368421053 |
|
- type: max_accuracy |
|
value: 89.3 |
|
- type: max_ap |
|
value: 79.1041156765933 |
|
- type: max_f1 |
|
value: 70.19498607242339 |
|
- task: |
|
type: STS |
|
dataset: |
|
type: PL-MTEB/cdscr-sts |
|
name: MTEB CDSC-R |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_pearson |
|
value: 91.79559442663039 |
|
- type: cos_sim_spearman |
|
value: 92.5438168962641 |
|
- type: euclidean_pearson |
|
value: 92.02981265332856 |
|
- type: euclidean_spearman |
|
value: 92.5548245733484 |
|
- type: manhattan_pearson |
|
value: 91.95296287979178 |
|
- type: manhattan_spearman |
|
value: 92.50279516120241 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: dbpedia-pl |
|
name: MTEB DBPedia-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 7.829999999999999 |
|
- type: map_at_10 |
|
value: 16.616 |
|
- type: map_at_100 |
|
value: 23.629 |
|
- type: map_at_1000 |
|
value: 25.235999999999997 |
|
- type: map_at_3 |
|
value: 12.485 |
|
- type: map_at_5 |
|
value: 14.077 |
|
- type: mrr_at_1 |
|
value: 61.75000000000001 |
|
- type: mrr_at_10 |
|
value: 69.852 |
|
- type: mrr_at_100 |
|
value: 70.279 |
|
- type: mrr_at_1000 |
|
value: 70.294 |
|
- type: mrr_at_3 |
|
value: 68.375 |
|
- type: mrr_at_5 |
|
value: 69.187 |
|
- type: ndcg_at_1 |
|
value: 49.75 |
|
- type: ndcg_at_10 |
|
value: 36.217 |
|
- type: ndcg_at_100 |
|
value: 41.235 |
|
- type: ndcg_at_1000 |
|
value: 48.952 |
|
- type: ndcg_at_3 |
|
value: 41.669 |
|
- type: ndcg_at_5 |
|
value: 38.285000000000004 |
|
- type: precision_at_1 |
|
value: 61.5 |
|
- type: precision_at_10 |
|
value: 28.499999999999996 |
|
- type: precision_at_100 |
|
value: 9.572 |
|
- type: precision_at_1000 |
|
value: 2.025 |
|
- type: precision_at_3 |
|
value: 44.083 |
|
- type: precision_at_5 |
|
value: 36.3 |
|
- type: recall_at_1 |
|
value: 7.829999999999999 |
|
- type: recall_at_10 |
|
value: 21.462999999999997 |
|
- type: recall_at_100 |
|
value: 47.095 |
|
- type: recall_at_1000 |
|
value: 71.883 |
|
- type: recall_at_3 |
|
value: 13.891 |
|
- type: recall_at_5 |
|
value: 16.326999999999998 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: fiqa-pl |
|
name: MTEB FiQA-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 16.950000000000003 |
|
- type: map_at_10 |
|
value: 27.422 |
|
- type: map_at_100 |
|
value: 29.146 |
|
- type: map_at_1000 |
|
value: 29.328 |
|
- type: map_at_3 |
|
value: 23.735999999999997 |
|
- type: map_at_5 |
|
value: 25.671 |
|
- type: mrr_at_1 |
|
value: 33.796 |
|
- type: mrr_at_10 |
|
value: 42.689 |
|
- type: mrr_at_100 |
|
value: 43.522 |
|
- type: mrr_at_1000 |
|
value: 43.563 |
|
- type: mrr_at_3 |
|
value: 40.226 |
|
- type: mrr_at_5 |
|
value: 41.685 |
|
- type: ndcg_at_1 |
|
value: 33.642 |
|
- type: ndcg_at_10 |
|
value: 35.008 |
|
- type: ndcg_at_100 |
|
value: 41.839 |
|
- type: ndcg_at_1000 |
|
value: 45.035 |
|
- type: ndcg_at_3 |
|
value: 31.358999999999998 |
|
- type: ndcg_at_5 |
|
value: 32.377 |
|
- type: precision_at_1 |
|
value: 33.642 |
|
- type: precision_at_10 |
|
value: 9.937999999999999 |
|
- type: precision_at_100 |
|
value: 1.685 |
|
- type: precision_at_1000 |
|
value: 0.22699999999999998 |
|
- type: precision_at_3 |
|
value: 21.142 |
|
- type: precision_at_5 |
|
value: 15.586 |
|
- type: recall_at_1 |
|
value: 16.950000000000003 |
|
- type: recall_at_10 |
|
value: 42.286 |
|
- type: recall_at_100 |
|
value: 68.51899999999999 |
|
- type: recall_at_1000 |
|
value: 87.471 |
|
- type: recall_at_3 |
|
value: 28.834 |
|
- type: recall_at_5 |
|
value: 34.274 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: hotpotqa-pl |
|
name: MTEB HotpotQA-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 37.711 |
|
- type: map_at_10 |
|
value: 57.867999999999995 |
|
- type: map_at_100 |
|
value: 58.77 |
|
- type: map_at_1000 |
|
value: 58.836999999999996 |
|
- type: map_at_3 |
|
value: 54.400999999999996 |
|
- type: map_at_5 |
|
value: 56.564 |
|
- type: mrr_at_1 |
|
value: 75.449 |
|
- type: mrr_at_10 |
|
value: 81.575 |
|
- type: mrr_at_100 |
|
value: 81.783 |
|
- type: mrr_at_1000 |
|
value: 81.792 |
|
- type: mrr_at_3 |
|
value: 80.50399999999999 |
|
- type: mrr_at_5 |
|
value: 81.172 |
|
- type: ndcg_at_1 |
|
value: 75.422 |
|
- type: ndcg_at_10 |
|
value: 66.635 |
|
- type: ndcg_at_100 |
|
value: 69.85 |
|
- type: ndcg_at_1000 |
|
value: 71.179 |
|
- type: ndcg_at_3 |
|
value: 61.648 |
|
- type: ndcg_at_5 |
|
value: 64.412 |
|
- type: precision_at_1 |
|
value: 75.422 |
|
- type: precision_at_10 |
|
value: 13.962 |
|
- type: precision_at_100 |
|
value: 1.649 |
|
- type: precision_at_1000 |
|
value: 0.183 |
|
- type: precision_at_3 |
|
value: 39.172000000000004 |
|
- type: precision_at_5 |
|
value: 25.691000000000003 |
|
- type: recall_at_1 |
|
value: 37.711 |
|
- type: recall_at_10 |
|
value: 69.811 |
|
- type: recall_at_100 |
|
value: 82.471 |
|
- type: recall_at_1000 |
|
value: 91.29 |
|
- type: recall_at_3 |
|
value: 58.757999999999996 |
|
- type: recall_at_5 |
|
value: 64.227 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: msmarco-pl |
|
name: MTEB MSMARCO-PL |
|
config: default |
|
split: validation |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 17.033 |
|
- type: map_at_10 |
|
value: 27.242 |
|
- type: map_at_100 |
|
value: 28.451999999999998 |
|
- type: map_at_1000 |
|
value: 28.515 |
|
- type: map_at_3 |
|
value: 24.046 |
|
- type: map_at_5 |
|
value: 25.840999999999998 |
|
- type: mrr_at_1 |
|
value: 17.493 |
|
- type: mrr_at_10 |
|
value: 27.67 |
|
- type: mrr_at_100 |
|
value: 28.823999999999998 |
|
- type: mrr_at_1000 |
|
value: 28.881 |
|
- type: mrr_at_3 |
|
value: 24.529999999999998 |
|
- type: mrr_at_5 |
|
value: 26.27 |
|
- type: ndcg_at_1 |
|
value: 17.479 |
|
- type: ndcg_at_10 |
|
value: 33.048 |
|
- type: ndcg_at_100 |
|
value: 39.071 |
|
- type: ndcg_at_1000 |
|
value: 40.739999999999995 |
|
- type: ndcg_at_3 |
|
value: 26.493 |
|
- type: ndcg_at_5 |
|
value: 29.701 |
|
- type: precision_at_1 |
|
value: 17.479 |
|
- type: precision_at_10 |
|
value: 5.324 |
|
- type: precision_at_100 |
|
value: 0.8380000000000001 |
|
- type: precision_at_1000 |
|
value: 0.098 |
|
- type: precision_at_3 |
|
value: 11.408999999999999 |
|
- type: precision_at_5 |
|
value: 8.469999999999999 |
|
- type: recall_at_1 |
|
value: 17.033 |
|
- type: recall_at_10 |
|
value: 50.929 |
|
- type: recall_at_100 |
|
value: 79.262 |
|
- type: recall_at_1000 |
|
value: 92.239 |
|
- type: recall_at_3 |
|
value: 33.06 |
|
- type: recall_at_5 |
|
value: 40.747 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: mteb/amazon_massive_intent |
|
name: MTEB MassiveIntentClassification (pl) |
|
config: pl |
|
split: test |
|
revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 |
|
metrics: |
|
- type: accuracy |
|
value: 72.31002017484867 |
|
- type: f1 |
|
value: 69.61603671063031 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: mteb/amazon_massive_scenario |
|
name: MTEB MassiveScenarioClassification (pl) |
|
config: pl |
|
split: test |
|
revision: 7d571f92784cd94a019292a1f45445077d0ef634 |
|
metrics: |
|
- type: accuracy |
|
value: 75.52790854068594 |
|
- type: f1 |
|
value: 75.4053872472259 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: nfcorpus-pl |
|
name: MTEB NFCorpus-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 5.877000000000001 |
|
- type: map_at_10 |
|
value: 12.817 |
|
- type: map_at_100 |
|
value: 16.247 |
|
- type: map_at_1000 |
|
value: 17.683 |
|
- type: map_at_3 |
|
value: 9.334000000000001 |
|
- type: map_at_5 |
|
value: 10.886999999999999 |
|
- type: mrr_at_1 |
|
value: 45.201 |
|
- type: mrr_at_10 |
|
value: 52.7 |
|
- type: mrr_at_100 |
|
value: 53.425999999999995 |
|
- type: mrr_at_1000 |
|
value: 53.461000000000006 |
|
- type: mrr_at_3 |
|
value: 50.464 |
|
- type: mrr_at_5 |
|
value: 51.827 |
|
- type: ndcg_at_1 |
|
value: 41.949999999999996 |
|
- type: ndcg_at_10 |
|
value: 34.144999999999996 |
|
- type: ndcg_at_100 |
|
value: 31.556 |
|
- type: ndcg_at_1000 |
|
value: 40.265 |
|
- type: ndcg_at_3 |
|
value: 38.07 |
|
- type: ndcg_at_5 |
|
value: 36.571 |
|
- type: precision_at_1 |
|
value: 44.272 |
|
- type: precision_at_10 |
|
value: 25.697 |
|
- type: precision_at_100 |
|
value: 8.077 |
|
- type: precision_at_1000 |
|
value: 2.084 |
|
- type: precision_at_3 |
|
value: 36.016999999999996 |
|
- type: precision_at_5 |
|
value: 31.703 |
|
- type: recall_at_1 |
|
value: 5.877000000000001 |
|
- type: recall_at_10 |
|
value: 16.986 |
|
- type: recall_at_100 |
|
value: 32.719 |
|
- type: recall_at_1000 |
|
value: 63.763000000000005 |
|
- type: recall_at_3 |
|
value: 10.292 |
|
- type: recall_at_5 |
|
value: 12.886000000000001 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: nq-pl |
|
name: MTEB NQ-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 25.476 |
|
- type: map_at_10 |
|
value: 38.67 |
|
- type: map_at_100 |
|
value: 39.784000000000006 |
|
- type: map_at_1000 |
|
value: 39.831 |
|
- type: map_at_3 |
|
value: 34.829 |
|
- type: map_at_5 |
|
value: 37.025000000000006 |
|
- type: mrr_at_1 |
|
value: 28.621000000000002 |
|
- type: mrr_at_10 |
|
value: 41.13 |
|
- type: mrr_at_100 |
|
value: 42.028 |
|
- type: mrr_at_1000 |
|
value: 42.059999999999995 |
|
- type: mrr_at_3 |
|
value: 37.877 |
|
- type: mrr_at_5 |
|
value: 39.763999999999996 |
|
- type: ndcg_at_1 |
|
value: 28.563 |
|
- type: ndcg_at_10 |
|
value: 45.654 |
|
- type: ndcg_at_100 |
|
value: 50.695 |
|
- type: ndcg_at_1000 |
|
value: 51.873999999999995 |
|
- type: ndcg_at_3 |
|
value: 38.359 |
|
- type: ndcg_at_5 |
|
value: 42.045 |
|
- type: precision_at_1 |
|
value: 28.563 |
|
- type: precision_at_10 |
|
value: 7.6450000000000005 |
|
- type: precision_at_100 |
|
value: 1.052 |
|
- type: precision_at_1000 |
|
value: 0.117 |
|
- type: precision_at_3 |
|
value: 17.458000000000002 |
|
- type: precision_at_5 |
|
value: 12.613 |
|
- type: recall_at_1 |
|
value: 25.476 |
|
- type: recall_at_10 |
|
value: 64.484 |
|
- type: recall_at_100 |
|
value: 86.96199999999999 |
|
- type: recall_at_1000 |
|
value: 95.872 |
|
- type: recall_at_3 |
|
value: 45.527 |
|
- type: recall_at_5 |
|
value: 54.029 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: laugustyniak/abusive-clauses-pl |
|
name: MTEB PAC |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: accuracy |
|
value: 65.87315377932232 |
|
- type: ap |
|
value: 76.41966964416534 |
|
- type: f1 |
|
value: 63.64417488639012 |
|
- task: |
|
type: PairClassification |
|
dataset: |
|
type: PL-MTEB/ppc-pairclassification |
|
name: MTEB PPC |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_accuracy |
|
value: 87.7 |
|
- type: cos_sim_ap |
|
value: 92.81319372631636 |
|
- type: cos_sim_f1 |
|
value: 90.04048582995952 |
|
- type: cos_sim_precision |
|
value: 88.11410459587957 |
|
- type: cos_sim_recall |
|
value: 92.05298013245033 |
|
- type: dot_accuracy |
|
value: 75.0 |
|
- type: dot_ap |
|
value: 83.63089957943261 |
|
- type: dot_f1 |
|
value: 80.76923076923077 |
|
- type: dot_precision |
|
value: 75.43103448275862 |
|
- type: dot_recall |
|
value: 86.9205298013245 |
|
- type: euclidean_accuracy |
|
value: 87.7 |
|
- type: euclidean_ap |
|
value: 92.94772245932825 |
|
- type: euclidean_f1 |
|
value: 90.10458567980692 |
|
- type: euclidean_precision |
|
value: 87.63693270735524 |
|
- type: euclidean_recall |
|
value: 92.71523178807946 |
|
- type: manhattan_accuracy |
|
value: 87.8 |
|
- type: manhattan_ap |
|
value: 92.95330512127123 |
|
- type: manhattan_f1 |
|
value: 90.08130081300813 |
|
- type: manhattan_precision |
|
value: 88.49840255591054 |
|
- type: manhattan_recall |
|
value: 91.72185430463577 |
|
- type: max_accuracy |
|
value: 87.8 |
|
- type: max_ap |
|
value: 92.95330512127123 |
|
- type: max_f1 |
|
value: 90.10458567980692 |
|
- task: |
|
type: PairClassification |
|
dataset: |
|
type: PL-MTEB/psc-pairclassification |
|
name: MTEB PSC |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_accuracy |
|
value: 96.19666048237477 |
|
- type: cos_sim_ap |
|
value: 98.61237969571302 |
|
- type: cos_sim_f1 |
|
value: 93.77845220030349 |
|
- type: cos_sim_precision |
|
value: 93.35347432024169 |
|
- type: cos_sim_recall |
|
value: 94.20731707317073 |
|
- type: dot_accuracy |
|
value: 94.89795918367348 |
|
- type: dot_ap |
|
value: 97.02853491357943 |
|
- type: dot_f1 |
|
value: 91.85185185185186 |
|
- type: dot_precision |
|
value: 89.33717579250721 |
|
- type: dot_recall |
|
value: 94.51219512195121 |
|
- type: euclidean_accuracy |
|
value: 96.38218923933209 |
|
- type: euclidean_ap |
|
value: 98.58145584134218 |
|
- type: euclidean_f1 |
|
value: 94.04580152671755 |
|
- type: euclidean_precision |
|
value: 94.18960244648318 |
|
- type: euclidean_recall |
|
value: 93.90243902439023 |
|
- type: manhattan_accuracy |
|
value: 96.47495361781077 |
|
- type: manhattan_ap |
|
value: 98.6108221024781 |
|
- type: manhattan_f1 |
|
value: 94.18960244648318 |
|
- type: manhattan_precision |
|
value: 94.47852760736197 |
|
- type: manhattan_recall |
|
value: 93.90243902439023 |
|
- type: max_accuracy |
|
value: 96.47495361781077 |
|
- type: max_ap |
|
value: 98.61237969571302 |
|
- type: max_f1 |
|
value: 94.18960244648318 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: PL-MTEB/polemo2_in |
|
name: MTEB PolEmo2.0-IN |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: accuracy |
|
value: 71.73130193905818 |
|
- type: f1 |
|
value: 71.17731918813324 |
|
- task: |
|
type: Classification |
|
dataset: |
|
type: PL-MTEB/polemo2_out |
|
name: MTEB PolEmo2.0-OUT |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: accuracy |
|
value: 46.59919028340081 |
|
- type: f1 |
|
value: 37.216392949948954 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: quora-pl |
|
name: MTEB Quora-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 66.134 |
|
- type: map_at_10 |
|
value: 80.19 |
|
- type: map_at_100 |
|
value: 80.937 |
|
- type: map_at_1000 |
|
value: 80.95599999999999 |
|
- type: map_at_3 |
|
value: 77.074 |
|
- type: map_at_5 |
|
value: 79.054 |
|
- type: mrr_at_1 |
|
value: 75.88000000000001 |
|
- type: mrr_at_10 |
|
value: 83.226 |
|
- type: mrr_at_100 |
|
value: 83.403 |
|
- type: mrr_at_1000 |
|
value: 83.406 |
|
- type: mrr_at_3 |
|
value: 82.03200000000001 |
|
- type: mrr_at_5 |
|
value: 82.843 |
|
- type: ndcg_at_1 |
|
value: 75.94 |
|
- type: ndcg_at_10 |
|
value: 84.437 |
|
- type: ndcg_at_100 |
|
value: 86.13 |
|
- type: ndcg_at_1000 |
|
value: 86.29299999999999 |
|
- type: ndcg_at_3 |
|
value: 81.07799999999999 |
|
- type: ndcg_at_5 |
|
value: 83.0 |
|
- type: precision_at_1 |
|
value: 75.94 |
|
- type: precision_at_10 |
|
value: 12.953999999999999 |
|
- type: precision_at_100 |
|
value: 1.514 |
|
- type: precision_at_1000 |
|
value: 0.156 |
|
- type: precision_at_3 |
|
value: 35.61 |
|
- type: precision_at_5 |
|
value: 23.652 |
|
- type: recall_at_1 |
|
value: 66.134 |
|
- type: recall_at_10 |
|
value: 92.991 |
|
- type: recall_at_100 |
|
value: 99.003 |
|
- type: recall_at_1000 |
|
value: 99.86 |
|
- type: recall_at_3 |
|
value: 83.643 |
|
- type: recall_at_5 |
|
value: 88.81099999999999 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: scidocs-pl |
|
name: MTEB SCIDOCS-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 4.183 |
|
- type: map_at_10 |
|
value: 10.626 |
|
- type: map_at_100 |
|
value: 12.485 |
|
- type: map_at_1000 |
|
value: 12.793 |
|
- type: map_at_3 |
|
value: 7.531000000000001 |
|
- type: map_at_5 |
|
value: 9.037 |
|
- type: mrr_at_1 |
|
value: 20.5 |
|
- type: mrr_at_10 |
|
value: 30.175 |
|
- type: mrr_at_100 |
|
value: 31.356 |
|
- type: mrr_at_1000 |
|
value: 31.421 |
|
- type: mrr_at_3 |
|
value: 26.900000000000002 |
|
- type: mrr_at_5 |
|
value: 28.689999999999998 |
|
- type: ndcg_at_1 |
|
value: 20.599999999999998 |
|
- type: ndcg_at_10 |
|
value: 17.84 |
|
- type: ndcg_at_100 |
|
value: 25.518 |
|
- type: ndcg_at_1000 |
|
value: 31.137999999999998 |
|
- type: ndcg_at_3 |
|
value: 16.677 |
|
- type: ndcg_at_5 |
|
value: 14.641000000000002 |
|
- type: precision_at_1 |
|
value: 20.599999999999998 |
|
- type: precision_at_10 |
|
value: 9.3 |
|
- type: precision_at_100 |
|
value: 2.048 |
|
- type: precision_at_1000 |
|
value: 0.33999999999999997 |
|
- type: precision_at_3 |
|
value: 15.533 |
|
- type: precision_at_5 |
|
value: 12.839999999999998 |
|
- type: recall_at_1 |
|
value: 4.183 |
|
- type: recall_at_10 |
|
value: 18.862000000000002 |
|
- type: recall_at_100 |
|
value: 41.592 |
|
- type: recall_at_1000 |
|
value: 69.037 |
|
- type: recall_at_3 |
|
value: 9.443 |
|
- type: recall_at_5 |
|
value: 13.028 |
|
- task: |
|
type: PairClassification |
|
dataset: |
|
type: PL-MTEB/sicke-pl-pairclassification |
|
name: MTEB SICK-E-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_accuracy |
|
value: 86.32286995515696 |
|
- type: cos_sim_ap |
|
value: 82.04302619416443 |
|
- type: cos_sim_f1 |
|
value: 74.95572086432874 |
|
- type: cos_sim_precision |
|
value: 74.55954897815363 |
|
- type: cos_sim_recall |
|
value: 75.35612535612536 |
|
- type: dot_accuracy |
|
value: 83.9176518548716 |
|
- type: dot_ap |
|
value: 76.8608733580272 |
|
- type: dot_f1 |
|
value: 72.31936654569449 |
|
- type: dot_precision |
|
value: 67.36324523663184 |
|
- type: dot_recall |
|
value: 78.06267806267806 |
|
- type: euclidean_accuracy |
|
value: 86.32286995515696 |
|
- type: euclidean_ap |
|
value: 81.9648986659308 |
|
- type: euclidean_f1 |
|
value: 74.93796526054591 |
|
- type: euclidean_precision |
|
value: 74.59421312632321 |
|
- type: euclidean_recall |
|
value: 75.28490028490027 |
|
- type: manhattan_accuracy |
|
value: 86.30248675091724 |
|
- type: manhattan_ap |
|
value: 81.92853980116878 |
|
- type: manhattan_f1 |
|
value: 74.80968858131489 |
|
- type: manhattan_precision |
|
value: 72.74562584118439 |
|
- type: manhattan_recall |
|
value: 76.99430199430199 |
|
- type: max_accuracy |
|
value: 86.32286995515696 |
|
- type: max_ap |
|
value: 82.04302619416443 |
|
- type: max_f1 |
|
value: 74.95572086432874 |
|
- task: |
|
type: STS |
|
dataset: |
|
type: PL-MTEB/sickr-pl-sts |
|
name: MTEB SICK-R-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: cos_sim_pearson |
|
value: 83.07566183637853 |
|
- type: cos_sim_spearman |
|
value: 79.20198022242548 |
|
- type: euclidean_pearson |
|
value: 81.27875473517936 |
|
- type: euclidean_spearman |
|
value: 79.21560102311153 |
|
- type: manhattan_pearson |
|
value: 81.21559474880459 |
|
- type: manhattan_spearman |
|
value: 79.1537846814979 |
|
- task: |
|
type: STS |
|
dataset: |
|
type: mteb/sts22-crosslingual-sts |
|
name: MTEB STS22 (pl) |
|
config: pl |
|
split: test |
|
revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80 |
|
metrics: |
|
- type: cos_sim_pearson |
|
value: 36.39657573900194 |
|
- type: cos_sim_spearman |
|
value: 40.36403461037013 |
|
- type: euclidean_pearson |
|
value: 29.143416004776316 |
|
- type: euclidean_spearman |
|
value: 40.43197841306375 |
|
- type: manhattan_pearson |
|
value: 29.18632337290767 |
|
- type: manhattan_spearman |
|
value: 40.50563343395481 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: scifact-pl |
|
name: MTEB SciFact-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 49.428 |
|
- type: map_at_10 |
|
value: 60.423 |
|
- type: map_at_100 |
|
value: 61.037 |
|
- type: map_at_1000 |
|
value: 61.065999999999995 |
|
- type: map_at_3 |
|
value: 56.989000000000004 |
|
- type: map_at_5 |
|
value: 59.041999999999994 |
|
- type: mrr_at_1 |
|
value: 52.666999999999994 |
|
- type: mrr_at_10 |
|
value: 61.746 |
|
- type: mrr_at_100 |
|
value: 62.273 |
|
- type: mrr_at_1000 |
|
value: 62.300999999999995 |
|
- type: mrr_at_3 |
|
value: 59.278 |
|
- type: mrr_at_5 |
|
value: 60.611000000000004 |
|
- type: ndcg_at_1 |
|
value: 52.333 |
|
- type: ndcg_at_10 |
|
value: 65.75 |
|
- type: ndcg_at_100 |
|
value: 68.566 |
|
- type: ndcg_at_1000 |
|
value: 69.314 |
|
- type: ndcg_at_3 |
|
value: 59.768 |
|
- type: ndcg_at_5 |
|
value: 62.808 |
|
- type: precision_at_1 |
|
value: 52.333 |
|
- type: precision_at_10 |
|
value: 9.167 |
|
- type: precision_at_100 |
|
value: 1.0630000000000002 |
|
- type: precision_at_1000 |
|
value: 0.11299999999999999 |
|
- type: precision_at_3 |
|
value: 23.778 |
|
- type: precision_at_5 |
|
value: 16.2 |
|
- type: recall_at_1 |
|
value: 49.428 |
|
- type: recall_at_10 |
|
value: 81.07799999999999 |
|
- type: recall_at_100 |
|
value: 93.93299999999999 |
|
- type: recall_at_1000 |
|
value: 99.667 |
|
- type: recall_at_3 |
|
value: 65.061 |
|
- type: recall_at_5 |
|
value: 72.667 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
type: trec-covid-pl |
|
name: MTEB TRECCOVID-PL |
|
config: default |
|
split: test |
|
revision: None |
|
metrics: |
|
- type: map_at_1 |
|
value: 0.22100000000000003 |
|
- type: map_at_10 |
|
value: 1.788 |
|
- type: map_at_100 |
|
value: 9.937 |
|
- type: map_at_1000 |
|
value: 24.762999999999998 |
|
- type: map_at_3 |
|
value: 0.579 |
|
- type: map_at_5 |
|
value: 0.947 |
|
- type: mrr_at_1 |
|
value: 78.0 |
|
- type: mrr_at_10 |
|
value: 88.067 |
|
- type: mrr_at_100 |
|
value: 88.067 |
|
- type: mrr_at_1000 |
|
value: 88.067 |
|
- type: mrr_at_3 |
|
value: 87.667 |
|
- type: mrr_at_5 |
|
value: 88.067 |
|
- type: ndcg_at_1 |
|
value: 76.0 |
|
- type: ndcg_at_10 |
|
value: 71.332 |
|
- type: ndcg_at_100 |
|
value: 54.80500000000001 |
|
- type: ndcg_at_1000 |
|
value: 49.504999999999995 |
|
- type: ndcg_at_3 |
|
value: 73.693 |
|
- type: ndcg_at_5 |
|
value: 73.733 |
|
- type: precision_at_1 |
|
value: 82.0 |
|
- type: precision_at_10 |
|
value: 76.8 |
|
- type: precision_at_100 |
|
value: 56.68 |
|
- type: precision_at_1000 |
|
value: 22.236 |
|
- type: precision_at_3 |
|
value: 78.667 |
|
- type: precision_at_5 |
|
value: 79.2 |
|
- type: recall_at_1 |
|
value: 0.22100000000000003 |
|
- type: recall_at_10 |
|
value: 2.033 |
|
- type: recall_at_100 |
|
value: 13.431999999999999 |
|
- type: recall_at_1000 |
|
value: 46.913 |
|
- type: recall_at_3 |
|
value: 0.625 |
|
- type: recall_at_5 |
|
value: 1.052 |
|
language: pl |
|
license: apache-2.0 |
|
widget: |
|
- source_sentence: "zapytanie: Jak dożyć 100 lat?" |
|
sentences: |
|
- "Trzeba zdrowo się odżywiać i uprawiać sport." |
|
- "Trzeba pić alkohol, imprezować i jeździć szybkimi autami." |
|
- "Gdy trwała kampania politycy zapewniali, że rozprawią się z zakazem niedzielnego handlu." |
|
|
|
--- |
|
|
|
<h1 align="center">MMLW-roberta-base</h1>
|
|
|
MMLW (muszę mieć lepszą wiadomość, Polish for "I must have a better message") are neural text encoders for Polish.

This is a distilled model that can be used to generate embeddings for many tasks, such as semantic similarity, clustering, and information retrieval. The model can also serve as a base for further fine-tuning.

It maps texts to 768-dimensional vectors.

The model was initialized from a Polish RoBERTa checkpoint and then trained with the [multilingual knowledge distillation method](https://aclanthology.org/2020.emnlp-main.365/) on a diverse corpus of 60 million Polish-English text pairs. We used [English FlagEmbeddings (BGE)](https://huggingface.co/BAAI/bge-base-en) as teacher models for distillation.
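To make the distillation objective concrete, here is a minimal sketch of the loss described in the linked paper: the student is trained so that its embeddings of both the English sentence and its Polish translation match the teacher's embedding of the English sentence, using mean squared error. The function name `distillation_loss` and the toy arrays are illustrative assumptions, and this is not the actual training code used for this model.

```python
import numpy as np


def distillation_loss(teacher_en, student_en, student_pl):
    """Sum of two MSE terms, each averaged over all elements of the batch.

    teacher_en: teacher embeddings of the English sentences
    student_en: student embeddings of the same English sentences
    student_pl: student embeddings of the Polish translations
    """
    teacher_en = np.asarray(teacher_en, dtype=np.float64)
    student_en = np.asarray(student_en, dtype=np.float64)
    student_pl = np.asarray(student_pl, dtype=np.float64)
    # Pull the student's English embedding toward the teacher's.
    loss_en = np.mean((student_en - teacher_en) ** 2)
    # Pull the student's Polish embedding toward the same teacher target.
    loss_pl = np.mean((student_pl - teacher_en) ** 2)
    return float(loss_en + loss_pl)


# Toy 2-dimensional batch with a single sentence pair (hypothetical values).
teacher = [[1.0, 2.0]]
loss = distillation_loss(teacher, [[1.0, 2.0]], [[1.0, 4.0]])
```

Because both terms share the same teacher target, a Polish sentence and its English translation end up close together in the student's embedding space.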
|
|
|
## Usage (Sentence-Transformers)

⚠️ Our embedding models require the use of specific prefixes and suffixes when encoding texts. For this model, each query should be preceded by the prefix **"zapytanie: "** ⚠️

You can use the model with [sentence-transformers](https://www.SBERT.net) as follows:
|
|
|
```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# Queries need the "zapytanie: " prefix; answers are encoded without a prefix.
query_prefix = "zapytanie: "
answer_prefix = ""
queries = [query_prefix + "Jak dożyć 100 lat?"]
answers = [
    answer_prefix + "Trzeba zdrowo się odżywiać i uprawiać sport.",
    answer_prefix + "Trzeba pić alkohol, imprezować i jeździć szybkimi autami.",
    answer_prefix + "Gdy trwała kampania politycy zapewniali, że rozprawią się z zakazem niedzielnego handlu."
]

# Load the model and encode queries and answers into embeddings.
model = SentenceTransformer("sdadas/mmlw-roberta-base")
queries_emb = model.encode(queries, convert_to_tensor=True, show_progress_bar=False)
answers_emb = model.encode(answers, convert_to_tensor=True, show_progress_bar=False)

# Pick the answer with the highest cosine similarity to the query.
best_answer = cos_sim(queries_emb, answers_emb).argmax().item()
print(answers[best_answer])
# Trzeba zdrowo się odżywiać i uprawiać sport.
```
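The snippet above returns only the single best answer. As a sketch of how the same idea extends to a full ranking, the helpers below add the query prefix and sort all answers by cosine similarity. `add_query_prefix` and `rank_by_cosine` are hypothetical names introduced here for illustration, not part of sentence-transformers, and they work on plain NumPy arrays so they can be tried without downloading the model.

```python
import numpy as np

QUERY_PREFIX = "zapytanie: "  # query prefix required by this model


def add_query_prefix(queries):
    """Prepend the query prefix to each query unless it is already present."""
    return [q if q.startswith(QUERY_PREFIX) else QUERY_PREFIX + q for q in queries]


def rank_by_cosine(query_emb, answer_embs):
    """Return (indices, scores) of answers sorted from most to least similar."""
    q = np.asarray(query_emb, dtype=np.float64)
    a = np.asarray(answer_embs, dtype=np.float64)
    # Normalize to unit length so the dot product equals cosine similarity.
    q = q / np.linalg.norm(q)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    scores = a @ q
    order = np.argsort(-scores)  # descending by similarity
    return order.tolist(), scores[order].tolist()


# Toy 2-dimensional embeddings (hypothetical values) standing in for model output.
order, scores = rank_by_cosine([1.0, 0.0], [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]])
```

With the real model, `model.encode(...)` called without `convert_to_tensor=True` returns NumPy arrays, so the first row of the encoded queries and the matrix of encoded answers can be passed to `rank_by_cosine` directly.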
|
|
|
## Evaluation Results

- The model achieves an **Average Score** of **61.05** on the Polish Massive Text Embedding Benchmark (MTEB). See [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) for detailed results.
- The model achieves an **NDCG@10** of **53.60** on the Polish Information Retrieval Benchmark. See [PIRB Leaderboard](https://huggingface.co/spaces/sdadas/pirb) for detailed results.
|
|
|
## Acknowledgements

This model was trained with support from the A100 GPU cluster provided by the Gdansk University of Technology as part of the TASK center initiative.
|
|
|
## Citation

```bibtex
@article{dadas2024pirb,
  title={{PIRB}: A Comprehensive Benchmark of Polish Dense and Hybrid Text Retrieval Methods},
  author={Sławomir Dadas and Michał Perełkiewicz and Rafał Poświata},
  year={2024},
  eprint={2402.13350},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```