Conan-embedding-v1 / README.md
michaelfeil's picture
Update usage with infinity
29f408d verified
|
raw
history blame
26.7 kB
metadata
tags:
  - mteb
model-index:
  - name: conan-embedding
    results:
      - task:
          type: STS
        dataset:
          type: C-MTEB/AFQMC
          name: MTEB AFQMC
          config: default
          split: validation
          revision: None
        metrics:
          - type: cos_sim_pearson
            value: 56.613572467148856
          - type: cos_sim_spearman
            value: 60.66446211824284
          - type: euclidean_pearson
            value: 58.42080485872613
          - type: euclidean_spearman
            value: 59.82750030458164
          - type: manhattan_pearson
            value: 58.39885271199772
          - type: manhattan_spearman
            value: 59.817749720366734
      - task:
          type: STS
        dataset:
          type: C-MTEB/ATEC
          name: MTEB ATEC
          config: default
          split: test
          revision: None
        metrics:
          - type: cos_sim_pearson
            value: 56.60530380552331
          - type: cos_sim_spearman
            value: 58.63822441736707
          - type: euclidean_pearson
            value: 62.18551665180664
          - type: euclidean_spearman
            value: 58.23168804495912
          - type: manhattan_pearson
            value: 62.17191480770053
          - type: manhattan_spearman
            value: 58.22556219601401
      - task:
          type: Classification
        dataset:
          type: mteb/amazon_reviews_multi
          name: MTEB AmazonReviewsClassification (zh)
          config: zh
          split: test
          revision: 1399c76144fd37290681b995c656ef9b2e06e26d
        metrics:
          - type: accuracy
            value: 50.308
          - type: f1
            value: 46.927458607895126
      - task:
          type: STS
        dataset:
          type: C-MTEB/BQ
          name: MTEB BQ
          config: default
          split: test
          revision: None
        metrics:
          - type: cos_sim_pearson
            value: 72.6472074172711
          - type: cos_sim_spearman
            value: 74.50748447236577
          - type: euclidean_pearson
            value: 72.51833296451854
          - type: euclidean_spearman
            value: 73.9898922606105
          - type: manhattan_pearson
            value: 72.50184948939338
          - type: manhattan_spearman
            value: 73.97797921509638
      - task:
          type: Clustering
        dataset:
          type: C-MTEB/CLSClusteringP2P
          name: MTEB CLSClusteringP2P
          config: default
          split: test
          revision: None
        metrics:
          - type: v_measure
            value: 60.63545326048343
      - task:
          type: Clustering
        dataset:
          type: C-MTEB/CLSClusteringS2S
          name: MTEB CLSClusteringS2S
          config: default
          split: test
          revision: None
        metrics:
          - type: v_measure
            value: 52.64834762325994
      - task:
          type: Reranking
        dataset:
          type: C-MTEB/CMedQAv1-reranking
          name: MTEB CMedQAv1
          config: default
          split: test
          revision: None
        metrics:
          - type: map
            value: 91.38528814655234
          - type: mrr
            value: 93.35857142857144
      - task:
          type: Reranking
        dataset:
          type: C-MTEB/CMedQAv2-reranking
          name: MTEB CMedQAv2
          config: default
          split: test
          revision: None
        metrics:
          - type: map
            value: 89.72084678877096
          - type: mrr
            value: 91.74380952380953
      - task:
          type: Retrieval
        dataset:
          type: C-MTEB/CmedqaRetrieval
          name: MTEB CmedqaRetrieval
          config: default
          split: dev
          revision: None
        metrics:
          - type: map_at_1
            value: 26.987
          - type: map_at_10
            value: 40.675
          - type: map_at_100
            value: 42.495
          - type: map_at_1000
            value: 42.596000000000004
          - type: map_at_3
            value: 36.195
          - type: map_at_5
            value: 38.704
          - type: mrr_at_1
            value: 41.21
          - type: mrr_at_10
            value: 49.816
          - type: mrr_at_100
            value: 50.743
          - type: mrr_at_1000
            value: 50.77700000000001
          - type: mrr_at_3
            value: 47.312
          - type: mrr_at_5
            value: 48.699999999999996
          - type: ndcg_at_1
            value: 41.21
          - type: ndcg_at_10
            value: 47.606
          - type: ndcg_at_100
            value: 54.457
          - type: ndcg_at_1000
            value: 56.16100000000001
          - type: ndcg_at_3
            value: 42.108000000000004
          - type: ndcg_at_5
            value: 44.393
          - type: precision_at_1
            value: 41.21
          - type: precision_at_10
            value: 10.593
          - type: precision_at_100
            value: 1.609
          - type: precision_at_1000
            value: 0.183
          - type: precision_at_3
            value: 23.881
          - type: precision_at_5
            value: 17.339
          - type: recall_at_1
            value: 26.987
          - type: recall_at_10
            value: 58.875
          - type: recall_at_100
            value: 87.023
          - type: recall_at_1000
            value: 98.328
          - type: recall_at_3
            value: 42.265
          - type: recall_at_5
            value: 49.334
      - task:
          type: PairClassification
        dataset:
          type: C-MTEB/CMNLI
          name: MTEB Cmnli
          config: default
          split: validation
          revision: None
        metrics:
          - type: cos_sim_accuracy
            value: 85.91701743836441
          - type: cos_sim_ap
            value: 92.53650618807644
          - type: cos_sim_f1
            value: 86.80265975431082
          - type: cos_sim_precision
            value: 83.79025239338556
          - type: cos_sim_recall
            value: 90.039747486556
          - type: dot_accuracy
            value: 77.17378232110643
          - type: dot_ap
            value: 85.40244368166546
          - type: dot_f1
            value: 79.03038001481951
          - type: dot_precision
            value: 72.20502901353966
          - type: dot_recall
            value: 87.2808043020809
          - type: euclidean_accuracy
            value: 84.65423932651834
          - type: euclidean_ap
            value: 91.47775530034588
          - type: euclidean_f1
            value: 85.64471499723298
          - type: euclidean_precision
            value: 81.31567885666246
          - type: euclidean_recall
            value: 90.46060322656068
          - type: manhattan_accuracy
            value: 84.58208057726999
          - type: manhattan_ap
            value: 91.46228709402014
          - type: manhattan_f1
            value: 85.6631626034444
          - type: manhattan_precision
            value: 82.10075026795283
          - type: manhattan_recall
            value: 89.5487491232172
          - type: max_accuracy
            value: 85.91701743836441
          - type: max_ap
            value: 92.53650618807644
          - type: max_f1
            value: 86.80265975431082
      - task:
          type: Retrieval
        dataset:
          type: C-MTEB/CovidRetrieval
          name: MTEB CovidRetrieval
          config: default
          split: dev
          revision: None
        metrics:
          - type: map_at_1
            value: 83.693
          - type: map_at_10
            value: 90.098
          - type: map_at_100
            value: 90.145
          - type: map_at_1000
            value: 90.146
          - type: map_at_3
            value: 89.445
          - type: map_at_5
            value: 89.935
          - type: mrr_at_1
            value: 83.878
          - type: mrr_at_10
            value: 90.007
          - type: mrr_at_100
            value: 90.045
          - type: mrr_at_1000
            value: 90.046
          - type: mrr_at_3
            value: 89.34
          - type: mrr_at_5
            value: 89.835
          - type: ndcg_at_1
            value: 84.089
          - type: ndcg_at_10
            value: 92.351
          - type: ndcg_at_100
            value: 92.54599999999999
          - type: ndcg_at_1000
            value: 92.561
          - type: ndcg_at_3
            value: 91.15299999999999
          - type: ndcg_at_5
            value: 91.968
          - type: precision_at_1
            value: 84.089
          - type: precision_at_10
            value: 10.011000000000001
          - type: precision_at_100
            value: 1.009
          - type: precision_at_1000
            value: 0.101
          - type: precision_at_3
            value: 32.28
          - type: precision_at_5
            value: 19.789
          - type: recall_at_1
            value: 83.693
          - type: recall_at_10
            value: 99.05199999999999
          - type: recall_at_100
            value: 99.895
          - type: recall_at_1000
            value: 100
          - type: recall_at_3
            value: 95.917
          - type: recall_at_5
            value: 97.893
      - task:
          type: Retrieval
        dataset:
          type: C-MTEB/DuRetrieval
          name: MTEB DuRetrieval
          config: default
          split: dev
          revision: None
        metrics:
          - type: map_at_1
            value: 26.924
          - type: map_at_10
            value: 81.392
          - type: map_at_100
            value: 84.209
          - type: map_at_1000
            value: 84.237
          - type: map_at_3
            value: 56.998000000000005
          - type: map_at_5
            value: 71.40100000000001
          - type: mrr_at_1
            value: 91.75
          - type: mrr_at_10
            value: 94.45
          - type: mrr_at_100
            value: 94.503
          - type: mrr_at_1000
            value: 94.505
          - type: mrr_at_3
            value: 94.258
          - type: mrr_at_5
            value: 94.381
          - type: ndcg_at_1
            value: 91.75
          - type: ndcg_at_10
            value: 88.53
          - type: ndcg_at_100
            value: 91.13900000000001
          - type: ndcg_at_1000
            value: 91.387
          - type: ndcg_at_3
            value: 87.925
          - type: ndcg_at_5
            value: 86.461
          - type: precision_at_1
            value: 91.75
          - type: precision_at_10
            value: 42.05
          - type: precision_at_100
            value: 4.827
          - type: precision_at_1000
            value: 0.48900000000000005
          - type: precision_at_3
            value: 78.55
          - type: precision_at_5
            value: 65.82000000000001
          - type: recall_at_1
            value: 26.924
          - type: recall_at_10
            value: 89.338
          - type: recall_at_100
            value: 97.856
          - type: recall_at_1000
            value: 99.11
          - type: recall_at_3
            value: 59.202999999999996
          - type: recall_at_5
            value: 75.642
      - task:
          type: Retrieval
        dataset:
          type: C-MTEB/EcomRetrieval
          name: MTEB EcomRetrieval
          config: default
          split: dev
          revision: None
        metrics:
          - type: map_at_1
            value: 54.800000000000004
          - type: map_at_10
            value: 65.613
          - type: map_at_100
            value: 66.185
          - type: map_at_1000
            value: 66.191
          - type: map_at_3
            value: 62.8
          - type: map_at_5
            value: 64.535
          - type: mrr_at_1
            value: 54.800000000000004
          - type: mrr_at_10
            value: 65.613
          - type: mrr_at_100
            value: 66.185
          - type: mrr_at_1000
            value: 66.191
          - type: mrr_at_3
            value: 62.8
          - type: mrr_at_5
            value: 64.535
          - type: ndcg_at_1
            value: 54.800000000000004
          - type: ndcg_at_10
            value: 70.991
          - type: ndcg_at_100
            value: 73.434
          - type: ndcg_at_1000
            value: 73.587
          - type: ndcg_at_3
            value: 65.324
          - type: ndcg_at_5
            value: 68.431
          - type: precision_at_1
            value: 54.800000000000004
          - type: precision_at_10
            value: 8.790000000000001
          - type: precision_at_100
            value: 0.9860000000000001
          - type: precision_at_1000
            value: 0.1
          - type: precision_at_3
            value: 24.2
          - type: precision_at_5
            value: 16.02
          - type: recall_at_1
            value: 54.800000000000004
          - type: recall_at_10
            value: 87.9
          - type: recall_at_100
            value: 98.6
          - type: recall_at_1000
            value: 99.8
          - type: recall_at_3
            value: 72.6
          - type: recall_at_5
            value: 80.10000000000001
      - task:
          type: Classification
        dataset:
          type: C-MTEB/IFlyTek-classification
          name: MTEB IFlyTek
          config: default
          split: validation
          revision: None
        metrics:
          - type: accuracy
            value: 51.94305502116199
          - type: f1
            value: 39.82197338426721
      - task:
          type: Classification
        dataset:
          type: C-MTEB/JDReview-classification
          name: MTEB JDReview
          config: default
          split: test
          revision: None
        metrics:
          - type: accuracy
            value: 90.31894934333957
          - type: ap
            value: 63.89821836499594
          - type: f1
            value: 85.93687177603624
      - task:
          type: STS
        dataset:
          type: C-MTEB/LCQMC
          name: MTEB LCQMC
          config: default
          split: test
          revision: None
        metrics:
          - type: cos_sim_pearson
            value: 73.18906216730208
          - type: cos_sim_spearman
            value: 79.44570226735877
          - type: euclidean_pearson
            value: 78.8105072242798
          - type: euclidean_spearman
            value: 79.15605680863212
          - type: manhattan_pearson
            value: 78.80576507484064
          - type: manhattan_spearman
            value: 79.14625534068364
      - task:
          type: Reranking
        dataset:
          type: C-MTEB/Mmarco-reranking
          name: MTEB MMarcoReranking
          config: default
          split: dev
          revision: None
        metrics:
          - type: map
            value: 41.58107192600853
          - type: mrr
            value: 41.37063492063492
      - task:
          type: Retrieval
        dataset:
          type: C-MTEB/MMarcoRetrieval
          name: MTEB MMarcoRetrieval
          config: default
          split: dev
          revision: None
        metrics:
          - type: map_at_1
            value: 68.33
          - type: map_at_10
            value: 78.261
          - type: map_at_100
            value: 78.522
          - type: map_at_1000
            value: 78.527
          - type: map_at_3
            value: 76.236
          - type: map_at_5
            value: 77.557
          - type: mrr_at_1
            value: 70.602
          - type: mrr_at_10
            value: 78.779
          - type: mrr_at_100
            value: 79.00500000000001
          - type: mrr_at_1000
            value: 79.01
          - type: mrr_at_3
            value: 77.037
          - type: mrr_at_5
            value: 78.157
          - type: ndcg_at_1
            value: 70.602
          - type: ndcg_at_10
            value: 82.254
          - type: ndcg_at_100
            value: 83.319
          - type: ndcg_at_1000
            value: 83.449
          - type: ndcg_at_3
            value: 78.46
          - type: ndcg_at_5
            value: 80.679
          - type: precision_at_1
            value: 70.602
          - type: precision_at_10
            value: 9.989
          - type: precision_at_100
            value: 1.05
          - type: precision_at_1000
            value: 0.106
          - type: precision_at_3
            value: 29.598999999999997
          - type: precision_at_5
            value: 18.948
          - type: recall_at_1
            value: 68.33
          - type: recall_at_10
            value: 94.00800000000001
          - type: recall_at_100
            value: 98.589
          - type: recall_at_1000
            value: 99.60799999999999
          - type: recall_at_3
            value: 84.057
          - type: recall_at_5
            value: 89.32900000000001
      - task:
          type: Classification
        dataset:
          type: mteb/amazon_massive_intent
          name: MTEB MassiveIntentClassification (zh-CN)
          config: zh-CN
          split: test
          revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
        metrics:
          - type: accuracy
            value: 78.13718897108272
          - type: f1
            value: 74.07613180855328
      - task:
          type: Classification
        dataset:
          type: mteb/amazon_massive_scenario
          name: MTEB MassiveScenarioClassification (zh-CN)
          config: zh-CN
          split: test
          revision: 7d571f92784cd94a019292a1f45445077d0ef634
        metrics:
          - type: accuracy
            value: 86.20040349697376
          - type: f1
            value: 85.05282136519973
      - task:
          type: Retrieval
        dataset:
          type: C-MTEB/MedicalRetrieval
          name: MTEB MedicalRetrieval
          config: default
          split: dev
          revision: None
        metrics:
          - type: map_at_1
            value: 56.8
          - type: map_at_10
            value: 64.199
          - type: map_at_100
            value: 64.89
          - type: map_at_1000
            value: 64.917
          - type: map_at_3
            value: 62.383
          - type: map_at_5
            value: 63.378
          - type: mrr_at_1
            value: 56.8
          - type: mrr_at_10
            value: 64.199
          - type: mrr_at_100
            value: 64.89
          - type: mrr_at_1000
            value: 64.917
          - type: mrr_at_3
            value: 62.383
          - type: mrr_at_5
            value: 63.378
          - type: ndcg_at_1
            value: 56.8
          - type: ndcg_at_10
            value: 67.944
          - type: ndcg_at_100
            value: 71.286
          - type: ndcg_at_1000
            value: 71.879
          - type: ndcg_at_3
            value: 64.163
          - type: ndcg_at_5
            value: 65.96600000000001
          - type: precision_at_1
            value: 56.8
          - type: precision_at_10
            value: 7.9799999999999995
          - type: precision_at_100
            value: 0.954
          - type: precision_at_1000
            value: 0.1
          - type: precision_at_3
            value: 23.1
          - type: precision_at_5
            value: 14.74
          - type: recall_at_1
            value: 56.8
          - type: recall_at_10
            value: 79.80000000000001
          - type: recall_at_100
            value: 95.39999999999999
          - type: recall_at_1000
            value: 99.8
          - type: recall_at_3
            value: 69.3
          - type: recall_at_5
            value: 73.7
      - task:
          type: Classification
        dataset:
          type: C-MTEB/MultilingualSentiment-classification
          name: MTEB MultilingualSentiment
          config: default
          split: validation
          revision: None
        metrics:
          - type: accuracy
            value: 78.57666666666667
          - type: f1
            value: 78.23373528202681
      - task:
          type: PairClassification
        dataset:
          type: C-MTEB/OCNLI
          name: MTEB Ocnli
          config: default
          split: validation
          revision: None
        metrics:
          - type: cos_sim_accuracy
            value: 85.43584190579317
          - type: cos_sim_ap
            value: 90.76665640338129
          - type: cos_sim_f1
            value: 86.5021770682148
          - type: cos_sim_precision
            value: 79.82142857142858
          - type: cos_sim_recall
            value: 94.40337909186906
          - type: dot_accuracy
            value: 78.66811044937737
          - type: dot_ap
            value: 85.84084363880804
          - type: dot_f1
            value: 80.10075566750629
          - type: dot_precision
            value: 76.58959537572254
          - type: dot_recall
            value: 83.9493136219641
          - type: euclidean_accuracy
            value: 84.46128857606931
          - type: euclidean_ap
            value: 88.62351100230491
          - type: euclidean_f1
            value: 85.7709469509172
          - type: euclidean_precision
            value: 80.8411214953271
          - type: euclidean_recall
            value: 91.34107708553326
          - type: manhattan_accuracy
            value: 84.51543042772063
          - type: manhattan_ap
            value: 88.53975607870393
          - type: manhattan_f1
            value: 85.75697211155378
          - type: manhattan_precision
            value: 81.14985862393968
          - type: manhattan_recall
            value: 90.91869060190075
          - type: max_accuracy
            value: 85.43584190579317
          - type: max_ap
            value: 90.76665640338129
          - type: max_f1
            value: 86.5021770682148
      - task:
          type: Classification
        dataset:
          type: C-MTEB/OnlineShopping-classification
          name: MTEB OnlineShopping
          config: default
          split: test
          revision: None
        metrics:
          - type: accuracy
            value: 95.06999999999998
          - type: ap
            value: 93.45104559324996
          - type: f1
            value: 95.06036329426092
      - task:
          type: STS
        dataset:
          type: C-MTEB/PAWSX
          name: MTEB PAWSX
          config: default
          split: test
          revision: None
        metrics:
          - type: cos_sim_pearson
            value: 40.01998290519605
          - type: cos_sim_spearman
            value: 46.5989769986853
          - type: euclidean_pearson
            value: 45.37905883182924
          - type: euclidean_spearman
            value: 46.22213849806378
          - type: manhattan_pearson
            value: 45.40925124776211
          - type: manhattan_spearman
            value: 46.250705124226386
      - task:
          type: STS
        dataset:
          type: C-MTEB/QBQTC
          name: MTEB QBQTC
          config: default
          split: test
          revision: None
        metrics:
          - type: cos_sim_pearson
            value: 42.719516197112526
          - type: cos_sim_spearman
            value: 44.57507789581106
          - type: euclidean_pearson
            value: 35.73062264160721
          - type: euclidean_spearman
            value: 40.473523909913695
          - type: manhattan_pearson
            value: 35.69868964086357
          - type: manhattan_spearman
            value: 40.46349925372903
      - task:
          type: STS
        dataset:
          type: mteb/sts22-crosslingual-sts
          name: MTEB STS22 (zh)
          config: zh
          split: test
          revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80
        metrics:
          - type: cos_sim_pearson
            value: 62.340118285801104
          - type: cos_sim_spearman
            value: 67.72781908620632
          - type: euclidean_pearson
            value: 63.161965746091596
          - type: euclidean_spearman
            value: 67.36825684340769
          - type: manhattan_pearson
            value: 63.089863788261425
          - type: manhattan_spearman
            value: 67.40868898995384
      - task:
          type: STS
        dataset:
          type: C-MTEB/STSB
          name: MTEB STSB
          config: default
          split: test
          revision: None
        metrics:
          - type: cos_sim_pearson
            value: 79.1646360962365
          - type: cos_sim_spearman
            value: 81.24426700767087
          - type: euclidean_pearson
            value: 79.43826409936123
          - type: euclidean_spearman
            value: 79.71787965300125
          - type: manhattan_pearson
            value: 79.43377784961737
          - type: manhattan_spearman
            value: 79.69348376886967
      - task:
          type: Reranking
        dataset:
          type: C-MTEB/T2Reranking
          name: MTEB T2Reranking
          config: default
          split: dev
          revision: None
        metrics:
          - type: map
            value: 68.35595092507496
          - type: mrr
            value: 79.00244892585788
      - task:
          type: Retrieval
        dataset:
          type: C-MTEB/T2Retrieval
          name: MTEB T2Retrieval
          config: default
          split: dev
          revision: None
        metrics:
          - type: map_at_1
            value: 26.588
          - type: map_at_10
            value: 75.327
          - type: map_at_100
            value: 79.095
          - type: map_at_1000
            value: 79.163
          - type: map_at_3
            value: 52.637
          - type: map_at_5
            value: 64.802
          - type: mrr_at_1
            value: 88.103
          - type: mrr_at_10
            value: 91.29899999999999
          - type: mrr_at_100
            value: 91.408
          - type: mrr_at_1000
            value: 91.411
          - type: mrr_at_3
            value: 90.801
          - type: mrr_at_5
            value: 91.12700000000001
          - type: ndcg_at_1
            value: 88.103
          - type: ndcg_at_10
            value: 83.314
          - type: ndcg_at_100
            value: 87.201
          - type: ndcg_at_1000
            value: 87.83999999999999
          - type: ndcg_at_3
            value: 84.408
          - type: ndcg_at_5
            value: 83.078
          - type: precision_at_1
            value: 88.103
          - type: precision_at_10
            value: 41.638999999999996
          - type: precision_at_100
            value: 5.006
          - type: precision_at_1000
            value: 0.516
          - type: precision_at_3
            value: 73.942
          - type: precision_at_5
            value: 62.056
          - type: recall_at_1
            value: 26.588
          - type: recall_at_10
            value: 82.819
          - type: recall_at_100
            value: 95.334
          - type: recall_at_1000
            value: 98.51299999999999
          - type: recall_at_3
            value: 54.74
          - type: recall_at_5
            value: 68.864
      - task:
          type: Classification
        dataset:
          type: C-MTEB/TNews-classification
          name: MTEB TNews
          config: default
          split: validation
          revision: None
        metrics:
          - type: accuracy
            value: 55.029
          - type: f1
            value: 53.043617905026764
      - task:
          type: Clustering
        dataset:
          type: C-MTEB/ThuNewsClusteringP2P
          name: MTEB ThuNewsClusteringP2P
          config: default
          split: test
          revision: None
        metrics:
          - type: v_measure
            value: 77.83675116835911
      - task:
          type: Clustering
        dataset:
          type: C-MTEB/ThuNewsClusteringS2S
          name: MTEB ThuNewsClusteringS2S
          config: default
          split: test
          revision: None
        metrics:
          - type: v_measure
            value: 74.19701455865277
      - task:
          type: Retrieval
        dataset:
          type: C-MTEB/VideoRetrieval
          name: MTEB VideoRetrieval
          config: default
          split: dev
          revision: None
        metrics:
          - type: map_at_1
            value: 64.7
          - type: map_at_10
            value: 75.593
          - type: map_at_100
            value: 75.863
          - type: map_at_1000
            value: 75.863
          - type: map_at_3
            value: 73.63300000000001
          - type: map_at_5
            value: 74.923
          - type: mrr_at_1
            value: 64.7
          - type: mrr_at_10
            value: 75.593
          - type: mrr_at_100
            value: 75.863
          - type: mrr_at_1000
            value: 75.863
          - type: mrr_at_3
            value: 73.63300000000001
          - type: mrr_at_5
            value: 74.923
          - type: ndcg_at_1
            value: 64.7
          - type: ndcg_at_10
            value: 80.399
          - type: ndcg_at_100
            value: 81.517
          - type: ndcg_at_1000
            value: 81.517
          - type: ndcg_at_3
            value: 76.504
          - type: ndcg_at_5
            value: 78.79899999999999
          - type: precision_at_1
            value: 64.7
          - type: precision_at_10
            value: 9.520000000000001
          - type: precision_at_100
            value: 1
          - type: precision_at_1000
            value: 0.1
          - type: precision_at_3
            value: 28.266999999999996
          - type: precision_at_5
            value: 18.060000000000002
          - type: recall_at_1
            value: 64.7
          - type: recall_at_10
            value: 95.19999999999999
          - type: recall_at_100
            value: 100
          - type: recall_at_1000
            value: 100
          - type: recall_at_3
            value: 84.8
          - type: recall_at_5
            value: 90.3
      - task:
          type: Classification
        dataset:
          type: C-MTEB/waimai-classification
          name: MTEB Waimai
          config: default
          split: test
          revision: None
        metrics:
          - type: accuracy
            value: 89.69999999999999
          - type: ap
            value: 75.91371640164184
          - type: f1
            value: 88.34067777698694
license: cc-by-nc-4.0
library_name: sentence-transformers

Conan-embedding-v1

Performance

Model Average CLS Clustering Reranking Retrieval STS Pair_CLS
gte-Qwen2-7B-instruct 72.05 75.09 66.06 68.92 76.03 65.33 87.48
xiaobu-embedding-v2 72.43 74.67 65.17 72.58 76.5 64.53 91.87
Conan-embedding-v1 72.62 75.03 66.33 72.76 76.67 64.18 91.66

Methods and Training Detials

Please refer to our technical report.

Local usage with Infinity

OpenAI compatible API deployment with Infinity.

docker run --gpus all -v $PWD/data:/app/.cache -e HF_TOKEN=$HF_TOKEN -p "7997":"7997" \
michaelf34/infinity:0.0.70 \
v2 --model-id TencentBAC/Conan-embedding-v1 --dtype float16 --batch-size 32 --engine torch --port 7997

Citation

If you find our models / papers useful in your research, please consider giving ❤️ and citations. Thanks!

@misc{li2024conanembeddinggeneraltextembedding,
  title={Conan-embedding: General Text Embedding with More and Better Negative Samples}, 
  author={Shiyu Li and Yang Tang and Shizhe Chen and Xi Chen},
  year={2024},
  eprint={2408.15710},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2408.15710}, 
}

About

Created by the Tencent BAC Group. All rights reserved.

License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.