---
language:
  - en
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:557850
  - loss:MatryoshkaLoss
  - loss:MultipleNegativesRankingLoss
base_model: answerdotai/ModernBERT-base
widget:
  - source_sentence: >-
      A construction worker is standing on a crane placing a large arm on top of
      a stature in progress.
    sentences:
      - A man is playing with his camera.
      - A person standing
      - Nobody is standing
  - source_sentence: A boy in red slides down an inflatable ride.
    sentences:
      - a baby smiling
      - A boy is playing on an inflatable ride.
      - A boy pierces a knife through an inflatable ride.
  - source_sentence: A man in a black shirt is playing a guitar.
    sentences:
      - A group of women are selling their wares
      - The man is wearing black.
      - The man is wearing a blue shirt.
  - source_sentence: >-
      A man with a large power drill standing next to his daughter with a vacuum
      cleaner hose.
    sentences:
      - A man holding a drill stands next to a girl holding a vacuum hose.
      - Kids ride an amusement ride.
      - The man and girl are painting the walls.
  - source_sentence: A middle-aged man works under the engine of a train on rail tracks.
    sentences:
      - A guy is working on a train.
      - Two young asian men are squatting.
      - A guy is driving to work.
datasets:
  - sentence-transformers/all-nli
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - pearson_cosine
  - spearman_cosine
model-index:
  - name: SentenceTransformer based on estrogen/ModernBERT-base-sbert-initialized
    results:
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts dev
          type: sts-dev
        metrics:
          - type: pearson_cosine
            value: 0.8601586939371598
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8650559283517015
            name: Spearman Cosine
      - task:
          type: semantic-similarity
          name: Semantic Similarity
        dataset:
          name: sts test
          type: sts-test
        metrics:
          - type: pearson_cosine
            value: 0.8483904083763342
            name: Pearson Cosine
          - type: spearman_cosine
            value: 0.8504558364206114
            name: Spearman Cosine
---

# SentenceTransformer based on estrogen/ModernBERT-base-sbert-initialized

This is a sentence-transformers model finetuned from estrogen/ModernBERT-base-sbert-initialized on the all-nli dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description

  • Model Type: Sentence Transformer
  • Base model: estrogen/ModernBERT-base-sbert-initialized
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: sentence-transformers/all-nli
  • Language: en

### Model Sources

  • Documentation: https://www.sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
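The Pooling module averages token embeddings over non-padding positions (`pooling_mode_mean_tokens`). As a rough illustration of what that pooling computes, here is a plain-`transformers` sketch of masked mean pooling; it assumes the repository also exposes the underlying ModernBERT weights (as sentence-transformers checkpoints normally do) and is not the library's internal implementation:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("estrogen/ModernBERT-base-nli-v3")
model = AutoModel.from_pretrained("estrogen/ModernBERT-base-nli-v3")

inputs = tokenizer(["A guy is working on a train."], return_tensors="pt", padding=True)
with torch.no_grad():
    token_embeddings = model(**inputs).last_hidden_state  # [batch, seq_len, 768]

# Mean-pool over real tokens only, using the attention mask.
mask = inputs["attention_mask"].unsqueeze(-1).float()     # [batch, seq_len, 1]
sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embedding.shape)                           # torch.Size([1, 768])
```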

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("estrogen/ModernBERT-base-nli-v3")
# Run inference
sentences = [
    'A middle-aged man works under the engine of a train on rail tracks.',
    'A guy is working on a train.',
    'A guy is driving to work.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
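Because this model was trained with MatryoshkaLoss over the dimensions 768, 512, 256, 128, and 64, its embeddings can be truncated to any of those sizes with only a modest quality drop. A minimal usage sketch with sentence-transformers' `truncate_dim` argument:

```python
from sentence_transformers import SentenceTransformer

# Load the model so encode() returns 256-dimensional embeddings,
# one of the Matryoshka dimensions this model was trained with.
model = SentenceTransformer("estrogen/ModernBERT-base-nli-v3", truncate_dim=256)

embeddings = model.encode(["A guy is working on a train."])
print(embeddings.shape)
# (1, 256)
```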

## Evaluation

### Metrics

#### Semantic Similarity

| Metric          | sts-dev | sts-test |
|:----------------|:--------|:---------|
| pearson_cosine  | 0.8602  | 0.8484   |
| spearman_cosine | 0.8651  | 0.8505   |
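These scores are Pearson and Spearman correlations between the model's cosine similarities and human similarity judgments on STS data. A minimal sketch of how such numbers can be reproduced with sentence-transformers' `EmbeddingSimilarityEvaluator`, assuming the public `sentence-transformers/stsb` dataset layout:

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("estrogen/ModernBERT-base-nli-v3")

# STS benchmark dev split: sentence pairs with human similarity scores.
stsb = load_dataset("sentence-transformers/stsb", split="validation")

evaluator = EmbeddingSimilarityEvaluator(
    sentences1=stsb["sentence1"],
    sentences2=stsb["sentence2"],
    scores=stsb["score"],
    name="sts-dev",
)
print(evaluator(model))  # includes pearson_cosine and spearman_cosine
```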

## Training Details

### Training Dataset

#### all-nli

  • Dataset: all-nli at d482672
  • Size: 557,850 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:

    |         | anchor | positive | negative |
    |:--------|:-------|:---------|:---------|
    | type    | string | string | string |
    | details | min: 7 tokens, mean: 10.46 tokens, max: 46 tokens | min: 6 tokens, mean: 12.91 tokens, max: 40 tokens | min: 5 tokens, mean: 13.49 tokens, max: 51 tokens |

  • Samples:

    | anchor | positive | negative |
    |:-------|:---------|:---------|
    | A person on a horse jumps over a broken down airplane. | A person is outdoors, on a horse. | A person is at a diner, ordering an omelette. |
    | Children smiling and waving at camera | There are children present | The kids are frowning |
    | A boy is jumping on skateboard in the middle of a red bridge. | The boy does a skateboarding trick. | The boy skates down the sidewalk. |
  • Loss: MatryoshkaLoss with these parameters (a construction sketch follows this list):

    ```json
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [768, 512, 256, 128, 64],
        "matryoshka_weights": [1, 1, 1, 1, 1],
        "n_dims_per_step": -1
    }
    ```
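In sentence-transformers, this configuration corresponds to wrapping MultipleNegativesRankingLoss (in-batch negatives) in MatryoshkaLoss, so the ranking loss is applied at every truncated dimension. A minimal construction sketch mirroring the parameters above:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("estrogen/ModernBERT-base-nli-v3")

# Rank each (anchor, positive) pair above in-batch negatives,
# at each of the Matryoshka embedding sizes.
inner_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(
    model,
    inner_loss,
    matryoshka_dims=[768, 512, 256, 128, 64],
    matryoshka_weights=[1, 1, 1, 1, 1],
)
```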
    

### Evaluation Dataset

#### all-nli

  • Dataset: all-nli at d482672
  • Size: 6,584 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:

    |         | anchor | positive | negative |
    |:--------|:-------|:---------|:---------|
    | type    | string | string | string |
    | details | min: 6 tokens, mean: 18.25 tokens, max: 69 tokens | min: 5 tokens, mean: 9.88 tokens, max: 30 tokens | min: 5 tokens, mean: 10.48 tokens, max: 29 tokens |

  • Samples:

    | anchor | positive | negative |
    |:-------|:---------|:---------|
    | Two women are embracing while holding to go packages. | Two woman are holding packages. | The men are fighting outside a deli. |
    | Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink. | Two kids in numbered jerseys wash their hands. | Two kids in jackets walk to school. |
    | A man selling donuts to a customer during a world exhibition event held in the city of Angeles | A man selling donuts to a customer. | A woman drinks her coffee in a small cafe. |
  • Loss: MatryoshkaLoss with these parameters:

    ```json
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [768, 512, 256, 128, 64],
        "matryoshka_weights": [1, 1, 1, 1, 1],
        "n_dims_per_step": -1
    }
    ```
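Both the training and evaluation splits come from the same public dataset. A small loading sketch, assuming the "triplet" subset of sentence-transformers/all-nli, which provides the anchor/positive/negative columns described above:

```python
from datasets import load_dataset

# Train and dev triplets of the all-nli dataset.
train_ds = load_dataset("sentence-transformers/all-nli", "triplet", split="train")
eval_ds = load_dataset("sentence-transformers/all-nli", "triplet", split="dev")

print(train_ds.num_rows, eval_ds.num_rows)  # 557850 6584
print(train_ds.column_names)                # ['anchor', 'positive', 'negative']
```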
    

### Training Hyperparameters

#### Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 1024
  • per_device_eval_batch_size: 1024
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • bf16: True
  • batch_sampler: no_duplicates
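Expressed in code, these settings map onto sentence-transformers' training arguments roughly as follows; this is a sketch, with `output_dir` as a placeholder and every omitted argument left at its default:

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=1024,
    per_device_eval_batch_size=1024,
    num_train_epochs=1,
    warmup_ratio=0.1,
    bf16=True,
    # Avoid duplicate sentences within a batch, which would create
    # false negatives for MultipleNegativesRankingLoss.
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```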

#### All Hyperparameters

<details><summary>Click to expand</summary>

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 1024
  • per_device_eval_batch_size: 1024
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

</details>

### Training Logs

<details><summary>Click to expand</summary>

```
Epoch Step Training Loss Validation Loss sts-dev_spearman_cosine sts-test_spearman_cosine
0 0 - - 0.5576 -
0.0018 1 36.2556 - - -
0.0037 2 36.6329 - - -
0.0055 3 36.9705 - - -
0.0073 4 36.9173 - - -
0.0092 5 36.8254 - - -
0.0110 6 36.7313 - - -
0.0128 7 36.5865 - - -
0.0147 8 36.1709 - - -
0.0165 9 36.0519 - - -
0.0183 10 35.712 - - -
0.0202 11 35.4072 - - -
0.0220 12 35.0623 - - -
0.0239 13 34.6996 - - -
0.0257 14 34.2426 - - -
0.0275 15 33.6913 - - -
0.0294 16 33.2808 - - -
0.0312 17 32.5487 - - -
0.0330 18 31.6451 - - -
0.0349 19 30.7017 - - -
0.0367 20 29.8238 - - -
0.0385 21 28.7414 - - -
0.0404 22 27.316 - - -
0.0422 23 26.1119 - - -
0.0440 24 24.7211 - - -
0.0459 25 24.0007 - - -
0.0477 26 22.706 - - -
0.0495 27 21.7943 - - -
0.0514 28 21.5753 - - -
0.0532 29 20.9671 - - -
0.0550 30 20.5548 - - -
0.0569 31 20.263 - - -
0.0587 32 19.8474 - - -
0.0606 33 18.846 - - -
0.0624 34 18.5923 - - -
0.0642 35 17.8432 - - -
0.0661 36 17.6267 - - -
0.0679 37 17.1291 - - -
0.0697 38 16.6147 - - -
0.0716 39 16.1403 - - -
0.0734 40 16.5382 - - -
0.0752 41 15.7209 - - -
0.0771 42 15.565 - - -
0.0789 43 15.2099 - - -
0.0807 44 15.2644 - - -
0.0826 45 14.8458 - - -
0.0844 46 15.2214 - - -
0.0862 47 15.194 - - -
0.0881 48 15.53 - - -
0.0899 49 14.893 - - -
0.0917 50 14.4146 - - -
0.0936 51 14.4308 - - -
0.0954 52 13.8239 - - -
0.0972 53 13.9299 - - -
0.0991 54 14.6545 - - -
0.1009 55 14.3374 - - -
0.1028 56 14.5065 - - -
0.1046 57 13.8447 - - -
0.1064 58 14.179 - - -
0.1083 59 13.8866 - - -
0.1101 60 13.4879 - - -
0.1119 61 13.6273 - - -
0.1138 62 13.891 - - -
0.1156 63 13.6066 - - -
0.1174 64 13.4999 - - -
0.1193 65 13.9862 - - -
0.1211 66 13.4257 - - -
0.1229 67 13.9192 - - -
0.1248 68 13.5504 - - -
0.1266 69 13.3689 - - -
0.1284 70 13.4802 - - -
0.1303 71 13.0249 - - -
0.1321 72 13.2021 - - -
0.1339 73 13.1101 - - -
0.1358 74 13.0868 - - -
0.1376 75 12.8536 - - -
0.1394 76 12.9317 - - -
0.1413 77 12.6403 - - -
0.1431 78 12.9776 - - -
0.1450 79 13.1359 - - -
0.1468 80 13.0558 - - -
0.1486 81 13.0849 - - -
0.1505 82 12.6719 - - -
0.1523 83 12.5796 - - -
0.1541 84 12.472 - - -
0.1560 85 12.4221 - - -
0.1578 86 12.0878 - - -
0.1596 87 12.6923 - - -
0.1615 88 12.4428 - - -
0.1633 89 12.2897 - - -
0.1651 90 12.4254 - - -
0.1670 91 12.3808 - - -
0.1688 92 12.5224 - - -
0.1706 93 12.48 - - -
0.1725 94 11.8793 - - -
0.1743 95 11.8582 - - -
0.1761 96 12.5362 - - -
0.1780 97 12.3912 - - -
0.1798 98 12.7162 - - -
0.1817 99 12.4455 - - -
0.1835 100 12.4815 8.5398 0.8199 -
0.1853 101 12.1586 - - -
0.1872 102 11.8041 - - -
0.1890 103 11.6278 - - -
0.1908 104 11.8511 - - -
0.1927 105 11.762 - - -
0.1945 106 11.568 - - -
0.1963 107 11.8152 - - -
0.1982 108 11.9005 - - -
0.2 109 11.9282 - - -
0.2018 110 11.8451 - - -
0.2037 111 12.1208 - - -
0.2055 112 11.6718 - - -
0.2073 113 11.0296 - - -
0.2092 114 11.4185 - - -
0.2110 115 11.337 - - -
0.2128 116 10.9242 - - -
0.2147 117 11.0482 - - -
0.2165 118 11.3196 - - -
0.2183 119 11.1849 - - -
0.2202 120 10.9769 - - -
0.2220 121 10.5047 - - -
0.2239 122 11.1094 - - -
0.2257 123 11.2565 - - -
0.2275 124 11.1569 - - -
0.2294 125 11.5391 - - -
0.2312 126 10.8941 - - -
0.2330 127 10.8196 - - -
0.2349 128 11.0836 - - -
0.2367 129 11.4241 - - -
0.2385 130 11.4976 - - -
0.2404 131 10.938 - - -
0.2422 132 11.5283 - - -
0.2440 133 11.4238 - - -
0.2459 134 11.3364 - - -
0.2477 135 11.225 - - -
0.2495 136 11.0415 - - -
0.2514 137 10.8503 - - -
0.2532 138 10.9302 - - -
0.2550 139 10.5476 - - -
0.2569 140 10.8422 - - -
0.2587 141 10.4239 - - -
0.2606 142 10.5155 - - -
0.2624 143 10.589 - - -
0.2642 144 10.6116 - - -
0.2661 145 10.7158 - - -
0.2679 146 10.6952 - - -
0.2697 147 10.3678 - - -
0.2716 148 11.159 - - -
0.2734 149 11.3336 - - -
0.2752 150 10.7669 - - -
0.2771 151 10.5946 - - -
0.2789 152 10.9448 - - -
0.2807 153 10.7132 - - -
0.2826 154 10.5812 - - -
0.2844 155 10.7827 - - -
0.2862 156 10.7807 - - -
0.2881 157 10.7351 - - -
0.2899 158 10.7904 - - -
0.2917 159 10.5921 - - -
0.2936 160 10.2996 - - -
0.2954 161 10.2353 - - -
0.2972 162 10.2108 - - -
0.2991 163 10.089 - - -
0.3009 164 10.1736 - - -
0.3028 165 10.2599 - - -
0.3046 166 10.4347 - - -
0.3064 167 10.9999 - - -
0.3083 168 11.1655 - - -
0.3101 169 10.8125 - - -
0.3119 170 10.5497 - - -
0.3138 171 10.6918 - - -
0.3156 172 10.4792 - - -
0.3174 173 10.6018 - - -
0.3193 174 10.2092 - - -
0.3211 175 10.5625 - - -
0.3229 176 10.3539 - - -
0.3248 177 9.5403 - - -
0.3266 178 10.2351 - - -
0.3284 179 10.1557 - - -
0.3303 180 10.0721 - - -
0.3321 181 9.721 - - -
0.3339 182 9.7519 - - -
0.3358 183 9.7737 - - -
0.3376 184 9.5207 - - -
0.3394 185 9.6557 - - -
0.3413 186 9.7205 - - -
0.3431 187 9.9902 - - -
0.3450 188 10.1699 - - -
0.3468 189 10.5102 - - -
0.3486 190 10.2026 - - -
0.3505 191 10.1148 - - -
0.3523 192 9.5341 - - -
0.3541 193 9.5213 - - -
0.3560 194 9.7469 - - -
0.3578 195 10.1795 - - -
0.3596 196 10.3835 - - -
0.3615 197 10.7346 - - -
0.3633 198 9.9378 - - -
0.3651 199 9.7758 - - -
0.3670 200 10.3206 7.0991 0.8294 -
0.3688 201 9.7032 - - -
0.3706 202 9.8851 - - -
0.3725 203 9.9285 - - -
0.3743 204 10.0227 - - -
0.3761 205 9.8062 - - -
0.3780 206 9.9988 - - -
0.3798 207 10.0256 - - -
0.3817 208 9.8837 - - -
0.3835 209 10.0787 - - -
0.3853 210 9.5776 - - -
0.3872 211 9.6239 - - -
0.3890 212 9.717 - - -
0.3908 213 10.1639 - - -
0.3927 214 9.4994 - - -
0.3945 215 9.6895 - - -
0.3963 216 9.4938 - - -
0.3982 217 9.3008 - - -
0.4 218 9.6183 - - -
0.4018 219 9.3632 - - -
0.4037 220 9.3575 - - -
0.4055 221 9.4888 - - -
0.4073 222 9.337 - - -
0.4092 223 9.9598 - - -
0.4110 224 9.345 - - -
0.4128 225 9.2595 - - -
0.4147 226 9.3508 - - -
0.4165 227 9.8293 - - -
0.4183 228 9.8365 - - -
0.4202 229 9.6528 - - -
0.4220 230 9.9696 - - -
0.4239 231 10.113 - - -
0.4257 232 9.9706 - - -
0.4275 233 9.577 - - -
0.4294 234 9.7624 - - -
0.4312 235 9.5083 - - -
0.4330 236 9.5067 - - -
0.4349 237 9.1004 - - -
0.4367 238 8.914 - - -
0.4385 239 9.6852 - - -
0.4404 240 9.573 - - -
0.4422 241 9.8598 - - -
0.4440 242 10.1793 - - -
0.4459 243 10.2789 - - -
0.4477 244 9.9536 - - -
0.4495 245 9.3878 - - -
0.4514 246 9.6734 - - -
0.4532 247 9.3747 - - -
0.4550 248 8.8334 - - -
0.4569 249 9.7495 - - -
0.4587 250 8.8468 - - -
0.4606 251 9.3828 - - -
0.4624 252 9.1118 - - -
0.4642 253 9.3682 - - -
0.4661 254 9.3647 - - -
0.4679 255 9.8533 - - -
0.4697 256 9.2787 - - -
0.4716 257 8.9831 - - -
0.4734 258 9.0524 - - -
0.4752 259 9.5378 - - -
0.4771 260 9.4227 - - -
0.4789 261 9.3545 - - -
0.4807 262 8.8428 - - -
0.4826 263 9.1284 - - -
0.4844 264 8.7769 - - -
0.4862 265 9.0381 - - -
0.4881 266 9.0261 - - -
0.4899 267 8.811 - - -
0.4917 268 9.0848 - - -
0.4936 269 9.0951 - - -
0.4954 270 9.0682 - - -
0.4972 271 9.0418 - - -
0.4991 272 9.7316 - - -
0.5009 273 9.263 - - -
0.5028 274 9.624 - - -
0.5046 275 10.0133 - - -
0.5064 276 9.0789 - - -
0.5083 277 9.1399 - - -
0.5101 278 9.3854 - - -
0.5119 279 8.9982 - - -
0.5138 280 9.1342 - - -
0.5156 281 9.0517 - - -
0.5174 282 9.5637 - - -
0.5193 283 9.5213 - - -
0.5211 284 9.9231 - - -
0.5229 285 10.3441 - - -
0.5248 286 9.6162 - - -
0.5266 287 9.4794 - - -
0.5284 288 9.2728 - - -
0.5303 289 9.411 - - -
0.5321 290 9.5806 - - -
0.5339 291 9.4193 - - -
0.5358 292 9.3528 - - -
0.5376 293 9.7581 - - -
0.5394 294 9.4407 - - -
0.5413 295 9.027 - - -
0.5431 296 9.4272 - - -
0.5450 297 9.2733 - - -
0.5468 298 9.3 - - -
0.5486 299 9.6388 - - -
0.5505 300 9.0698 6.8356 0.8273 -
0.5523 301 9.4613 - - -
0.5541 302 9.9061 - - -
0.5560 303 9.3524 - - -
0.5578 304 9.1935 - - -
0.5596 305 9.1243 - - -
0.5615 306 8.8865 - - -
0.5633 307 9.4411 - - -
0.5651 308 9.1322 - - -
0.5670 309 9.3072 - - -
0.5688 310 8.4299 - - -
0.5706 311 8.9471 - - -
0.5725 312 8.5097 - - -
0.5743 313 9.1158 - - -
0.5761 314 9.0221 - - -
0.5780 315 9.5871 - - -
0.5798 316 9.3789 - - -
0.5817 317 9.1566 - - -
0.5835 318 9.0472 - - -
0.5853 319 8.947 - - -
0.5872 320 9.1791 - - -
0.5890 321 8.8764 - - -
0.5908 322 8.9794 - - -
0.5927 323 9.2044 - - -
0.5945 324 9.0374 - - -
0.5963 325 9.3389 - - -
0.5982 326 9.7387 - - -
0.6 327 9.4248 - - -
0.6018 328 9.4799 - - -
0.6037 329 8.9019 - - -
0.6055 330 9.113 - - -
0.6073 331 9.3148 - - -
0.6092 332 8.9871 - - -
0.6110 333 8.5404 - - -
0.6128 334 9.1587 - - -
0.6147 335 8.9698 - - -
0.6165 336 9.3393 - - -
0.6183 337 9.4845 - - -
0.6202 338 9.6075 - - -
0.6220 339 9.426 - - -
0.6239 340 9.0633 - - -
0.6257 341 9.1017 - - -
0.6275 342 9.2461 - - -
0.6294 343 9.065 - - -
0.6312 344 9.4668 - - -
0.6330 345 9.0267 - - -
0.6349 346 9.2938 - - -
0.6367 347 9.391 - - -
0.6385 348 9.2386 - - -
0.6404 349 9.5285 - - -
0.6422 350 9.5958 - - -
0.6440 351 9.157 - - -
0.6459 352 9.4166 - - -
0.6477 353 9.358 - - -
0.6495 354 9.4497 - - -
0.6514 355 9.407 - - -
0.6532 356 9.1505 - - -
0.6550 357 9.403 - - -
0.6569 358 9.1949 - - -
0.6587 359 8.7922 - - -
0.6606 360 8.883 - - -
0.6624 361 8.6828 - - -
0.6642 362 8.5654 - - -
0.6661 363 8.705 - - -
0.6679 364 8.8329 - - -
0.6697 365 9.1604 - - -
0.6716 366 9.1609 - - -
0.6734 367 9.4693 - - -
0.6752 368 9.1431 - - -
0.6771 369 8.7564 - - -
0.6789 370 9.1378 - - -
0.6807 371 8.8472 - - -
0.6826 372 8.9159 - - -
0.6844 373 8.9551 - - -
0.6862 374 9.2721 - - -
0.6881 375 8.7511 - - -
0.6899 376 9.1683 - - -
0.6917 377 8.8438 - - -
0.6936 378 8.6151 - - -
0.6954 379 8.7015 - - -
0.6972 380 7.6009 - - -
0.6991 381 7.3242 - - -
0.7009 382 7.4182 - - -
0.7028 383 7.2576 - - -
0.7046 384 7.0578 - - -
0.7064 385 6.0212 - - -
0.7083 386 5.9868 - - -
0.7101 387 6.033 - - -
0.7119 388 5.8085 - - -
0.7138 389 5.6002 - - -
0.7156 390 5.439 - - -
0.7174 391 5.1661 - - -
0.7193 392 5.1261 - - -
0.7211 393 5.5393 - - -
0.7229 394 4.8909 - - -
0.7248 395 5.2803 - - -
0.7266 396 5.1639 - - -
0.7284 397 4.7125 - - -
0.7303 398 4.842 - - -
0.7321 399 5.0971 - - -
0.7339 400 4.5101 5.0650 0.8590 -
0.7358 401 4.3422 - - -
0.7376 402 4.719 - - -
0.7394 403 4.1823 - - -
0.7413 404 3.7903 - - -
0.7431 405 3.886 - - -
0.7450 406 4.1115 - - -
0.7468 407 3.9201 - - -
0.7486 408 3.9291 - - -
0.7505 409 4.0412 - - -
0.7523 410 3.6614 - - -
0.7541 411 3.5718 - - -
0.7560 412 3.6689 - - -
0.7578 413 3.7457 - - -
0.7596 414 3.4272 - - -
0.7615 415 3.5112 - - -
0.7633 416 3.8348 - - -
0.7651 417 3.5177 - - -
0.7670 418 3.3441 - - -
0.7688 419 3.362 - - -
0.7706 420 3.4926 - - -
0.7725 421 3.4722 - - -
0.7743 422 2.8568 - - -
0.7761 423 3.3396 - - -
0.7780 424 2.972 - - -
0.7798 425 3.6889 - - -
0.7817 426 3.5154 - - -
0.7835 427 3.4098 - - -
0.7853 428 3.4569 - - -
0.7872 429 3.4916 - - -
0.7890 430 3.7394 - - -
0.7908 431 3.332 - - -
0.7927 432 3.3767 - - -
0.7945 433 3.1173 - - -
0.7963 434 3.2257 - - -
0.7982 435 3.3629 - - -
0.8 436 3.1992 - - -
0.8018 437 3.1252 - - -
0.8037 438 3.5155 - - -
0.8055 439 3.2583 - - -
0.8073 440 2.9001 - - -
0.8092 441 3.1542 - - -
0.8110 442 3.0473 - - -
0.8128 443 3.0446 - - -
0.8147 444 3.3807 - - -
0.8165 445 3.1246 - - -
0.8183 446 3.1922 - - -
0.8202 447 3.09 - - -
0.8220 448 3.4341 - - -
0.8239 449 3.0926 - - -
0.8257 450 2.9746 - - -
0.8275 451 3.1014 - - -
0.8294 452 3.2205 - - -
0.8312 453 3.1147 - - -
0.8330 454 2.9682 - - -
0.8349 455 3.1681 - - -
0.8367 456 2.9821 - - -
0.8385 457 2.8484 - - -
0.8404 458 3.0341 - - -
0.8422 459 3.0632 - - -
0.8440 460 3.2026 - - -
0.8459 461 3.132 - - -
0.8477 462 3.0209 - - -
0.8495 463 2.7183 - - -
0.8514 464 3.0257 - - -
0.8532 465 3.1462 - - -
0.8550 466 2.8747 - - -
0.8569 467 3.0932 - - -
0.8587 468 3.0097 - - -
0.8606 469 3.0956 - - -
0.8624 470 3.019 - - -
0.8642 471 3.1342 - - -
0.8661 472 2.688 - - -
0.8679 473 2.8892 - - -
0.8697 474 3.1589 - - -
0.8716 475 2.9274 - - -
0.8734 476 2.8554 - - -
0.8752 477 2.694 - - -
0.8771 478 2.7397 - - -
0.8789 479 2.6452 - - -
0.8807 480 3.0158 - - -
0.8826 481 3.0148 - - -
0.8844 482 2.5704 - - -
0.8862 483 2.6755 - - -
0.8881 484 2.7805 - - -
0.8899 485 2.8554 - - -
0.8917 486 2.6966 - - -
0.8936 487 2.8759 - - -
0.8954 488 2.8838 - - -
0.8972 489 2.7885 - - -
0.8991 490 2.7576 - - -
0.9009 491 2.9139 - - -
0.9028 492 2.6583 - - -
0.9046 493 2.9654 - - -
0.9064 494 2.551 - - -
0.9083 495 2.5596 - - -
0.9101 496 2.9595 - - -
0.9119 497 2.8677 - - -
0.9138 498 2.5793 - - -
0.9156 499 2.5415 - - -
0.9174 500 2.9738 4.8764 0.8651 -
0.9193 501 2.5838 - - -
0.9211 502 2.6544 - - -
0.9229 503 2.7046 - - -
0.9248 504 2.6339 - - -
0.9266 505 2.687 - - -
0.9284 506 2.7863 - - -
0.9303 507 2.7409 - - -
0.9321 508 2.656 - - -
0.9339 509 2.7456 - - -
0.9358 510 2.6589 - - -
0.9376 511 2.697 - - -
0.9394 512 2.6443 - - -
0.9413 513 2.7357 - - -
0.9431 514 2.969 - - -
0.9450 515 2.4175 - - -
0.9468 516 2.5424 - - -
0.9486 517 2.4773 - - -
0.9505 518 2.6269 - - -
0.9523 519 2.6288 - - -
0.9541 520 2.9471 - - -
0.9560 521 2.9775 - - -
0.9578 522 2.9949 - - -
0.9596 523 2.7084 - - -
0.9615 524 2.6431 - - -
0.9633 525 2.5849 - - -
0.9651 526 7.353 - - -
0.9670 527 9.1463 - - -
0.9688 528 10.9846 - - -
0.9706 529 10.6362 - - -
0.9725 530 10.0763 - - -
0.9743 531 9.7147 - - -
0.9761 532 9.3911 - - -
0.9780 533 9.3722 - - -
0.9798 534 10.794 - - -
0.9817 535 11.661 - - -
0.9835 536 11.4706 - - -
0.9853 537 12.0868 - - -
0.9872 538 12.0017 - - -
0.9890 539 11.7965 - - -
0.9908 540 12.5961 - - -
0.9927 541 9.6563 - - -
0.9945 542 11.5097 - - -
0.9963 543 12.0945 - - -
0.9982 544 10.7032 - - -
1.0 545 10.5622 - - 0.8505
```

</details>

### Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.3.1
  • Transformers: 4.48.0.dev0
  • PyTorch: 2.1.0+cu118
  • Accelerate: 1.2.1
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss

```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss

```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```