SentenceTransformer based on cross-encoder/ms-marco-MiniLM-L-6-v2

This is a sentence-transformers model finetuned from cross-encoder/ms-marco-MiniLM-L-6-v2. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: cross-encoder/ms-marco-MiniLM-L-6-v2
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
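
As a rough illustration of the similarity function listed above, cosine similarity can be sketched in plain Python. The helper name `cosine_similarity` and the toy 3-dimensional vectors are assumptions for illustration only; real embeddings from this model have 384 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for 384-dimensional embeddings (hypothetical values).
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(same_direction, orthogonal)
```

Vectors pointing the same way score close to 1 regardless of their length, and orthogonal vectors score 0, which is why the measure suits comparing embeddings of texts with different lengths.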

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
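
The Pooling module above has only `pooling_mode_mean_tokens` enabled, so the sentence embedding is the average of the token embeddings. A minimal sketch of that idea follows, assuming padding tokens are excluded through an attention mask; the function name `mean_pool` and the 2-dimensional toy vectors are hypothetical, and the library's own implementation operates on batched tensors.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask value is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    totals = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                totals[i] += value
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [total / count for total in totals]


# Two real tokens and one padding token (hypothetical 2-dimensional values).
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
sentence_embedding = mean_pool(tokens, mask)
print(sentence_embedding)  # [2.0, 3.0]
```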

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Trelis/ms-marco-MiniLM-L-6-v2-2-cst-ep-MNRLtriplets-2e-5-batch32-gpu-overlap")
# Run inference
sentences = [
    'What is the minimum number of digits allowed for identifying numbers according to clause 4.3.1?',
    '2. 2 teams playing unregistered players are liable to forfeit any match in which unregistered players have competed. fit playing rules - 5th edition copyright © touch football australia 2020 5 3 the ball 3. 1 the game is played with an oval, inflated ball of a shape, colour and size approved by fit or the nta. 3. 2 the ball shall be inflated to the manufacturers ’ recommended air pressure. 3. 3 the referee shall immediately pause the match if the size and shape of the ball no longer complies with clauses 3. 1 or 3. 2 to allow for the ball to replaced or the issue rectified. 3. 4 the ball must not be hidden under player attire. 4 playing uniform 4. 1 participating players are to be correctly attired in matching team uniforms 4. 2 playing uniforms consist of shirt, singlet or other item as approved by the nta or nta competition provider, shorts and / or tights and socks. 4. 3 all players are to wear a unique identifying number not less than 16cm in height, clearly displayed on the rear of the playing top. 4. 3. 1 identifying numbers must feature no more than two ( 2 ) digits.',
    '24. 5 for the avoidance of doubt for clauses 24. 3 and 24. 4 the non - offending team will retain a numerical advantage on the field of play during the drop - off. 25 match officials 25. 1 the referee is the sole judge on all match related matters inside the perimeter for the duration of a match, has jurisdiction over all players, coaches and officials and is required to : 25. 1. 1 inspect the field of play, line markings and markers prior to the commencement of the match to ensure the safety of all participants. 25. 1. 2 adjudicate on the rules of the game ; 25. 1. 3 impose any sanction necessary to control the match ; 25. 1. 4 award tries and record the progressive score ; 25. 1. 5 maintain a count of touches during each possession ; 25. 1. 6 award penalties for infringements against the rules ; and 25. 1. 7 report to the relevant competition administration any sin bins, dismissals or injuries to any participant sustained during a match. 25. 2 only team captains are permitted to seek clarification of a decision directly from the referee. an approach may only be made during a break in play or at the discretion of the referee.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 384)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
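
In the example above the first entry is a question and the other two are passages, so a natural next step is to rank the passages by their similarity to the question. The sketch below shows one way to do that on a plain list of scores. The helper `rank_by_similarity` and the score values are hypothetical, not output from this model.

```python
def rank_by_similarity(scores, top_k=None):
    """Return (index, score) pairs sorted from most to least similar."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Hypothetical first row of a 3x3 similarity matrix: query vs. itself,
# then query vs. each of the two passages.
query_row = [1.0, 0.62, 0.11]
passage_scores = query_row[1:]  # drop the query's similarity with itself

ranking = rank_by_similarity(passage_scores)
best_index, best_score = ranking[0]
print(ranking)
```

With the real model, the scores could presumably be taken from the first row of the similarity matrix, for example `similarities[0, 1:].tolist()`, assuming `model.similarity` returns a tensor as it does in Sentence Transformers 3.x.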

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • learning_rate: 2e-05
  • num_train_epochs: 2
  • lr_scheduler_type: constant
  • warmup_ratio: 0.3
  • bf16: True
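
The combination of `lr_scheduler_type: constant` and `warmup_ratio: 0.3` deserves a note. As far as the Transformers schedulers are understood here, the `constant` schedule holds the learning rate fixed and does not apply warmup, while `constant_with_warmup` ramps up linearly first. This is an assumption about library behaviour, not something this card states. The sketch below contrasts the two; `lr_at_step` is a hypothetical helper, and the total of 610 steps is taken from the training log.

```python
import math


def lr_at_step(step, base_lr, schedule, warmup_steps=0):
    """Learning rate at a given optimizer step for two simple schedules."""
    if schedule == "constant":
        # Assumed behaviour: warmup settings are ignored.
        return base_lr
    if schedule == "constant_with_warmup":
        if step < warmup_steps:
            return base_lr * step / max(1, warmup_steps)
        return base_lr
    raise ValueError(f"unsupported schedule: {schedule}")


BASE_LR = 2e-05
TOTAL_STEPS = 610  # last step in the training log
WARMUP_STEPS = math.ceil(TOTAL_STEPS * 0.3)

constant_start = lr_at_step(0, BASE_LR, "constant", WARMUP_STEPS)
warmup_start = lr_at_step(0, BASE_LR, "constant_with_warmup", WARMUP_STEPS)
warmup_end = lr_at_step(WARMUP_STEPS, BASE_LR, "constant_with_warmup", WARMUP_STEPS)
print(constant_start, warmup_start, warmup_end)
```

Under that assumption, the run reported here would have trained at 2e-05 from the first step, and the warmup ratio would have had no effect.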

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: constant
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.3
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Evaluation Loss
0.0066 2 4.4256 -
0.0131 4 4.1504 -
0.0197 6 4.0494 -
0.0262 8 4.0447 -
0.0328 10 3.9851 -
0.0393 12 3.9284 -
0.0459 14 3.9155 -
0.0525 16 3.8791 -
0.0590 18 3.8663 -
0.0656 20 3.9012 -
0.0721 22 3.8999 -
0.0787 24 3.7895 -
0.0852 26 3.7235 -
0.0918 28 3.7938 -
0.0984 30 3.5057 -
0.1049 32 3.5776 -
0.1115 34 3.5092 -
0.1180 36 3.7226 -
0.1246 38 3.5426 -
0.1311 40 3.7318 -
0.1377 42 3.529 -
0.1443 44 3.5977 -
0.1508 46 3.6484 -
0.1574 48 3.5026 -
0.1639 50 3.4568 -
0.1705 52 3.6119 -
0.1770 54 3.4206 -
0.1836 56 3.3701 -
0.1902 58 3.3232 -
0.1967 60 3.3398 -
0.2033 62 3.333 -
0.2098 64 3.3587 -
0.2164 66 3.1304 -
0.2230 68 3.0618 -
0.2295 70 3.145 -
0.2361 72 3.2074 -
0.2426 74 3.0436 -
0.2492 76 3.0572 -
0.2525 77 - 3.0810
0.2557 78 3.1225 -
0.2623 80 2.8197 -
0.2689 82 2.8979 -
0.2754 84 2.7827 -
0.2820 86 2.9472 -
0.2885 88 2.918 -
0.2951 90 2.7035 -
0.3016 92 2.6876 -
0.3082 94 2.8322 -
0.3148 96 2.6335 -
0.3213 98 2.3754 -
0.3279 100 3.0978 -
0.3344 102 2.4946 -
0.3410 104 2.5085 -
0.3475 106 2.7456 -
0.3541 108 2.3934 -
0.3607 110 2.3222 -
0.3672 112 2.4773 -
0.3738 114 2.6684 -
0.3803 116 2.2435 -
0.3869 118 2.243 -
0.3934 120 2.228 -
0.4 122 2.4652 -
0.4066 124 2.2113 -
0.4131 126 2.0805 -
0.4197 128 2.5041 -
0.4262 130 2.4489 -
0.4328 132 2.2474 -
0.4393 134 2.0252 -
0.4459 136 2.257 -
0.4525 138 1.9381 -
0.4590 140 2.0183 -
0.4656 142 2.1021 -
0.4721 144 2.1508 -
0.4787 146 1.9669 -
0.4852 148 1.7468 -
0.4918 150 1.8776 -
0.4984 152 1.8081 -
0.5049 154 1.6799 1.6088
0.5115 156 1.9628 -
0.5180 158 1.8253 -
0.5246 160 1.7791 -
0.5311 162 1.8463 -
0.5377 164 1.6357 -
0.5443 166 1.6531 -
0.5508 168 1.6747 -
0.5574 170 1.5666 -
0.5639 172 1.7272 -
0.5705 174 1.6045 -
0.5770 176 1.3786 -
0.5836 178 1.6547 -
0.5902 180 1.6416 -
0.5967 182 1.4796 -
0.6033 184 1.4595 -
0.6098 186 1.4106 -
0.6164 188 1.4844 -
0.6230 190 1.4581 -
0.6295 192 1.4922 -
0.6361 194 1.2978 -
0.6426 196 1.2612 -
0.6492 198 1.4725 -
0.6557 200 1.3162 -
0.6623 202 1.3736 -
0.6689 204 1.4553 -
0.6754 206 1.4011 -
0.6820 208 1.2523 -
0.6885 210 1.3732 -
0.6951 212 1.3721 -
0.7016 214 1.5262 -
0.7082 216 1.2631 -
0.7148 218 1.6174 -
0.7213 220 1.4252 -
0.7279 222 1.3527 -
0.7344 224 1.1969 -
0.7410 226 1.2901 -
0.7475 228 1.4379 -
0.7541 230 1.1332 -
0.7574 231 - 1.0046
0.7607 232 1.3693 -
0.7672 234 1.3097 -
0.7738 236 1.2314 -
0.7803 238 1.0873 -
0.7869 240 1.2882 -
0.7934 242 1.1723 -
0.8 244 1.1748 -
0.8066 246 1.2916 -
0.8131 248 1.0894 -
0.8197 250 1.2299 -
0.8262 252 1.207 -
0.8328 254 1.1361 -
0.8393 256 1.1323 -
0.8459 258 1.0927 -
0.8525 260 1.1433 -
0.8590 262 1.1088 -
0.8656 264 1.1384 -
0.8721 266 1.0962 -
0.8787 268 1.1878 -
0.8852 270 1.0113 -
0.8918 272 1.1411 -
0.8984 274 1.0289 -
0.9049 276 1.0163 -
0.9115 278 1.2859 -
0.9180 280 0.9449 -
0.9246 282 1.0941 -
0.9311 284 1.0908 -
0.9377 286 1.1028 -
0.9443 288 1.0633 -
0.9508 290 1.1004 -
0.9574 292 1.0483 -
0.9639 294 1.0064 -
0.9705 296 1.0088 -
0.9770 298 1.0068 -
0.9836 300 1.1903 -
0.9902 302 0.9401 -
0.9967 304 0.8369 -
1.0033 306 0.5046 -
1.0098 308 1.0626 0.8660
1.0164 310 0.9587 -
1.0230 312 1.0565 -
1.0295 314 1.1329 -
1.0361 316 1.1857 -
1.0426 318 0.9777 -
1.0492 320 0.9883 -
1.0557 322 0.9076 -
1.0623 324 0.7942 -
1.0689 326 1.1952 -
1.0754 328 0.9726 -
1.0820 330 1.0663 -
1.0885 332 1.0337 -
1.0951 334 0.9522 -
1.1016 336 0.9813 -
1.1082 338 0.9057 -
1.1148 340 1.0076 -
1.1213 342 0.8557 -
1.1279 344 0.9341 -
1.1344 346 0.9188 -
1.1410 348 1.091 -
1.1475 350 0.8205 -
1.1541 352 1.0509 -
1.1607 354 0.9201 -
1.1672 356 1.0741 -
1.1738 358 0.8662 -
1.1803 360 0.9468 -
1.1869 362 0.8604 -
1.1934 364 0.8141 -
1.2 366 0.9475 -
1.2066 368 0.8407 -
1.2131 370 0.764 -
1.2197 372 0.798 -
1.2262 374 0.8205 -
1.2328 376 0.7995 -
1.2393 378 0.9305 -
1.2459 380 0.858 -
1.2525 382 0.8465 -
1.2590 384 0.7691 -
1.2623 385 - 0.7879
1.2656 386 1.0073 -
1.2721 388 0.8026 -
1.2787 390 0.8108 -
1.2852 392 0.7783 -
1.2918 394 0.8766 -
1.2984 396 0.8576 -
1.3049 398 0.884 -
1.3115 400 0.9547 -
1.3180 402 0.9231 -
1.3246 404 0.8027 -
1.3311 406 0.9117 -
1.3377 408 0.7743 -
1.3443 410 0.8257 -
1.3508 412 0.8738 -
1.3574 414 0.972 -
1.3639 416 0.8297 -
1.3705 418 0.8941 -
1.3770 420 0.8513 -
1.3836 422 0.7588 -
1.3902 424 0.8332 -
1.3967 426 0.7682 -
1.4033 428 0.7916 -
1.4098 430 0.9519 -
1.4164 432 1.0526 -
1.4230 434 0.8724 -
1.4295 436 0.8267 -
1.4361 438 0.7672 -
1.4426 440 0.7977 -
1.4492 442 0.6947 -
1.4557 444 0.9042 -
1.4623 446 0.8971 -
1.4689 448 0.9655 -
1.4754 450 0.8512 -
1.4820 452 0.9421 -
1.4885 454 0.9501 -
1.4951 456 0.8214 -
1.5016 458 0.9335 -
1.5082 460 0.7617 -
1.5148 462 0.8601 0.7855
1.5213 464 0.757 -
1.5279 466 0.7389 -
1.5344 468 0.8146 -
1.5410 470 0.9235 -
1.5475 472 0.9901 -
1.5541 474 0.9624 -
1.5607 476 0.8909 -
1.5672 478 0.7276 -
1.5738 480 0.9444 -
1.5803 482 0.874 -
1.5869 484 0.7985 -
1.5934 486 0.9335 -
1.6 488 0.8108 -
1.6066 490 0.7779 -
1.6131 492 0.8807 -
1.6197 494 0.8146 -
1.6262 496 0.9218 -
1.6328 498 0.8439 -
1.6393 500 0.7348 -
1.6459 502 0.8533 -
1.6525 504 0.7695 -
1.6590 506 0.7911 -
1.6656 508 0.837 -
1.6721 510 0.731 -
1.6787 512 0.911 -
1.6852 514 0.7963 -
1.6918 516 0.7719 -
1.6984 518 0.8011 -
1.7049 520 0.7428 -
1.7115 522 0.8159 -
1.7180 524 0.7833 -
1.7246 526 0.7934 -
1.7311 528 0.7854 -
1.7377 530 0.8398 -
1.7443 532 0.7875 -
1.7508 534 0.7282 -
1.7574 536 0.8269 -
1.7639 538 0.8033 -
1.7672 539 - 0.7595
1.7705 540 0.9471 -
1.7770 542 0.941 -
1.7836 544 0.725 -
1.7902 546 0.8978 -
1.7967 548 0.8361 -
1.8033 550 0.7092 -
1.8098 552 0.809 -
1.8164 554 0.9399 -
1.8230 556 0.769 -
1.8295 558 0.7381 -
1.8361 560 0.7554 -
1.8426 562 0.8553 -
1.8492 564 0.919 -
1.8557 566 0.7479 -
1.8623 568 0.8381 -
1.8689 570 0.7911 -
1.8754 572 0.8076 -
1.8820 574 0.7868 -
1.8885 576 0.9147 -
1.8951 578 0.7271 -
1.9016 580 0.7201 -
1.9082 582 0.7538 -
1.9148 584 0.7522 -
1.9213 586 0.7737 -
1.9279 588 0.7187 -
1.9344 590 0.8713 -
1.9410 592 0.7971 -
1.9475 594 0.8226 -
1.9541 596 0.7074 -
1.9607 598 0.804 -
1.9672 600 0.7259 -
1.9738 602 0.7758 -
1.9803 604 0.8209 -
1.9869 606 0.7918 -
1.9934 608 0.7467 -
2.0 610 0.4324 -
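
The log above allows a few quantities to be recovered by arithmetic. The sketch below derives the steps per epoch, the evaluation interval, and a range for the training set size. The size estimate assumes a single device; it also relies on `gradient_accumulation_steps: 1` and `dataloader_drop_last: False` from the hyperparameter list. The card does not report the dataset size, so the range is an inference, not a stated figure.

```python
TOTAL_STEPS = 610   # last step in the log
NUM_EPOCHS = 2      # num_train_epochs
BATCH_SIZE = 32     # per_device_train_batch_size

steps_per_epoch = TOTAL_STEPS // NUM_EPOCHS


def epoch_at(step):
    """Fractional epoch for a step, rounded like the Epoch column."""
    return round(step / steps_per_epoch, 4)


# With drop_last disabled, the final batch of an epoch may be partial,
# so the number of training examples lies in this inclusive range.
min_examples = (steps_per_epoch - 1) * BATCH_SIZE + 1
max_examples = steps_per_epoch * BATCH_SIZE

# Steps at which an evaluation loss is logged.
eval_steps = [77, 154, 231, 308, 385, 462, 539]
eval_intervals = {later - earlier for earlier, later in zip(eval_steps, eval_steps[1:])}

print(steps_per_epoch, epoch_at(77), (min_examples, max_examples), eval_intervals)
```

This gives 305 steps per epoch, an evaluation every 77 steps, and between 9729 and 9760 training examples under the stated assumptions.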

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.0.1
  • Transformers: 4.42.3
  • PyTorch: 2.1.1+cu121
  • Accelerate: 0.31.0
  • Datasets: 2.17.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
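
For readers unfamiliar with the loss cited above, here is a minimal plain-Python sketch of a multiple-negatives ranking loss over (anchor, positive, negative) triplets. It is an illustration of the general technique, not this model's training code. The use of cosine similarity and a scale of 20 are assumptions believed to match the library defaults, and the function names and toy vectors are hypothetical.

```python
import math


def _cos(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnrl_loss(anchors, positives, negatives, scale=20.0):
    """Mean cross-entropy where anchor i must pick positive i among all candidates."""
    # Every positive and every negative in the batch is a candidate for each anchor.
    candidates = positives + negatives
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * _cos(anchor, candidate) for candidate in candidates]
        # Numerically stable log-sum-exp.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(value - peak) for value in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


# Well-separated toy batch: each anchor matches its own positive exactly.
separated_loss = mnrl_loss(
    anchors=[[1.0, 0.0], [0.0, 1.0]],
    positives=[[1.0, 0.0], [0.0, 1.0]],
    negatives=[[-1.0, 0.0], [0.0, -1.0]],
)

# Uninformative toy batch: all vectors identical, so every candidate ties.
same = [[1.0, 0.0], [1.0, 0.0]]
uniform_loss = mnrl_loss(same, same, same)
print(separated_loss, uniform_loss)
```

When every candidate ties, the loss equals the natural log of the number of candidates, which gives a useful reference point when reading early training loss values.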

Model Size

  • Model size: 22.7M params
  • Tensor type: F32 (Safetensors)