---
language:
  - tr
license: apache-2.0
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:13842
  - loss:MultipleNegativesRankingLoss
base_model: microsoft/mpnet-base
widget:
  - source_sentence: >-
      Bir adam bir elinde kahve fincanı, diğer elinde tuvalet fırçası ile
      tuvaletin önünde duruyor.
    sentences:
      - Şef ve orkestra oturmuyor.
      - Bir adam bir banyoda duruyor.
      - Bir adam kahve demlemeye çalışıyor.
  - source_sentence: Sarı ceketli ve siyah pantolonlu iki adam madalyalara sahip.
    sentences:
      - Erkeklere bir noktada bir ödül verilmiştir.
      - >-
        Başlangıçtaki net ölçek faydası, ücret primleri olsun ya da olmasın,
        pozitiftir.
      - Adamlar düz kırmızı ceketler ve mavi pantolonlar giymiş.
  - source_sentence: >-
      Restoran zinciri içi: Planet Hollywood, çeşitli film hatıraları mekânı
      süslüyor.
    sentences:
      - Kadın bir şey tutuyor.
      - Bir restoranın içi.
      - Yeni gümüş makinelerin bulunduğu bir çamaşırhane içi.
  - source_sentence: İki çocuk, binanın yakınındaki kaldırımda sokakta koşuyor.
    sentences:
      - Çocuklar dışarıda.
      - Bazı odaların dışına balkonları vardır.
      - Çocuklar içeride.
  - source_sentence: Ağaçlarla çevrili bulvar denize üç bloktan daha az uzanıyor.
    sentences:
      - Deniz üç sokak bile uzakta değil.
      - Çocuk başını duvardaki bir delikten geçiriyor.
      - Denize ulaşmak için caddeden iki mil yol almanız gerekiyor.
datasets:
  - mertcobanov/all-nli-triplets-turkish
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy
model-index:
  - name: MPNet base trained on AllNLI-turkish triplets
    results:
      - task:
          type: triplet
          name: Triplet
        dataset:
          name: all nli dev turkish
          type: all-nli-dev-turkish
        metrics:
          - type: cosine_accuracy
            value: 0.7422539489671932
            name: Cosine Accuracy
      - task:
          type: triplet
          name: Triplet
        dataset:
          name: all nli test turkish
          type: all-nli-test-turkish
        metrics:
          - type: cosine_accuracy
            value: 0.7503404448479346
            name: Cosine Accuracy
---

MPNet base trained on AllNLI-turkish triplets

This is a sentence-transformers model fine-tuned from microsoft/mpnet-base on the mertcobanov/all-nli-triplets-turkish dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: microsoft/mpnet-base
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: mertcobanov/all-nli-triplets-turkish
  • Language: tr
  • License: apache-2.0

Model Sources

  • Documentation: https://www.sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/models?library=sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
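
The printed architecture can also be assembled by hand from sentence-transformers building blocks, which makes the mean-pooling setup explicit. A minimal sketch, for illustration only; loading the published checkpoint as shown under Usage is the normal route:

from sentence_transformers import SentenceTransformer, models

# Transformer module: MPNet backbone, inputs truncated at 512 tokens
word_embedding_model = models.Transformer("microsoft/mpnet-base", max_seq_length=512)

# Pooling module: mean over token embeddings -> one 768-dim sentence vector
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),
    pooling_mode="mean",
)

model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
print(model)  # mirrors the architecture shown above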

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("mertcobanov/mpnet-base-all-nli-triplet-turkish-v3")
# Run inference
sentences = [
    'Ağaçlarla çevrili bulvar denize üç bloktan daha az uzanıyor.',
    'Deniz üç sokak bile uzakta değil.',
    'Denize ulaşmak için caddeden iki mil yol almanız gerekiyor.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
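
Beyond pairwise similarity, the same embeddings can drive semantic search over a small corpus. A sketch using sentence_transformers.util.semantic_search; the corpus and query below are made up for illustration:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("mertcobanov/mpnet-base-all-nli-triplet-turkish-v3")

corpus = [
    "Bir adam bir banyoda duruyor.",
    "Çocuklar dışarıda.",
    "Bir restoranın içi.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode("İki çocuk sokakta koşuyor.", convert_to_tensor=True)

# For each query, returns the top_k corpus entries ranked by cosine similarity
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
for hit in hits[0]:
    print(corpus[hit["corpus_id"]], hit["score"])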

Evaluation

Metrics

Triplet

  • Datasets: all-nli-dev-turkish and all-nli-test-turkish
  • Evaluated with TripletEvaluator
Metric          | all-nli-dev-turkish | all-nli-test-turkish
----------------|---------------------|---------------------
cosine_accuracy | 0.7423              | 0.7503
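
The accuracies above should be reproducible with the same evaluator. A sketch, assuming the dataset repo exposes a dev split under that name (check the dataset card if the split names differ):

from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import TripletEvaluator

model = SentenceTransformer("mertcobanov/mpnet-base-all-nli-triplet-turkish-v3")

# Split name "dev" is an assumption; adjust to the dataset's actual splits.
dev = load_dataset("mertcobanov/all-nli-triplets-turkish", split="dev")

evaluator = TripletEvaluator(
    anchors=dev["anchor_translated"],
    positives=dev["positive_translated"],
    negatives=dev["negative_translated"],
    name="all-nli-dev-turkish",
)
print(evaluator(model))
# e.g. {'all-nli-dev-turkish_cosine_accuracy': 0.7423}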

Training Details

Training Dataset

all-nli-triplets-turkish

  • Dataset: all-nli-triplets-turkish at bff203b
  • Size: 13,842 training samples
  • Columns: anchor_translated, positive_translated, and negative_translated
  • Approximate statistics based on the first 1000 samples:

    Statistic   | anchor_translated | positive_translated | negative_translated
    ------------|-------------------|---------------------|--------------------
    type        | string            | string              | string
    min tokens  | 8                 | 8                   | 6
    mean tokens | 13.42             | 31.64               | 32.03
    max tokens  | 95                | 93                  | 89
  • Samples:

    anchor_translated | positive_translated | negative_translated
    ------------------|---------------------|--------------------
    Asyalı okul çocukları birbirlerinin omuzlarında oturuyor. | Okul çocukları bir arada | Asyalı fabrika işçileri oturuyor.
    İnsanlar dışarıda. | Arka planda resmi kıyafetler giymiş bir grup insan var ve beyaz gömlekli, haki pantolonlu bir adam toprak yoldan yeşil çimenlere atlıyor. | Bir odada üç kişiyle birlikte büyük bir kamera tutan bir adam.
    Bir adam dışarıda. | Adam yarış sırasında yan sepetten bir su birikintisine düşer. | Beyaz bir sarık sarmış gömleksiz bir adam bir ağaç gövdesine tırmanıyor.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    
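
Expressed in code, that loss configuration corresponds to the following minimal sketch (scale=20.0 with util.cos_sim matches the parameters listed above and is also the library default):

from sentence_transformers import SentenceTransformer, util
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("microsoft/mpnet-base")

# scale=20.0 with cosine similarity matches the parameters above
loss = MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)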

Evaluation Dataset

all-nli-triplets-turkish

  • Dataset: all-nli-triplets-turkish at bff203b
  • Size: 6,584 evaluation samples
  • Columns: anchor_translated, positive_translated, and negative_translated
  • Approximate statistics based on the first 1000 samples:

    Statistic   | anchor_translated | positive_translated | negative_translated
    ------------|-------------------|---------------------|--------------------
    type        | string            | string              | string
    min tokens  | 5                 | 5                   | 5
    mean tokens | 42.62             | 22.58               | 22.07
    max tokens  | 192               | 77                  | 65
  • Samples:

    anchor_translated | positive_translated | negative_translated
    ------------------|---------------------|--------------------
    Ayrıca, bu özel tüketim vergileri, diğer vergiler gibi, hükümetin ödeme zorunluluğunu sağlama yetkisini kullanarak belirlenir. | Hükümetin ödeme zorlaması, özel tüketim vergilerinin nasıl hesaplandığını belirler. | Özel tüketim vergileri genel kuralın bir istisnasıdır ve aslında GSYİH payına dayalı olarak belirlenir.
    Gri bir sweatshirt giymiş bir sanatçı, canlı renklerde bir kasaba tablosu üzerinde çalışıyor. | Bir ressam gri giysiler içinde bir kasabanın resmini yapıyor. | Bir kişi bir beyzbol sopası tutuyor ve gelen bir atış için planda bekliyor.
    İmkansız. | Yapılamaz. | Tamamen mümkün.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • num_train_epochs: 10
  • warmup_ratio: 0.1
  • fp16: True
  • batch_sampler: no_duplicates
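
These settings map directly onto SentenceTransformerTrainingArguments. A hedged reproduction sketch (the output directory and split names are illustrative, not taken from the original run):

from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("microsoft/mpnet-base")

# Split names are assumptions; check the dataset card for the exact names.
train_dataset = load_dataset("mertcobanov/all-nli-triplets-turkish", split="train")
eval_dataset = load_dataset("mertcobanov/all-nli-triplets-turkish", split="dev")

args = SentenceTransformerTrainingArguments(
    output_dir="mpnet-base-all-nli-triplet-turkish",  # illustrative path
    eval_strategy="steps",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=10,
    warmup_ratio=0.1,
    fp16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=MultipleNegativesRankingLoss(model),
)
trainer.train()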

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss all-nli-dev-turkish_cosine_accuracy all-nli-test-turkish_cosine_accuracy
0 0 - - 0.6092 -
0.1155 100 3.3654 2.9084 0.6624 -
0.2309 200 2.6321 1.7277 0.7395 -
0.3464 300 1.9629 1.5000 0.7512 -
0.4619 400 1.6662 1.4965 0.7494 -
0.5774 500 1.4712 1.5374 0.7418 -
0.6928 600 1.0429 1.6301 0.7360 -
0.8083 700 0.8995 2.1626 0.7044 -
0.9238 800 0.7269 2.0440 0.6996 -
1.0381 900 1.0584 1.6714 0.7438 -
1.1536 1000 1.1864 1.5326 0.7495 -
1.2691 1100 1.0193 1.4498 0.7518 -
1.3845 1200 0.8237 1.5399 0.7506 -
1.5 1300 0.8279 1.6747 0.7521 -
1.6155 1400 0.626 1.5776 0.7453 -
1.7309 1500 0.5396 1.8877 0.7139 -
1.8464 1600 0.4294 2.2258 0.6947 -
1.9619 1700 0.4988 1.8753 0.7204 -
2.0762 1800 0.6987 1.5408 0.7524 -
2.1917 1900 0.6684 1.4434 0.7618 -
2.3072 2000 0.6072 1.4840 0.7520 -
2.4226 2100 0.5081 1.5225 0.7561 -
2.5381 2200 0.5216 1.5280 0.7514 -
2.6536 2300 0.2627 1.8830 0.7227 -
2.7691 2400 0.2585 1.9529 0.7221 -
2.8845 2500 0.129 2.2323 0.7047 -
3.0 2600 0.1698 2.2904 0.7063 -
3.1143 2700 0.5559 1.6110 0.7553 -
3.2298 2800 0.4356 1.5544 0.7508 -
3.3453 2900 0.3886 1.5437 0.7539 -
3.4607 3000 0.3573 1.6262 0.7539 -
3.5762 3100 0.2652 1.8391 0.7321 -
3.6917 3200 0.0765 2.0359 0.7186 -
3.8072 3300 0.0871 2.0946 0.7262 -
3.9226 3400 0.0586 2.2168 0.7093 -
4.0370 3500 0.1755 1.7567 0.7462 -
4.1524 3600 0.3397 1.7735 0.7442 -
4.2679 3700 0.3067 1.7475 0.7497 -
4.3834 3800 0.246 1.7075 0.7476 -
4.4988 3900 0.253 1.7648 0.7483 -
4.6143 4000 0.1223 1.9139 0.7246 -
4.7298 4100 0.0453 2.1138 0.7152 -
4.8453 4200 0.0241 2.2354 0.7240 -
4.9607 4300 0.0363 2.3080 0.7251 -
5.0751 4400 0.1897 1.7394 0.7494 -
5.1905 4500 0.2114 1.6929 0.7524 -
5.3060 4600 0.2101 1.7402 0.7556 -
5.4215 4700 0.1471 1.7990 0.7445 -
5.5370 4800 0.1783 1.8060 0.7456 -
5.6524 4900 0.0215 2.0118 0.7325 -
5.7679 5000 0.0083 2.0766 0.7265 -
5.8834 5100 0.0138 2.2054 0.7201 -
5.9988 5200 0.0144 2.1667 0.7164 -
6.1132 5300 0.2023 1.7309 0.7543 -
6.2286 5400 0.1356 1.6685 0.7622 -
6.3441 5500 0.1307 1.7292 0.7527 -
6.4596 5600 0.1222 1.8403 0.7435 -
6.5751 5700 0.1049 1.8456 0.7394 -
6.6905 5800 0.0051 1.9898 0.7362 -
6.8060 5900 0.0131 2.0532 0.7310 -
6.9215 6000 0.0132 2.2237 0.7186 -
7.0358 6100 0.0453 1.8965 0.7397 -
7.1513 6200 0.1109 1.7195 0.7550 -
7.2667 6300 0.1002 1.7547 0.7530 -
7.3822 6400 0.0768 1.7701 0.7433 -
7.4977 6500 0.0907 1.8472 0.7406 -
7.6132 6600 0.038 1.9162 0.7377 -
7.7286 6700 0.0151 1.9407 0.7312 -
7.8441 6800 0.0087 1.9657 0.7289 -
7.9596 6900 0.0104 2.0302 0.7227 -
8.0739 7000 0.0727 1.8692 0.7514 -
8.1894 7100 0.0733 1.8039 0.7520 -
8.3048 7200 0.0728 1.7400 0.7539 -
8.4203 7300 0.0537 1.8062 0.7461 -
8.5358 7400 0.059 1.8469 0.7489 -
8.6513 7500 0.0089 1.9033 0.7403 -
8.7667 7600 0.0034 1.9683 0.7354 -
8.8822 7700 0.0018 2.0075 0.7366 -
8.9977 7800 0.0023 2.0646 0.7322 -
9.1120 7900 0.0642 1.9063 0.7430 -
9.2275 8000 0.0596 1.8492 0.7468 -
9.3430 8100 0.0479 1.8180 0.7517 -
9.4584 8200 0.0561 1.8122 0.7468 -
9.5739 8300 0.0311 1.8528 0.7456 -
9.6894 8400 0.0069 1.8778 0.7447 -
9.8048 8500 0.0027 1.8989 0.7423 -
9.9203 8600 0.0093 1.9089 0.7423 -
9.9896 8660 - - - 0.7503

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.3.1
  • Transformers: 4.46.3
  • PyTorch: 2.3.0
  • Accelerate: 1.1.1
  • Datasets: 3.1.0
  • Tokenizers: 0.20.3
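
To recreate this environment, the versions above can be pinned at install time (a sketch; nearby versions will usually work as well):

pip install sentence-transformers==3.3.1 transformers==4.46.3 torch==2.3.0 accelerate==1.1.1 datasets==3.1.0 tokenizers==0.20.3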

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}