SentenceTransformer based on sentence-transformers/multi-qa-mpnet-base-dot-v1

This is a sentence-transformers model finetuned from sentence-transformers/multi-qa-mpnet-base-dot-v1. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
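
The Pooling module above enables only pooling_mode_cls_token, so the sentence embedding is taken from the first token's hidden state rather than an average over tokens. A minimal sketch of that difference, using toy 3-dimensional vectors in place of the 768-dimensional ones; the function names are hypothetical and not part of the library:

```python
from typing import List

Vector = List[float]


def cls_pool(token_embeddings: List[Vector]) -> Vector:
    """CLS pooling: return a copy of the first token's vector."""
    if not token_embeddings:
        raise ValueError("expected at least one token embedding")
    return list(token_embeddings[0])


def mean_pool(token_embeddings: List[Vector]) -> Vector:
    """Mean pooling, shown only for contrast (disabled in this model)."""
    n = len(token_embeddings)
    return [sum(column) / n for column in zip(*token_embeddings)]


# Toy "hidden states" for a three-token input.
tokens = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [5.0, 5.0, 5.0]]
cls_vector = cls_pool(tokens)    # first token only
mean_vector = mean_pool(tokens)  # average over all tokens
```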

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("antonkirk/retrieval-mpnet-dot-finetuned-combined-synthetic-dataset")
# Run inference
sentences = [
    'nerve cell dysfunction, riboflavin deficiency',
    'Riboflavin transporter deficiency neuronopathy is a disorder that affects nerve cells (neurons). Affected individuals typically have hearing loss caused by nerve damage in the inner ear (sensorineural hearing loss) and signs of damage to other nerves.',
    'A number sign (#) is used with this entry because hyperprolinemia type I (HYRPRO1) is caused by homozygous or compound heterozygous mutation in the proline dehydrogenase gene (PRODH; 606810) on chromosome 22q11.\n\nThe PRODH gene falls within the region deleted in the 22q11 deletion syndrome, including DiGeorge syndrome (188400) and velocardiofacial syndrome (192430).\n\nDescription\n\nPhang et al. (2001) noted that prospective studies of HPI probands identified through newborn screening as well as reports of several families have suggested that it is a metabolic disorder not clearly associated with clinical manifestations. Phang et al. (2001) concluded that HPI is a relatively benign condition in most individuals under most circumstances. However, other reports have suggested that some patients have a severe phenotype with neurologic manifestations, including epilepsy and mental retardation (Jacquet et al., 2003).\n\n### Genetic Heterogeneity of Hyperprolinemia',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
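
For retrieval, the query and the candidate chunks are usually scored against each other rather than all-against-all. Because the loss below uses dot_score, the following sketch ranks chunks by dot product. It uses toy vectors so it runs without downloading the model; with the real model the vectors would come from model.encode. The helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_by_dot(
    query_vec: Sequence[float], chunk_vecs: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (chunk_index, score) pairs sorted from best to worst."""
    scored = [(i, dot(query_vec, c)) for i, c in enumerate(chunk_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings standing in for model.encode(...) output.
query = [1.0, 0.0, 2.0]
chunks = [[0.5, 0.5, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
ranking = rank_by_dot(query, chunks)
```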

Evaluation

Metrics

Information Retrieval

Metric Value
cosine_accuracy@1 0.1933
cosine_accuracy@3 0.5626
cosine_accuracy@5 0.7512
cosine_accuracy@10 0.841
cosine_precision@1 0.1933
cosine_precision@3 0.1875
cosine_precision@5 0.1502
cosine_precision@10 0.0841
cosine_recall@1 0.1933
cosine_recall@3 0.5626
cosine_recall@5 0.7512
cosine_recall@10 0.841
cosine_ndcg@10 0.512
cosine_mrr@10 0.4059
cosine_map@100 0.411
dot_accuracy@1 0.1949
dot_accuracy@3 0.5673
dot_accuracy@5 0.7571
dot_accuracy@10 0.8415
dot_precision@1 0.1949
dot_precision@3 0.1891
dot_precision@5 0.1514
dot_precision@10 0.0842
dot_recall@1 0.1949
dot_recall@3 0.5673
dot_recall@5 0.7571
dot_recall@10 0.8415
dot_ndcg@10 0.5141
dot_mrr@10 0.4084
dot_map@100 0.4136
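
In the table above, recall@k equals accuracy@k and precision@k equals accuracy@k / k (for example 0.5626 / 3 ≈ 0.1875). That pattern is what one would expect if every query has exactly one relevant chunk, although the card does not state this. Under that assumption, the metrics can be sketched as follows; the function name and the toy rankings are illustrative only:

```python
from typing import Dict, List


def single_relevant_metrics(
    ranked_ids: List[List[str]], relevant_id: List[str], k: int
) -> Dict[str, float]:
    """Accuracy, precision, recall and MRR at k, assuming one relevant id per query."""
    n = len(ranked_ids)
    hits = 0
    reciprocal_rank_sum = 0.0
    for ranking, gold in zip(ranked_ids, relevant_id):
        top_k = ranking[:k]
        if gold in top_k:
            hits += 1
            reciprocal_rank_sum += 1.0 / (top_k.index(gold) + 1)
    accuracy = hits / n
    return {
        "accuracy": accuracy,        # fraction of queries with a hit in the top k
        "precision": accuracy / k,   # at most one of the k results is relevant
        "recall": accuracy,          # the single relevant id is found or not
        "mrr": reciprocal_rank_sum / n,
    }


rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"], ["b", "c", "d"]]
gold = ["a", "a", "a", "a"]
metrics_at_3 = single_relevant_metrics(rankings, gold, k=3)
```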

Training Details

Training Dataset

Unnamed Dataset

  • Size: 95,159 training samples
  • Columns: queries and chunks
  • Approximate statistics based on the first 1000 samples:
    queries (type: string)
    • min: 5 tokens
    • mean: 15.01 tokens
    • max: 30 tokens
    chunks (type: string)
    • min: 5 tokens
    • mean: 158.91 tokens
    • max: 319 tokens
  • Samples (all three queries are paired with the same chunk):
    query 1: hypotrichosis, wiry hair, onycholysis
    query 2: cleft lip, cleft palate, hair loss
    query 3: progressive patterned scalp, autosomal dominant inheritance
    chunk: Green et al. (2003) reported an Australian family in which 22 members over 4 generations had progressive patterned scalp hypotrichosis and wiry hair similar to that seen in Marie Unna hereditary hypotrichosis (MUHH; 146550). Features differing from those of MUHH included absence of signs of abnormality at birth, relative sparing of body hair, distal onycholysis, and intermittent cosegregation with autosomal dominant cleft lip and palate. Five individuals had associated cleft lip and palate. Green et al. (2003) excluded linkage of the disorder in the Australian family to the MUHH locus on chromosome 8p21.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 1,
        "similarity_fct": "dot_score"
    }
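
MultipleNegativesRankingLoss treats every other chunk in the batch as a negative for a given query. A minimal pure-Python sketch of that idea with scale 1 and dot-product scores, as configured above: each query's scores against all chunks in the batch go through a softmax, and the loss is the cross-entropy against the matching chunk. This illustrates the principle and is not the library's implementation; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def mnr_loss(
    queries: List[Sequence[float]],
    chunks: List[Sequence[float]],
    scale: float = 1.0,
) -> float:
    """Mean cross-entropy where chunks[i] is the positive for queries[i]."""
    losses = []
    for i, q in enumerate(queries):
        scores = [scale * dot(q, c) for c in chunks]
        # Numerically stable log-sum-exp over the in-batch candidates.
        m = max(scores)
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
        losses.append(log_sum_exp - scores[i])
    return sum(losses) / len(losses)


q_batch = [[1.0, 0.0], [0.0, 1.0]]
aligned = mnr_loss(q_batch, [[1.0, 0.0], [0.0, 1.0]])     # positives match
misaligned = mnr_loss(q_batch, [[0.0, 1.0], [1.0, 0.0]])  # positives swapped
```

The aligned batch yields the lower loss, which is the behaviour the training objective rewards.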
    

Evaluation Dataset

Unnamed Dataset

  • Size: 8,747 evaluation samples
  • Columns: queries and chunks
  • Approximate statistics based on the first 1000 samples:
    queries (type: string)
    • min: 6 tokens
    • mean: 14.71 tokens
    • max: 31 tokens
    chunks (type: string)
    • min: 4 tokens
    • mean: 155.81 tokens
    • max: 305 tokens
  • Samples (all three queries are paired with the same chunk):
    query 1: white patches, corrugated tongue, immunocompromised, Epstein-Barr virus
    query 2: HIV-associated lesions, oral hairy leukoplakia, benign white lesions, tongue appearance
    query 3: hairy leukoplakia symptoms, non-scrapable lesions, HIV/AIDS, oral lesions
    chunk:
    Not to be confused with Hairy tongue.

    Hairy leukoplakia
    Other namesOral hairy leukoplakia,[1]:385 OHL, or HIV-associated hairy leukoplakia[2]
    SpecialtyGastroenterology

    Hairy leukoplakia is a white patch on the side of the tongue with a corrugated or hairy appearance. It is caused by Epstein-Barr virus (EBV) and occurs usually in persons who are immunocompromised, especially those with human immunodeficiency virus infection/acquired immunodeficiency syndrome (HIV/AIDS). The white lesion, which cannot be scraped off, is benign and does not require any treatment, although its appearance may have diagnostic and prognostic implications for the underlying condition.

    Depending upon what definition of leukoplakia is used, hairy leukoplakia is sometimes considered a subtype of leukoplakia, or a distinct diagnosis.

    ## Contents
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 1,
        "similarity_fct": "dot_score"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • learning_rate: 2e-05
  • num_train_epochs: 15
  • warmup_ratio: 0.1
  • fp16: True
  • load_best_model_at_end: True
  • eval_on_start: True
  • batch_sampler: no_duplicates

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 15
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: True
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
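
With lr_scheduler_type linear, warmup_ratio 0.1 and learning_rate 2e-05, the learning rate rises linearly from zero during warmup and then decays linearly to zero. A minimal sketch of that shape, assuming the total of 11145 optimizer steps reported in the training logs and assuming the warmup length is the ratio times the total, rounded up:

```python
import math

BASE_LR = 2e-5
TOTAL_STEPS = 11145  # final step in the training logs
WARMUP_STEPS = math.ceil(0.1 * TOTAL_STEPS)  # assumption: ratio * total, rounded up


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step for linear warmup then linear decay."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


lr_start = linear_lr(0)
lr_peak = linear_lr(WARMUP_STEPS)
lr_end = linear_lr(TOTAL_STEPS)
```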

Training Logs

Epoch Step Training Loss Validation Loss dot_map@100
0 0 - 1.4355 0.2271
0.1346 100 1.2599 - -
0.2692 200 0.7627 - -
0.4038 300 0.6061 - -
0.5384 400 0.5632 - -
0.6729 500 0.3965 0.4589 0.3852
0.8075 600 0.3104 - -
0.9421 700 0.446 - -
1.0767 800 0.4426 - -
1.2113 900 0.4518 - -
1.3459 1000 0.4145 0.3726 0.3964
1.4805 1100 0.4296 - -
1.6151 1200 0.4144 - -
1.7497 1300 0.1536 - -
1.8843 1400 0.3425 - -
2.0188 1500 0.3225 0.3433 0.3930
2.1534 1600 0.3529 - -
2.2880 1700 0.3382 - -
2.4226 1800 0.3092 - -
2.5572 1900 0.339 - -
2.6918 2000 0.1681 0.3633 0.4032
2.8264 2100 0.1753 - -
2.9610 2200 0.2552 - -
3.0956 2300 0.2549 - -
3.2301 2400 0.2759 - -
3.3647 2500 0.2513 0.3338 0.4066
3.4993 2600 0.258 - -
3.6339 2700 0.2222 - -
3.7685 2800 0.0541 - -
3.9031 2900 0.2275 - -
4.0377 3000 0.1919 0.3529 0.4026
4.1723 3100 0.215 - -
4.3069 3200 0.2114 - -
4.4415 3300 0.2153 - -
4.5760 3400 0.2164 - -
4.7106 3500 0.0773 0.3509 0.4090
4.8452 3600 0.1211 - -
4.9798 3700 0.1553 - -
5.1144 3800 0.1764 - -
5.2490 3900 0.1953 - -
5.3836 4000 0.1559 0.3474 0.4089
5.5182 4100 0.1686 - -
5.6528 4200 0.1327 - -
5.7873 4300 0.0514 - -
5.9219 4400 0.1381 - -
6.0565 4500 0.1445 0.3521 0.4056
6.1911 4600 0.1621 - -
6.3257 4700 0.1365 - -
6.4603 4800 0.1579 - -
6.5949 4900 0.1547 - -
6.7295 5000 0.0316 0.3895 0.4094
6.8641 5100 0.0958 - -
6.9987 5200 0.1082 - -
7.1332 5300 0.1379 - -
7.2678 5400 0.1348 - -
7.4024 5500 0.1322 0.3552 0.4100
7.5370 5600 0.1321 - -
7.6716 5700 0.0763 - -
7.8062 5800 0.0472 - -
7.9408 5900 0.0989 - -
8.0754 6000 0.1045 0.3631 0.3967
8.2100 6100 0.122 - -
8.3445 6200 0.1057 - -
8.4791 6300 0.1194 - -
8.6137 6400 0.113 - -
8.7483 6500 0.0126 0.3944 0.4116
8.8829 6600 0.089 - -
9.0175 6700 0.0849 - -
9.1521 6800 0.1052 - -
9.2867 6900 0.111 - -
9.4213 7000 0.1026 0.3665 0.4133
9.5559 7100 0.1165 - -
9.6904 7200 0.0394 - -
9.8250 7300 0.0443 - -
9.9596 7400 0.0756 - -
10.0942 7500 0.0806 0.3785 0.4090
10.2288 7600 0.103 - -
10.3634 7700 0.0875 - -
10.4980 7800 0.0959 - -
10.6326 7900 0.0851 - -
10.7672 8000 0.0073 0.3902 0.4136
10.9017 8100 0.079 - -
11.0363 8200 0.0664 - -
11.1709 8300 0.0766 - -
11.3055 8400 0.084 - -
11.4401 8500 0.0947 0.3733 0.4099
11.5747 8600 0.0906 - -
11.7093 8700 0.0224 - -
11.8439 8800 0.0424 - -
11.9785 8900 0.0569 - -
12.1131 9000 0.0697 0.3824 0.4071
12.2476 9100 0.095 - -
12.3822 9200 0.0651 - -
12.5168 9300 0.0756 - -
12.6514 9400 0.065 - -
12.7860 9500 0.0194 0.3876 0.4110
12.9206 9600 0.0595 - -
13.0552 9700 0.0629 - -
13.1898 9800 0.0808 - -
13.3244 9900 0.0652 - -
13.4590 10000 0.0802 0.3783 0.4091
13.5935 10100 0.0809 - -
13.7281 10200 0.0111 - -
13.8627 10300 0.0465 - -
13.9973 10400 0.0504 - -
14.1319 10500 0.068 0.3831 0.4071
14.2665 10600 0.0739 - -
14.4011 10700 0.0734 - -
14.5357 10800 0.0737 - -
14.6703 10900 0.0379 - -
14.8048 11000 0.0231 0.3841 0.4112
14.9394 11100 0.0493 - -
15.0 11145 - 0.3902 0.4136
  • The bold row denotes the saved checkpoint.
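
The log can be cross-checked against the dataset size with a little arithmetic. 11145 steps over 15 epochs is 743 steps per epoch, which with 95,159 training samples is about 128 samples per step. That would be consistent with four devices at a per-device batch size of 32 and the last partial batch dropped, but the device count is an inference from these numbers and is not stated in the card.

```python
TRAIN_SAMPLES = 95_159
TOTAL_STEPS = 11_145
EPOCHS = 15
PER_DEVICE_BATCH = 32

steps_per_epoch = TOTAL_STEPS // EPOCHS
samples_per_step = TRAIN_SAMPLES / steps_per_epoch

# Inferred, not stated in the card: how many devices would explain the step count.
implied_devices = round(samples_per_step / PER_DEVICE_BATCH)
global_batch = implied_devices * PER_DEVICE_BATCH

# With the last partial batch dropped, this should reproduce steps_per_epoch.
full_batches_per_epoch = TRAIN_SAMPLES // global_batch


def epoch_at(step: int) -> float:
    """Fractional epoch for a step, rounded as in the log's Epoch column."""
    return round(step / steps_per_epoch, 4)
```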

Framework Versions

  • Python: 3.11.9
  • Sentence Transformers: 3.0.1
  • Transformers: 4.43.3
  • PyTorch: 2.3.1+cu121
  • Accelerate: 0.30.1
  • Datasets: 2.19.2
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Safetensors

  • Model size: 109M params
  • Tensor type: F32

Model tree for antonkirk/retrieval-mpnet-dot-finetuned-combined-synthetic-dataset
