
SentenceTransformer based on sentence-transformers/stsb-distilbert-base

This is a sentence-transformers model fine-tuned from sentence-transformers/stsb-distilbert-base. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: DistilBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
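
The Pooling module above has only pooling_mode_mean_tokens enabled, so a sentence embedding is the average of the token embeddings that are not padding. The following is a minimal pure-Python sketch of that idea on toy 2-dimensional vectors (the real model uses 768 dimensions); the function name mean_pool and the toy inputs are illustrative assumptions, not part of the library.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    count = max(count, 1)  # guard against an all-padding input
    return [t / count for t in total]


# Two real tokens followed by one padding token (mask = 0).
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

The padding vector is ignored, so its large values do not affect the result.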

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("alpha-brain/stsb-distilbert-base-mnrl")
# Run inference
sentences = [
    'Do correlations between plasma-neuropeptides and temperament dimensions differ between suicidal patients and healthy controls?',
    'Decreased plasma levels of plasma-neuropeptide Y (NPY) and plasma-corticotropin releasing hormone (CRH), and increased levels of plasma delta-sleep inducing peptide (DSIP) in suicide attempters with mood disorders have previously been observed. This study was performed in order to further understand the clinical relevance of these findings.',
    "Seven hundred fifty patients entered the study. One hundred sixty-eight patients (22.4%) presented with a total of 193 extracutaneous manifestations, as follows: articular (47.2%), neurologic (17.1%), vascular (9.3%), ocular (8.3%), gastrointestinal (6.2%), respiratory (2.6%), cardiac (1%), and renal (1%). Other autoimmune conditions were present in 7.3% of patients. Neurologic involvement consisted of epilepsy, central nervous system vasculitis, peripheral neuropathy, vascular malformations, headache, and neuroimaging abnormalities. Ocular manifestations were episcleritis, uveitis, xerophthalmia, glaucoma, and papilledema. In more than one-fourth of these children, articular, neurologic, and ocular involvements were unrelated to the site of skin lesions. Raynaud's phenomenon was reported in 16 patients. Respiratory involvement consisted essentially of restrictive lung disease. Gastrointestinal involvement was reported in 12 patients and consisted exclusively of gastroesophageal reflux. Thirty patients (4%) had multiple extracutaneous features, but systemic sclerosis (SSc) developed in only 1 patient. In patients with extracutaneous involvement, the prevalence of antinuclear antibodies and rheumatoid factor was significantly higher than that among patients with only skin involvement. However, Scl-70 and anticentromere, markers of SSc, were not significantly increased.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
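
To make the [3, 3] similarity matrix concrete, here is a small pure-Python sketch of pairwise cosine similarity. It assumes that model.similarity uses cosine similarity for this model (the library's usual default unless the model configuration overrides it); the helper names and toy vectors are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors):
    """Pairwise cosine similarities, one row and one column per vector."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Three toy 2-dimensional "embeddings" standing in for the 768-dimensional ones.
toy = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
sims = similarity_matrix(toy)
for row in sims:
    print([round(s, 4) for s in row])
```

The diagonal is always 1 (each vector compared with itself), and orthogonal vectors score 0.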

Evaluation

Metrics

Information Retrieval

Metric Value
cosine_accuracy@1 0.9825
cosine_accuracy@3 0.998
cosine_accuracy@5 0.9985
cosine_accuracy@10 0.9985
cosine_precision@1 0.9825
cosine_precision@3 0.8438
cosine_precision@5 0.5588
cosine_precision@10 0.2931
cosine_recall@1 0.3413
cosine_recall@3 0.8454
cosine_recall@5 0.9192
cosine_recall@10 0.9578
cosine_ndcg@10 0.9462
cosine_mrr@10 0.99
cosine_map@100 0.9169
dot_accuracy@1 0.9705
dot_accuracy@3 0.9955
dot_accuracy@5 0.9985
dot_accuracy@10 0.999
dot_precision@1 0.9705
dot_precision@3 0.8142
dot_precision@5 0.546
dot_precision@10 0.2899
dot_recall@1 0.3366
dot_recall@3 0.8156
dot_recall@5 0.8994
dot_recall@10 0.9481
dot_ndcg@10 0.9297
dot_mrr@10 0.9828
dot_map@100 0.8927
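
For readers unfamiliar with these metrics, the sketch below computes accuracy@k, precision@k, recall@k and MRR@k for a single query, using the standard definitions. The document IDs and the relevant set are made up for illustration; the reported values above are averages over all evaluation queries.

```python
def accuracy_at_k(ranked, relevant, k):
    """1.0 if at least one relevant document appears in the top k."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0


def precision_at_k(ranked, relevant, k):
    """Fraction of the top k results that are relevant."""
    return sum(doc in relevant for doc in ranked[:k]) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant documents found in the top k."""
    return sum(doc in relevant for doc in ranked[:k]) / len(relevant)


def mrr_at_k(ranked, relevant, k):
    """Reciprocal rank of the first relevant document within the top k."""
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


# Hypothetical ranking for one query with three relevant contexts.
ranked = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d3", "d1", "d8"}

print(accuracy_at_k(ranked, relevant, 1))   # 1.0
print(precision_at_k(ranked, relevant, 5))  # 0.4
print(recall_at_k(ranked, relevant, 3))
print(mrr_at_k(ranked, relevant, 5))        # 1.0
```

Because precision@k divides by k, it necessarily falls as k grows past the number of relevant documents per query. The gap between cosine_precision@1 (0.9825) and cosine_recall@1 (0.3413) suggests roughly three relevant contexts per query on average, although the card does not state this.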

Training Details

Training Dataset

Unnamed Dataset

  • Size: 622,302 training samples
  • Columns: question and contexts
  • Approximate statistics based on the first 1000 samples:
             question        contexts
    type     string          string
    min      9 tokens        5 tokens
    mean     27.35 tokens    88.52 tokens
    max      60 tokens       128 tokens
  • Samples:
    question: Does low-level human equivalent gestational lead exposure produce sex-specific motor and coordination abnormalities and late-onset obesity in year-old mice?
    contexts: Low-level developmental lead exposure is linked to cognitive and neurological disorders in children. However, the long-term effects of gestational lead exposure (GLE) have received little attention.

    question: Does insulin in combination with selenium inhibit HG/Pal-induced cardiomyocyte apoptosis by Cbl-b regulating p38MAPK/CBP/Ku70 pathway?
    contexts: In this study, we investigated whether insulin and selenium in combination (In/Se) suppresses cardiomyocyte apoptosis and whether this protection is mediated by Cbl-b regulating p38MAPK/CBP/Ku70 pathway.

    question: Does arthroscopic subacromial decompression result in normal shoulder function after two years in less than 50 % of patients?
    contexts: The aim of this study was to evaluate the outcome two years after arthroscopic subacromial decompression using the Western Ontario Rotator-Cuff (WORC) index and a diagram-based questionnaire to self-assess active shoulder range of motion (ROM).
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
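
As a rough illustration of what these parameters mean, the sketch below implements the core idea of a multiple-negatives ranking loss in pure Python: for each question, its paired context is the positive and every other context in the batch acts as a negative, with cosine similarities multiplied by scale before a softmax cross-entropy. This is a simplified sketch under those assumptions, not the library's implementation; the function names and toy vectors are hypothetical.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnrl_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over all in-batch candidates.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy batch of two (question, context) embedding pairs.
questions = [[1.0, 0.0], [0.0, 1.0]]
contexts_aligned = [[1.0, 0.0], [0.0, 1.0]]  # each context matches its question
contexts_swapped = [[0.0, 1.0], [1.0, 0.0]]  # each context matches the other question

loss_aligned = mnrl_loss(questions, contexts_aligned)
loss_swapped = mnrl_loss(questions, contexts_swapped)
print(loss_aligned, loss_swapped)
```

The loss is close to zero when each question is most similar to its own context and large when an in-batch negative scores higher. With a batch size of 64, each question would be contrasted against 63 such negatives.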
    

Evaluation Dataset

Unnamed Dataset

  • Size: 32,753 evaluation samples
  • Columns: question and contexts
  • Approximate statistics based on the first 1000 samples:
             question        contexts
    type     string          string
    min      11 tokens       3 tokens
    mean     27.52 tokens    88.59 tokens
    max      56 tokens       128 tokens
  • Samples:
    question: Does [ Chemical components from essential oil of Pandanus amaryllifolius leave ]?
    contexts: The essential oil of Pandanus amaryllifolius leaves was analyzed by gas chromatography-mass spectrum, and the relative content of each component was determined by area normalization method.

    question: Is elevated C-reactive protein associated with the tumor depth of invasion but not with disease recurrence in stage II and III colorectal cancer?
    contexts: We previously demonstrated that elevated serum C-reactive protein (CRP) level is associated with depth of tumor invasion in operable colorectal cancer. There is also increasing evidence to show that raised CRP concentration is associated with poor survival in patients with colorectal cancer. The purpose of this study was to investigate the correlation between preoperative CRP concentrations and short-term disease recurrence in cases with stage II and III colorectal cancer.

    question: Do neuropeptide Y and peptide YY protect from weight loss caused by Bacille Calmette-Guérin in mice?
    contexts: Deletion of PYY and NPY aggravated the BCG-induced loss of body weight, which was most pronounced in NPY-/-;PYY-/- mice (maximum loss: 15%). The weight loss in NPY-/-;PYY-/- mice did not normalize during the 2 week observation period. BCG suppressed the circadian pattern of locomotion, exploration and food intake. However, these changes took a different time course than the prolonged weight loss caused by BCG in NPY-/-;PYY-/- mice. The effect of BCG to increase circulating IL-6 (measured 16 days post-treatment) remained unaltered by knockout of PYY, NPY or NPY plus PYY.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 64
  • num_train_epochs: 1

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch  Step  Training Loss  Validation Loss  med-eval-dev_cosine_map@100
0 0 - - 0.3328
0.0103 100 0.7953 - -
0.0206 200 0.5536 - -
0.0257 250 - 0.1041 0.7474
0.0309 300 0.4755 - -
0.0411 400 0.4464 - -
0.0514 500 0.3986 0.0761 0.7786
0.0617 600 0.357 - -
0.0720 700 0.3519 - -
0.0771 750 - 0.0685 0.8029
0.0823 800 0.3197 - -
0.0926 900 0.3247 - -
0.1028 1000 0.3048 0.0549 0.8108
0.1131 1100 0.2904 - -
0.1234 1200 0.281 - -
0.1285 1250 - 0.0503 0.8181
0.1337 1300 0.2673 - -
0.1440 1400 0.2645 - -
0.1543 1500 0.2511 0.0457 0.8332
0.1645 1600 0.2541 - -
0.1748 1700 0.2614 - -
0.1800 1750 - 0.0401 0.8380
0.1851 1800 0.2263 - -
0.1954 1900 0.2466 - -
0.2057 2000 0.2297 0.0365 0.8421
0.2160 2100 0.2225 - -
0.2262 2200 0.212 - -
0.2314 2250 - 0.0344 0.8563
0.2365 2300 0.2257 - -
0.2468 2400 0.1953 - -
0.2571 2500 0.1961 0.0348 0.8578
0.2674 2600 0.1888 - -
0.2777 2700 0.2039 - -
0.2828 2750 - 0.0319 0.8610
0.2879 2800 0.1939 - -
0.2982 2900 0.202 - -
0.3085 3000 0.1915 0.0292 0.8678
0.3188 3100 0.1987 - -
0.3291 3200 0.1877 - -
0.3342 3250 - 0.0275 0.8701
0.3394 3300 0.1874 - -
0.3497 3400 0.1689 - -
0.3599 3500 0.169 0.0281 0.8789
0.3702 3600 0.1631 - -
0.3805 3700 0.1611 - -
0.3856 3750 - 0.0263 0.8814
0.3908 3800 0.1764 - -
0.4011 3900 0.1796 - -
0.4114 4000 0.1729 0.0249 0.8805
0.4216 4100 0.1551 - -
0.4319 4200 0.1543 - -
0.4371 4250 - 0.0241 0.8867
0.4422 4300 0.1549 - -
0.4525 4400 0.1432 - -
0.4628 4500 0.1592 0.0219 0.8835
0.4731 4600 0.1517 - -
0.4833 4700 0.1463 - -
0.4885 4750 - 0.0228 0.8928
0.4936 4800 0.1525 - -
0.5039 4900 0.1426 - -
0.5142 5000 0.1524 0.0209 0.8903
0.5245 5100 0.1443 - -
0.5348 5200 0.1468 - -
0.5399 5250 - 0.0212 0.8948
0.5450 5300 0.151 - -
0.5553 5400 0.1443 - -
0.5656 5500 0.1438 0.0212 0.8982
0.5759 5600 0.1409 - -
0.5862 5700 0.1346 - -
0.5913 5750 - 0.0207 0.8983
0.5965 5800 0.1315 - -
0.6067 5900 0.1425 - -
0.6170 6000 0.136 0.0188 0.8970
0.6273 6100 0.1426 - -
0.6376 6200 0.1353 - -
0.6427 6250 - 0.0185 0.8969
0.6479 6300 0.1269 - -
0.6582 6400 0.1159 - -
0.6684 6500 0.1311 0.0184 0.9028
0.6787 6600 0.1179 - -
0.6890 6700 0.115 - -
0.6942 6750 - 0.0184 0.9046
0.6993 6800 0.1254 - -
0.7096 6900 0.1233 - -
0.7199 7000 0.122 0.0174 0.9042
0.7302 7100 0.1238 - -
0.7404 7200 0.1257 - -
0.7456 7250 - 0.0175 0.9074
0.7507 7300 0.1222 - -
0.7610 7400 0.1194 - -
0.7713 7500 0.1284 0.0166 0.9080
0.7816 7600 0.1147 - -
0.7919 7700 0.1182 - -
0.7970 7750 - 0.0170 0.9116
0.8021 7800 0.1157 - -
0.8124 7900 0.1299 - -
0.8227 8000 0.114 0.0163 0.9105
0.8330 8100 0.1141 - -
0.8433 8200 0.1195 - -
0.8484 8250 - 0.0160 0.9112
0.8536 8300 0.1073 - -
0.8638 8400 0.1044 - -
0.8741 8500 0.1083 0.0160 0.9153
0.8844 8600 0.1103 - -
0.8947 8700 0.1145 - -
0.8998 8750 - 0.0154 0.9133
0.9050 8800 0.1083 - -
0.9153 8900 0.1205 - -
0.9255 9000 0.1124 0.0153 0.9162
0.9358 9100 0.1067 - -
0.9461 9200 0.116 - -
0.9513 9250 - 0.0152 0.9171
0.9564 9300 0.1126 - -
0.9667 9400 0.1075 - -
0.9770 9500 0.1128 0.0149 0.9169
0.9872 9600 0.1143 - -
0.9975 9700 0.1175 - -
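
The Epoch column can be reproduced from the step count. The sketch below assumes training on a single device; the other inputs (622,302 training samples, per_device_train_batch_size of 64, gradient_accumulation_steps of 1, dataloader_drop_last False) come from the sections above. The helper name epoch_at is illustrative.

```python
import math

train_samples = 622_302
batch_size = 64  # per_device_train_batch_size, assuming a single device

# The last, partial batch is kept because dataloader_drop_last is False.
steps_per_epoch = math.ceil(train_samples / batch_size)


def epoch_at(step):
    """Fraction of the epoch completed after the given optimizer step."""
    return round(step / steps_per_epoch, 4)


print(steps_per_epoch)  # 9724
print(epoch_at(100))    # 0.0103
print(epoch_at(9700))   # 0.9975
```

These values match the first and last logged rows, which is consistent with a single epoch of about 9,724 steps, training loss logged every 100 steps, and evaluation every 250 steps.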

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.1.1
  • Transformers: 4.44.2
  • PyTorch: 2.4.0
  • Accelerate: 0.34.2
  • Datasets: 3.0.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 66.4M params (Safetensors, tensor type F32)