metadata
language:
  - en
library_name: sentence-transformers
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dataset_size:100K<n<1M
  - loss:CachedMultipleNegativesRankingLoss
base_model: FacebookAI/xlm-roberta-large
metrics:
  - cosine_accuracy
  - dot_accuracy
  - manhattan_accuracy
  - euclidean_accuracy
  - max_accuracy
widget:
  - source_sentence: The boy scowls
    sentences:
      - The boy is outside.
      - The man is in a city.
      - A woman at home.
  - source_sentence: A woman sings.
    sentences:
      - The woman is singing.
      - a man is wearing blue
      - The boys are eating.
  - source_sentence: A bird flying.
    sentences:
      - A butterfly flys freely.
      - She checks her phone.
      - A man is sleeping.
  - source_sentence: an eagle flies
    sentences:
      - A butterfly flys freely.
      - The men are together.
      - A man is sleeping.
  - source_sentence: There's a dock
    sentences:
      - There are people outdoors
      - Boy playing baseball.
      - A man is sleeping.
pipeline_tag: sentence-similarity
model-index:
  - name: SentenceTransformer based on FacebookAI/xlm-roberta-large
    results:
      - task:
          type: triplet
          name: Triplet
        dataset:
          name: all nli dev
          type: all-nli-dev
        metrics:
          - type: cosine_accuracy
            value: 0.941
            name: Cosine Accuracy
          - type: dot_accuracy
            value: 0.062
            name: Dot Accuracy
          - type: manhattan_accuracy
            value: 0.937
            name: Manhattan Accuracy
          - type: euclidean_accuracy
            value: 0.938
            name: Euclidean Accuracy
          - type: max_accuracy
            value: 0.941
            name: Max Accuracy
      - task:
          type: triplet
          name: Triplet
        dataset:
          name: all nli test
          type: all-nli-test
        metrics:
          - type: cosine_accuracy
            value: 0.943
            name: Cosine Accuracy
          - type: dot_accuracy
            value: 0.057
            name: Dot Accuracy
          - type: manhattan_accuracy
            value: 0.947
            name: Manhattan Accuracy
          - type: euclidean_accuracy
            value: 0.947
            name: Euclidean Accuracy
          - type: max_accuracy
            value: 0.947
            name: Max Accuracy

SentenceTransformer based on FacebookAI/xlm-roberta-large

This is a sentence-transformers model fine-tuned from FacebookAI/xlm-roberta-large on the sentence-transformers/all-nli dataset. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
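
The Pooling module above has only `pooling_mode_mean_tokens` enabled, so a sentence embedding is the average of the token embeddings. The following is a minimal pure-Python sketch of that idea, not the library's implementation; the function name `mean_pool` and the toy 2-dimensional vectors are illustrative assumptions (the real model produces 1024-dimensional vectors).

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the embeddings of non-padding tokens (mask value 1)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [value / count for value in total]


# Toy input: two real tokens and one padding token that must be ignored.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

Masking matters because padded positions would otherwise pull the average toward whatever values the padding tokens carry.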

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    "There's a dock",
    'There are people outdoors',
    'Boy playing baseball.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
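
The similarity call above returns one score per pair of inputs, which is why three sentences give a 3 x 3 result. As a rough illustration, and assuming the similarity function is cosine similarity (the function used in the training loss of this card), the matrix can be sketched in pure Python as follows. The helper names and the toy 2-dimensional vectors are hypothetical stand-ins for real embeddings.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(embeddings: List[List[float]]) -> List[List[float]]:
    """Pairwise cosine similarities; entry [i][j] compares item i with item j."""
    return [[cosine_similarity(a, b) for b in embeddings] for a in embeddings]


# Toy stand-ins for three sentence embeddings.
toy_embeddings = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
matrix = similarity_matrix(toy_embeddings)
print(len(matrix), len(matrix[0]))  # 3 3
```

The diagonal is 1.0 because every vector is compared with itself, and the matrix is symmetric.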

Evaluation

Metrics

Triplet (all-nli-dev)

Metric Value
cosine_accuracy 0.941
dot_accuracy 0.062
manhattan_accuracy 0.937
euclidean_accuracy 0.938
max_accuracy 0.941

Triplet (all-nli-test)

Metric Value
cosine_accuracy 0.943
dot_accuracy 0.057
manhattan_accuracy 0.947
euclidean_accuracy 0.947
max_accuracy 0.947
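
For orientation, triplet accuracy is the fraction of (anchor, positive, negative) triplets in which the anchor is closer to the positive than to the negative. The sketch below shows the cosine variant in pure Python; it is a simplified illustration under that assumption rather than the evaluator's actual code, and the function names and toy vectors are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def triplet_cosine_accuracy(triplets: List[Tuple[Vector, Vector, Vector]]) -> float:
    """Share of triplets where the positive is more similar to the anchor than the negative."""
    correct = 0
    for anchor, positive, negative in triplets:
        if cosine_similarity(anchor, positive) > cosine_similarity(anchor, negative):
            correct += 1
    return correct / len(triplets)


toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),  # positive is closer: counted as correct
    ([1.0, 0.0], [0.0, 1.0], [1.0, 0.1]),  # negative is closer: counted as wrong
]
print(triplet_cosine_accuracy(toy_triplets))  # 0.5
```

The Manhattan and Euclidean variants follow the same pattern with a distance in place of the similarity, counting a triplet as correct when the anchor-positive distance is the smaller one.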

Training Details

Training Dataset

sentence-transformers/all-nli

  • Dataset: sentence-transformers/all-nli at d482672
  • Size: 100,000 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    column    type    min       mean          max
    anchor    string  7 tokens  10.9 tokens   52 tokens
    positive  string  6 tokens  13.62 tokens  42 tokens
    negative  string  5 tokens  14.76 tokens  55 tokens
  • Samples:
    anchor | positive | negative
    A person on a horse jumps over a broken down airplane. | A person is outdoors, on a horse. | A person is at a diner, ordering an omelette.
    Children smiling and waving at camera | There are children present | The kids are frowning
    A boy is jumping on skateboard in the middle of a red bridge. | The boy does a skateboarding trick. | The boy skates down the sidewalk.
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
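
To make the two loss parameters concrete: with `similarity_fct` set to cosine similarity and `scale` set to 20.0, each anchor is scored against every positive and negative in the batch, the scores are multiplied by the scale, and a cross-entropy loss asks the anchor's own positive to win. The pure-Python sketch below illustrates that in-batch-negatives idea under those assumptions; it is not the library's code, it omits the gradient caching that the "Cached" variant adds to save memory, and the names `mnr_loss` and `cos_sim` are illustrative.

```python
import math
from typing import List

Vector = List[float]


def cos_sim(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnr_loss(anchors: List[Vector], positives: List[Vector],
             negatives: List[Vector], scale: float = 20.0) -> float:
    """Mean cross-entropy where anchor i should pick positive i among all candidates."""
    candidates = positives + negatives  # every positive and negative in the batch
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, candidate) for candidate in candidates]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(value - peak) for value in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


anchors = [[1.0, 0.0], [0.0, 1.0]]
good = mnr_loss(anchors, [[1.0, 0.0], [0.0, 1.0]], [[-1.0, 0.0], [0.0, -1.0]])
bad = mnr_loss(anchors, [[0.0, 1.0], [1.0, 0.0]], [[-1.0, 0.0], [0.0, -1.0]])
print(good < bad)  # True
```

A larger scale sharpens the softmax, so a small gap in cosine similarity between the correct positive and the other candidates translates into a larger change in the loss.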
    

Evaluation Dataset

sentence-transformers/all-nli

  • Dataset: sentence-transformers/all-nli at d482672
  • Size: 1,000 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    column    type    min       mean          max
    anchor    string  6 tokens  20.31 tokens  83 tokens
    positive  string  5 tokens  10.71 tokens  35 tokens
    negative  string  5 tokens  11.39 tokens  32 tokens
  • Samples:
    anchor | positive | negative
    Two women are embracing while holding to go packages. | Two woman are holding packages. | The men are fighting outside a deli.
    Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink. | Two kids in numbered jerseys wash their hands. | Two kids in jackets walk to school.
    A man selling donuts to a customer during a world exhibition event held in the city of Angeles | A man selling donuts to a customer. | A woman drinks her coffee in a small cafe.
  • Loss: CachedMultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • bf16: True
  • batch_sampler: no_duplicates
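
These settings determine the length and shape of the run. With 100,000 training samples and a per-device batch size of 16, one epoch is 6,250 optimizer steps, assuming a single device and no gradient accumulation (an assumption, though it is consistent with the final step count in the training logs). A `warmup_ratio` of 0.1 then gives 625 warmup steps. The sketch below combines this with the listed `learning_rate` of 5e-05 and the `linear` scheduler type; the function is a plain re-derivation of a linear warmup and decay schedule, not the trainer's own code.

```python
import math


def linear_schedule_with_warmup(step: int, total_steps: int,
                                warmup_steps: int, base_lr: float) -> float:
    """Learning rate that rises linearly during warmup, then decays linearly to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


train_samples = 100_000
batch_size = 16        # per_device_train_batch_size, single device assumed
warmup_ratio = 0.1
base_lr = 5e-05        # learning_rate

total_steps = math.ceil(train_samples / batch_size)
warmup_steps = math.ceil(total_steps * warmup_ratio)
print(total_steps, warmup_steps)  # 6250 625

for step in (0, 312, 625, 3125, 6250):
    print(step, linear_schedule_with_warmup(step, total_steps, warmup_steps, base_lr))
```

Under these assumptions the learning rate peaks at step 625 and reaches zero at step 6250.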

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss loss all-nli-dev_max_accuracy all-nli-test_max_accuracy
0 0 - - 0.613 -
0.016 100 3.4639 3.4199 0.621 -
0.032 200 3.4496 3.1967 0.841 -
0.048 300 2.2928 1.0476 0.864 -
0.064 400 1.2217 0.9993 0.871 -
0.08 500 1.1075 1.2674 0.85 -
0.096 600 1.2113 1.2565 0.866 -
0.112 700 1.0326 1.3313 0.855 -
0.128 800 1.2326 1.3698 0.851 -
0.144 900 1.2897 1.2690 0.855 -
0.16 1000 1.275 1.1231 0.863 -
0.176 1100 1.0823 1.2453 0.853 -
0.192 1200 1.1933 1.1119 0.868 -
0.208 1300 1.0102 0.9491 0.86 -
0.224 1400 0.8738 1.0682 0.87 -
0.24 1500 0.9482 0.8546 0.89 -
0.256 1600 0.6985 0.9136 0.88 -
0.272 1700 0.9908 0.9539 0.873 -
0.288 1800 1.0166 0.9277 0.878 -
0.304 1900 0.9441 0.9000 0.886 -
0.32 2000 0.8911 0.8364 0.891 -
0.336 2100 0.6746 0.8585 0.883 -
0.352 2200 0.7379 0.8332 0.888 -
0.368 2300 0.896 0.7617 0.89 -
0.384 2400 0.7901 0.7351 0.887 -
0.4 2500 0.811 0.7855 0.89 -
0.416 2600 0.6723 0.6756 0.899 -
0.432 2700 0.8839 0.7839 0.894 -
0.448 2800 0.9027 0.7319 0.903 -
0.464 2900 0.9276 0.7038 0.893 -
0.48 3000 0.7692 0.6653 0.903 -
0.496 3100 0.8044 0.6466 0.901 -
0.512 3200 0.6433 0.6145 0.906 -
0.528 3300 0.6642 0.5774 0.912 -
0.544 3400 0.5904 0.6054 0.907 -
0.56 3500 0.6378 0.5841 0.91 -
0.576 3600 0.5602 0.5444 0.921 -
0.592 3700 0.6436 0.5563 0.917 -
0.608 3800 0.588 0.5108 0.927 -
0.624 3900 0.5834 0.5059 0.925 -
0.64 4000 0.842 0.5217 0.929 -
0.656 4100 1.0995 0.5060 0.933 -
0.672 4200 0.9605 0.4842 0.928 -
0.688 4300 0.7811 0.4756 0.93 -
0.704 4400 0.7288 0.4650 0.938 -
0.72 4500 0.6636 0.4576 0.94 -
0.736 4600 0.7445 0.4552 0.934 -
0.752 4700 0.7687 0.4511 0.934 -
0.768 4800 0.7101 0.4446 0.936 -
0.784 4900 0.6586 0.4378 0.937 -
0.8 5000 0.789 0.4368 0.938 -
0.816 5100 0.6227 0.4344 0.941 -
0.832 5200 0.6994 0.4349 0.939 -
0.848 5300 0.687 0.4327 0.943 -
0.864 5400 0.76 0.4319 0.943 -
0.88 5500 0.6644 0.4323 0.941 -
0.896 5600 0.6535 0.4306 0.941 -
0.912 5700 0.7622 0.4289 0.941 -
0.928 5800 0.7053 0.4288 0.94 -
0.944 5900 0.8093 0.4289 0.94 -
0.96 6000 0.8658 0.4284 0.941 -
0.976 6100 0.7624 0.4283 0.941 -
0.992 6200 0.0003 0.4286 0.941 -
1.0 6250 - - - 0.947

Framework Versions

  • Python: 3.9.10
  • Sentence Transformers: 3.0.0
  • Transformers: 4.41.2
  • PyTorch: 2.3.0+cu121
  • Accelerate: 0.26.1
  • Datasets: 2.16.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup}, 
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}