
SentenceTransformer based on intfloat/multilingual-e5-small

This is a sentence-transformers model finetuned from intfloat/multilingual-e5-small. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: intfloat/multilingual-e5-small
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
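
The architecture applies mean pooling over the token embeddings produced by the underlying BertModel and then L2-normalizes the result. For reference, here is a minimal sketch of the equivalent encoding step with the plain transformers API; the pooling and normalization code is an illustration derived from the module list above, not something shipped with this repository:

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

# Load the underlying transformer and tokenizer directly from the Hub
tokenizer = AutoTokenizer.from_pretrained("srikarvar/fine_tuned_model_7")
bert = AutoModel.from_pretrained("srikarvar/fine_tuned_model_7")

sentences = ["Top literature about World War II", "Best books on World War II"]
batch = tokenizer(sentences, padding=True, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    token_embeddings = bert(**batch).last_hidden_state  # (batch, seq_len, 384)

# Module (1): mean pooling over non-padding tokens
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)

# Module (2): L2 normalization, so dot product equals cosine similarity
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)  # torch.Size([2, 384])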

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the πŸ€— Hub
model = SentenceTransformer("srikarvar/fine_tuned_model_7")
# Run inference
sentences = [
    'Top literature about World War II',
    'Best books on World War II',
    'What is the price of an iPhone 12?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
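
As the description above notes, the embeddings can also be used for paraphrase mining and semantic search. A minimal sketch using the paraphrase-mining utility bundled with Sentence Transformers, reusing the model loaded above (the corpus is illustrative):

from sentence_transformers import util

corpus = [
    "Top literature about World War II",
    "Best books on World War II",
    "What is the price of an iPhone 12?",
]

# Returns (score, i, j) triples sorted by decreasing cosine similarity
pairs = util.paraphrase_mining(model, corpus)
for score, i, j in pairs:
    print(f"{score:.4f}  {corpus[i]!r} <-> {corpus[j]!r}")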

Evaluation

Metrics

Binary Classification (pair-class-dev)

Metric Value
cosine_accuracy 0.9
cosine_accuracy_threshold 0.7847
cosine_f1 0.9266
cosine_f1_threshold 0.7847
cosine_precision 0.8938
cosine_recall 0.9619
cosine_ap 0.9549
dot_accuracy 0.9
dot_accuracy_threshold 0.7847
dot_f1 0.9266
dot_f1_threshold 0.7847
dot_precision 0.8938
dot_recall 0.9619
dot_ap 0.9549
manhattan_accuracy 0.8969
manhattan_accuracy_threshold 9.909
manhattan_f1 0.9241
manhattan_f1_threshold 10.1367
manhattan_precision 0.8933
manhattan_recall 0.9571
manhattan_ap 0.955
euclidean_accuracy 0.9
euclidean_accuracy_threshold 0.6562
euclidean_f1 0.9266
euclidean_f1_threshold 0.6562
euclidean_precision 0.8938
euclidean_recall 0.9619
euclidean_ap 0.9549
max_accuracy 0.9
max_accuracy_threshold 9.909
max_f1 0.9266
max_f1_threshold 10.1367
max_precision 0.8938
max_recall 0.9619
max_ap 0.955

Binary Classification (pair-class-test)

Metric Value
cosine_accuracy 0.9062
cosine_accuracy_threshold 0.8142
cosine_f1 0.9292
cosine_f1_threshold 0.8142
cosine_precision 0.9206
cosine_recall 0.9381
cosine_ap 0.9556
dot_accuracy 0.9062
dot_accuracy_threshold 0.8142
dot_f1 0.9292
dot_f1_threshold 0.8142
dot_precision 0.9206
dot_recall 0.9381
dot_ap 0.9556
manhattan_accuracy 0.9031
manhattan_accuracy_threshold 9.5768
manhattan_f1 0.9271
manhattan_f1_threshold 9.5768
manhattan_precision 0.9163
manhattan_recall 0.9381
manhattan_ap 0.9558
euclidean_accuracy 0.9062
euclidean_accuracy_threshold 0.6095
euclidean_f1 0.9292
euclidean_f1_threshold 0.6095
euclidean_precision 0.9206
euclidean_recall 0.9381
euclidean_ap 0.9556
max_accuracy 0.9062
max_accuracy_threshold 9.5768
max_f1 0.9292
max_f1_threshold 9.5768
max_precision 0.9206
max_recall 0.9381
max_ap 0.9558
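
The thresholds reported above can be used to turn a similarity score into a binary paraphrase decision. A minimal sketch, assuming the cosine F1 threshold from the first table (about 0.78); in practice the cut-off should be re-tuned on the target data:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("srikarvar/fine_tuned_model_7")
THRESHOLD = 0.7847  # cosine_f1_threshold from the table above

def is_paraphrase(text_a: str, text_b: str) -> bool:
    emb = model.encode([text_a, text_b])
    score = model.similarity(emb[0:1], emb[1:2]).item()  # cosine similarity
    return score >= THRESHOLD

print(is_paraphrase("How do I write a resume?", "How do I create a resume?"))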

Training Details

Training Dataset

Unnamed Dataset

  • Size: 2,871 training samples
  • Columns: sentence2, sentence1, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence2: string, min 5 tokens, mean 20.57 tokens, max 177 tokens
    • sentence1: string, min 6 tokens, mean 20.74 tokens, max 176 tokens
    • label: int, 0: ~34.00%, 1: ~66.00%
  • Samples:
    • sentence2: "How do I do to get fuller face?" | sentence1: "How can one get a fuller face?" | label: 1
    • sentence2: "The DatasetInfo holds the data of a dataset, which may include its description, characteristics, and size." | sentence1: "A dataset's information is stored inside DatasetInfo and can include information such as the dataset description, features, and dataset size." | label: 1
    • sentence2: "How do I write a resume?" | sentence1: "How do I create a resume?" | label: 1
  • Loss: OnlineContrastiveLoss

Evaluation Dataset

Unnamed Dataset

  • Size: 320 evaluation samples
  • Columns: sentence2, sentence1, and label
  • Approximate statistics based on the first 320 samples:
    • sentence2: string, min 4 tokens, mean 19.57 tokens, max 135 tokens
    • sentence1: string, min 6 tokens, mean 19.55 tokens, max 136 tokens
    • label: int, 0: ~34.38%, 1: ~65.62%
  • Samples:
    • sentence2: "Steps to erase internet history" | sentence1: "How do I delete my browsing history?" | label: 1
    • sentence2: "How important is it to be the first person to wish someone a happy birthday?" | sentence1: "What is the right etiquette for wishing a Jehovah Witness happy birthday?" | label: 0
    • sentence2: "Who directed 'Gone with the Wind'?" | sentence1: "Who directed 'Citizen Kane'?" | label: 0
  • Loss: OnlineContrastiveLoss
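
Both splits consist of sentence pairs with an integer label (1 for paraphrase/duplicate, 0 for unrelated) and are optimized with OnlineContrastiveLoss. A minimal sketch of how such data and loss can be assembled; the rows are copied from the samples above purely as an illustration:

from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import OnlineContrastiveLoss

model = SentenceTransformer("intfloat/multilingual-e5-small")

# Pair-classification data: label 1 = paraphrase/duplicate, 0 = unrelated
train_dataset = Dataset.from_dict({
    "sentence2": ["How do I do to get fuller face?", "Who directed 'Gone with the Wind'?"],
    "sentence1": ["How can one get a fuller face?", "Who directed 'Citizen Kane'?"],
    "label": [1, 0],
})
eval_dataset = Dataset.from_dict({
    "sentence2": ["Steps to erase internet history"],
    "sentence1": ["How do I delete my browsing history?"],
    "label": [1],
})

loss = OnlineContrastiveLoss(model)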

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • gradient_accumulation_steps: 2
  • num_train_epochs: 4
  • warmup_ratio: 0.1
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
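
A minimal sketch of how these non-default values map onto the Sentence Transformers 3.x training API, continuing from the dataset and loss objects sketched in the Evaluation Dataset section above; save_strategy is an added assumption, required so the best checkpoint can be reloaded at the end of training:

from sentence_transformers import SentenceTransformerTrainer, SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="fine_tuned_model_7",      # placeholder output path
    eval_strategy="epoch",
    save_strategy="epoch",                # assumption: must match eval_strategy for load_best_model_at_end
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=2,
    num_train_epochs=4,
    warmup_ratio=0.1,
    load_best_model_at_end=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model,                 # SentenceTransformer from the sketch above
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,                   # OnlineContrastiveLoss(model)
)
trainer.train()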

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 2
  • eval_accumulation_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch | Step | Training Loss | Validation Loss | pair-class-dev_max_ap | pair-class-test_max_ap
0 0 - - 0.8735 -
0.2222 10 1.3298 - - -
0.4444 20 0.8218 - - -
0.6667 30 0.642 - - -
0.8889 40 0.571 - - -
1.0 45 - 0.5321 0.9499 -
1.1111 50 0.4828 - - -
1.3333 60 0.3003 - - -
1.5556 70 0.3331 - - -
1.7778 80 0.203 - - -
2.0 90 0.3539 0.5118 0.9558 -
2.2222 100 0.1357 - - -
2.4444 110 0.1562 - - -
2.6667 120 0.0703 - - -
2.8889 130 0.0806 - - -
3.0 135 - 0.5266 0.9548 -
3.1111 140 0.1721 - - -
3.3333 150 0.1063 - - -
3.5556 160 0.0909 - - -
3.7778 170 0.0358 - - -
4.0 180 0.1021 0.5256 0.9550 0.9558
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.1.0
  • Transformers: 4.41.2
  • PyTorch: 2.1.2+cu121
  • Accelerate: 0.34.2
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}