SentenceTransformer based on intfloat/multilingual-e5-large

This is a sentence-transformers model finetuned from intfloat/multilingual-e5-large. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: intfloat/multilingual-e5-large
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Model Size: ~560M parameters (F32)
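
These properties can be checked directly once the model is loaded (a minimal sketch, not part of the original card):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("bourdoiscatie/multilingual-e5-large-approche5")
print(model.max_seq_length)                      # 512
print(model.get_sentence_embedding_dimension())  # 1024
print(model.similarity_fn_name)                  # "cosine"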

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
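
The Pooling module above applies mean pooling (pooling_mode_mean_tokens: True) over the 1024-dimensional token embeddings produced by the XLM-RoBERTa encoder. As an illustration only, the following sketch reproduces that step with plain transformers on the base model; it does not load this finetuned checkpoint and is not taken from the original card:

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("intfloat/multilingual-e5-large")
encoder = AutoModel.from_pretrained("intfloat/multilingual-e5-large")

batch = tokenizer(["La femme est en route pour un rendez-vous."],
                  padding=True, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    token_embeddings = encoder(**batch).last_hidden_state   # [batch, seq_len, 1024]

# Mean pooling: average the token embeddings, ignoring padding positions
mask = batch["attention_mask"].unsqueeze(-1).float()
sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embedding.shape)                              # torch.Size([1, 1024])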

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("bourdoiscatie/multilingual-e5-large-approche5")
# Run inference
# French example sentences: the first two are near-paraphrases, the third is unrelated
sentences = [
    "Tenet est sous surveillance depuis novembre, lorsque l'ancien directeur général Jeffrey Barbakow a déclaré que la société a utilisé des prix agressifs pour déclencher des paiements plus élevés pour les patients les plus malades de l'assurance maladie.",
    "En novembre, Jeffrey Brabakow, le directeur général de l'époque, a déclaré que la société utilisait des prix agressifs pour obtenir des paiements plus élevés pour les patients les plus malades de l'assurance maladie.",
    'La femme est en route pour un rendez-vous.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
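
The similarity matrix can be used directly for ranking. A small follow-up sketch, assuming the sentences and embeddings from the snippet above:

import torch

# Rank all sentences by cosine similarity to the first one (highest first);
# index 0 is the sentence itself, and the paraphrase should outrank the unrelated sentence.
ranking = torch.argsort(similarities[0], descending=True)
for idx in ranking:
    print(float(similarities[0][idx]), sentences[int(idx)])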

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • num_train_epochs: 1
  • batch_sampler: no_duplicates
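
These non-default values map onto the Sentence Transformers training API roughly as follows (a minimal sketch; the output directory is a placeholder, and the datasets and losses actually used are not shown here):

from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="multilingual-e5-large-approche5",  # placeholder
    eval_strategy="epoch",
    learning_rate=1e-5,
    weight_decay=0.01,
    num_train_epochs=1,
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # avoid duplicate texts within a batch
)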

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss nli loss sts loss triplet loss
0.0137 500 2.3683 - - -
0.0273 1000 2.2564 - - -
0.0410 1500 2.3976 - - -
0.0547 2000 2.1925 - - -
0.0684 2500 2.1542 - - -
0.0820 3000 2.0945 - - -
0.0957 3500 2.1411 - - -
0.1094 4000 1.9079 - - -
0.1231 4500 1.7574 - - -
0.1367 5000 2.1923 - - -
0.1504 5500 2.0054 - - -
0.1641 6000 1.6717 - - -
0.1778 6500 1.7374 - - -
0.1914 7000 2.0042 - - -
0.2051 7500 1.7486 - - -
0.2188 8000 1.5635 - - -
0.2324 8500 1.8133 - - -
0.2461 9000 1.7885 - - -
0.2598 9500 1.6298 - - -
0.2735 10000 1.3568 - - -
0.2871 10500 1.8475 - - -
0.3008 11000 1.7642 - - -
0.3145 11500 1.4048 - - -
0.3282 12000 1.3782 - - -
0.3418 12500 1.8164 - - -
0.3555 13000 1.5559 - - -
0.3692 13500 1.2515 - - -
0.3828 14000 1.4736 - - -
0.3965 14500 1.5527 - - -
0.4102 15000 1.384 - - -
0.4239 15500 1.167 - - -
0.4375 16000 1.6116 - - -
0.4512 16500 1.5668 - - -
0.4649 17000 1.1458 - - -
0.4786 17500 1.1103 - - -
0.4922 18000 1.6152 - - -
0.5059 18500 1.347 - - -
0.5196 19000 1.1 - - -
0.5333 19500 1.2662 - - -
0.5469 20000 1.456 - - -
0.5606 20500 1.1928 - - -
0.5743 21000 0.9972 - - -
0.5879 21500 1.4499 - - -
0.6016 22000 1.3264 - - -
0.6153 22500 1.003 - - -
0.6290 23000 1.0512 - - -
0.6426 23500 1.3041 - - -
0.6563 24000 1.1227 - - -
0.6700 24500 0.9579 - - -
0.6837 25000 1.1196 - - -
0.6973 25500 1.1362 - - -
0.7110 26000 1.0376 - - -
0.7247 26500 0.8037 - - -
0.7384 27000 1.2622 - - -
0.7520 27500 1.1696 - - -
0.7657 28000 0.8923 - - -
0.7794 28500 0.8389 - - -
0.7930 29000 1.2655 - - -
0.8067 29500 0.965 - - -
0.8204 30000 0.8043 - - -
0.8341 30500 1.0491 - - -
0.8477 31000 1.1186 - - -
0.8614 31500 0.8794 - - -
0.8751 32000 0.7776 - - -
0.8888 32500 1.1299 - - -
0.9024 33000 0.9544 - - -
0.9161 33500 0.7195 - - -
0.9298 34000 0.8298 - - -
0.9434 34500 1.0767 - - -
0.9571 35000 0.8287 - - -
0.9708 35500 0.7331 - - -
0.9845 36000 0.904 - - -
0.9981 36500 0.9645 - - -
1.0 36568 - 0.0193 5.4479 0.5933
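
The three evaluation columns (nli loss, sts loss, triplet loss) indicate that the model was evaluated on three datasets at the end of the epoch, which together with multi_dataset_batch_sampler: proportional points to multi-dataset training with one loss per dataset. The sketch below shows how such a setup is typically wired up; the dataset contents are toy placeholders and the exact loss-to-dataset pairing is an assumption (only MultipleNegativesRankingLoss and CoSENTLoss are cited in this card):

from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MultipleNegativesRankingLoss, CoSENTLoss, TripletLoss

model = SentenceTransformer("intfloat/multilingual-e5-large")

# Toy placeholder data; the real training corpora are not described in this card.
nli = Dataset.from_dict({"anchor": ["q"], "positive": ["a"]})
sts = Dataset.from_dict({"sentence1": ["x"], "sentence2": ["y"], "score": [0.5]})
triplet = Dataset.from_dict({"anchor": ["q"], "positive": ["a"], "negative": ["b"]})

losses = {
    "nli": MultipleNegativesRankingLoss(model),   # assumed pairing
    "sts": CoSENTLoss(model),                     # assumed pairing
    "triplet": TripletLoss(model),                # assumed from the "triplet loss" column
}
trainer = SentenceTransformerTrainer(
    model=model,
    train_dataset={"nli": nli, "sts": sts, "triplet": triplet},
    loss=losses,
)
trainer.train()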

Framework Versions

  • Python: 3.12.6
  • Sentence Transformers: 3.1.1
  • Transformers: 4.45.2
  • PyTorch: 2.4.0+cu121
  • Accelerate: 0.29.3
  • Datasets: 3.0.2
  • Tokenizers: 0.20.1
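
To reproduce this environment, the versions above can be pinned at install time (a sketch; install the PyTorch build matching your CUDA setup separately):

pip install sentence-transformers==3.1.1 transformers==4.45.2 accelerate==0.29.3 datasets==3.0.2 tokenizers==0.20.1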

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}