Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Paper: arXiv:1908.10084
This is a sentence-transformers model finetuned from Alibaba-NLP/gte-large-en-v1.5. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: NewModel
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
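The pooling block shows that sentence embeddings are taken from the CLS token (pooling_mode_cls_token: True) rather than mean-pooled, and that inputs up to 8192 tokens are accepted. Once the model is loaded (see the usage section below), these settings can be inspected and the sequence limit lowered at runtime; a minimal sketch, reusing the "model_3" path from the usage example:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("model_3")  # same path as in the usage example below

# The tokenizer limit mirrors the max_seq_length shown in the architecture above.
print(model.max_seq_length)  # 8192

# Encoding at 8192 tokens is memory-hungry; lowering the limit truncates inputs,
# which changes the embeddings of anything longer than the new limit.
model.max_seq_length = 512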
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Load the model (a local path as here, or a 🤗 Hub model ID)
model = SentenceTransformer("model_3")
# Run inference
sentences = [
"What was Nathan's response to the initial proposal from Global Air U?",
"I don't see on the proposal.\nI don't see anything class or the class related.\nUm.\nOh, so for the course.\nNo, no.",
'And hopefully that should update now in your account in a second.\nYeah.\nIf you give that a go now, you should see all the way to August 2025.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
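The card lists semantic search among the intended uses. Below is a minimal retrieval sketch using util.semantic_search; the corpus here is just the sample snippets from this card, standing in for a real document collection:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("model_3")

# Toy corpus built from the sample snippets above; replace with real documents.
corpus = [
    "He finally got around to giving me the information necessary to set up Snowflake share.",
    "And hopefully that should update now in your account in a second.",
]
query = "What progress has been made with setting up Snowflake share?"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Returns one ranked hit list per query; each hit has 'corpus_id' and 'score'.
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.4f}  {corpus[hit['corpus_id']]}")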
Evaluated with InformationRetrievalEvaluator

| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.3279 |
| cosine_accuracy@3 | 0.4898 |
| cosine_accuracy@5 | 0.5663 |
| cosine_accuracy@10 | 0.6613 |
| cosine_accuracy@30 | 0.767 |
| cosine_accuracy@50 | 0.8155 |
| cosine_accuracy@100 | 0.8598 |
| cosine_precision@1 | 0.3279 |
| cosine_precision@3 | 0.1902 |
| cosine_precision@5 | 0.1383 |
| cosine_precision@10 | 0.0872 |
| cosine_precision@30 | 0.0384 |
| cosine_precision@50 | 0.0257 |
| cosine_precision@100 | 0.0143 |
| cosine_recall@1 | 0.1988 |
| cosine_recall@3 | 0.3261 |
| cosine_recall@5 | 0.391 |
| cosine_recall@10 | 0.4756 |
| cosine_recall@30 | 0.6031 |
| cosine_recall@50 | 0.6602 |
| cosine_recall@100 | 0.7195 |
| cosine_ndcg@10 | 0.3785 |
| cosine_mrr@10 | 0.4295 |
| cosine_map@100 | 0.3193 |
| dot_accuracy@1 | 0.329 |
| dot_accuracy@3 | 0.4887 |
| dot_accuracy@5 | 0.5717 |
| dot_accuracy@10 | 0.6634 |
| dot_accuracy@30 | 0.767 |
| dot_accuracy@50 | 0.8134 |
| dot_accuracy@100 | 0.8619 |
| dot_precision@1 | 0.329 |
| dot_precision@3 | 0.1899 |
| dot_precision@5 | 0.1387 |
| dot_precision@10 | 0.0874 |
| dot_precision@30 | 0.0385 |
| dot_precision@50 | 0.0257 |
| dot_precision@100 | 0.0143 |
| dot_recall@1 | 0.1994 |
| dot_recall@3 | 0.3259 |
| dot_recall@5 | 0.3937 |
| dot_recall@10 | 0.4771 |
| dot_recall@30 | 0.6044 |
| dot_recall@50 | 0.6591 |
| dot_recall@100 | 0.722 |
| dot_ndcg@10 | 0.3791 |
| dot_mrr@10 | 0.4305 |
| dot_map@100 | 0.3195 |
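The metrics above come from sentence-transformers' InformationRetrievalEvaluator. The actual evaluation queries and corpus are not included in this card, so the sketch below only shows the shape of the inputs, with placeholder data taken from the sample pairs further down:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Placeholder data; the real evaluation set is not part of this card.
queries = {"q1": "What progress has been made with setting up Snowflake share?"}
corpus = {"d1": "He finally got around to giving me the information necessary to set up Snowflake share."}
relevant_docs = {"q1": {"d1"}}  # maps each query ID to its relevant document IDs

evaluator = InformationRetrievalEvaluator(
    queries,
    corpus,
    relevant_docs,
    accuracy_at_k=[1, 3, 5, 10, 30, 50, 100],
    precision_recall_at_k=[1, 3, 5, 10, 30, 50, 100],
    map_at_k=[100],
)

model = SentenceTransformer("model_3")
results = evaluator(model)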
Columns: anchor and positive

| | anchor | positive |
|---|---|---|
| type | string | string |

Samples:
| anchor | positive |
|---|---|
| What progress has been made with setting up Snowflake share? | He finally got around to giving me the information necessary to set up Snowflake share. |
| Who is Peter Tsanghen and what is the planned interaction with him? | He finally got around to giving me the information necessary to set up Snowflake share. |
| Who is Peter Tsanghen and what is the planned interaction with him? | Uh, and so now we just have to meet with Peter. |
Loss: main.MultipleNegativesRankingLoss_with_logging

Non-default hyperparameters:
- per_device_train_batch_size: 4
- per_device_eval_batch_size: 4
- num_train_epochs: 2
- max_steps: 1751
- disable_tqdm: True
- multi_dataset_batch_sampler: round_robin

All hyperparameters:
- overwrite_output_dir: False
- do_predict: False
- prediction_loss_only: True
- per_device_train_batch_size: 4
- per_device_eval_batch_size: 4
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 2
- max_steps: 1751
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: True
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters: 
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin

Training logs:

| Epoch | Step | cosine_map@100 |
|---|---|---|
| 0.0114 | 20 | 0.2538 |
| 0.0228 | 40 | 0.2601 |
| 0.0342 | 60 | 0.2724 |
| 0.0457 | 80 | 0.2911 |
| 0.0571 | 100 | 0.2976 |
| 0.0685 | 120 | 0.3075 |
| 0.0799 | 140 | 0.3071 |
| 0.0913 | 160 | 0.3111 |
| 0.1027 | 180 | 0.3193 |
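main.MultipleNegativesRankingLoss_with_logging appears to be a project-local wrapper around sentence-transformers' MultipleNegativesRankingLoss; it is not part of the library. A minimal sketch of a comparable fine-tuning run with the stock loss and the non-default hyperparameters above, using a placeholder pair from the samples table (the real training data is not included in this card):

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

# gte-large-en-v1.5 ships custom modeling code, hence trust_remote_code=True.
model = SentenceTransformer("Alibaba-NLP/gte-large-en-v1.5", trust_remote_code=True)

# Placeholder pair from the samples table; replace with the full anchor/positive dataset.
train_dataset = Dataset.from_dict({
    "anchor": ["What progress has been made with setting up Snowflake share?"],
    "positive": ["He finally got around to giving me the information necessary to set up Snowflake share."],
})

# In-batch negatives: every other positive in the batch serves as a negative.
loss = MultipleNegativesRankingLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="model_3",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=2,
    max_steps=1751,
    disable_tqdm=True,
    multi_dataset_batch_sampler="round_robin",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()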
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
Base model: Alibaba-NLP/gte-large-en-v1.5