---
base_model: google-bert/bert-base-uncased
library_name: sentence-transformers
metrics:
- cosine_accuracy
- cosine_accuracy_threshold
- cosine_f1
- cosine_f1_threshold
- cosine_precision
- cosine_recall
- cosine_ap
- dot_accuracy
- dot_accuracy_threshold
- dot_f1
- dot_f1_threshold
- dot_precision
- dot_recall
- dot_ap
- manhattan_accuracy
- manhattan_accuracy_threshold
- manhattan_f1
- manhattan_f1_threshold
- manhattan_precision
- manhattan_recall
- manhattan_ap
- euclidean_accuracy
- euclidean_accuracy_threshold
- euclidean_f1
- euclidean_f1_threshold
- euclidean_precision
- euclidean_recall
- euclidean_ap
- max_accuracy
- max_accuracy_threshold
- max_f1
- max_f1_threshold
- max_precision
- max_recall
- max_ap
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:103663
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: How much native Icelandic and advanced Icelandic learners can read and understand Old Norse?
  sentences:
  - What are the best answers for "Why should I hire you?" in a cool way?
  - Are girls shy in expressing their feelings?
  - If I learn Icelandic can I understand old norse texts?
- source_sentence: Where can I get quality assistance for budget conveyancing across the Sydney?
  sentences:
  - What are the possible options for India to deal with Uri terror attack?
  - What is the intended purpose of philosophy?
  - Where can I get quality assistance in Sydney for any property transaction?
- source_sentence: What are some of the best IAS coaching institutions in Mumbai?
  sentences:
  - What are best IAS coaching institutes in Mumbai?
  - Do vampires really exist?
  - What do most women feel during sex?
- source_sentence: Is petroleum engineering still a good major?
  sentences:
  - What are some of the best sex stories?
  - Can I clear CAT in 4.5 months?
  - What is the future of petroleum engineering graduating in 2020?
- source_sentence: How can the drive from Edmonton to Auckland be described, and how do these cities' attractions compare to those in Vancouver?
  sentences:
  - How can the drive from Edmonton to Auckland be described, and how does the history of these cities compare and contrast to the history of Vancouver?
  - What are the best hashtags to use as a photographer on instagram?
  - Which optional subjects can I choose for the IAS exam?
model-index:
- name: SentenceTransformer based on google-bert/bert-base-uncased
  results:
  - task:
      type: binary-classification
      name: Binary Classification
    dataset:
      name: Unknown
      type: unknown
    metrics:
    - type: cosine_accuracy
      value: 0.7643828947012523
      name: Cosine Accuracy
    - type: cosine_accuracy_threshold
      value: 0.8147265911102295
      name: Cosine Accuracy Threshold
    - type: cosine_f1
      value: 0.6959193470955354
      name: Cosine F1
    - type: cosine_f1_threshold
      value: 0.7402496337890625
      name: Cosine F1 Threshold
    - type: cosine_precision
      value: 0.5945532101060921
      name: Cosine Precision
    - type: cosine_recall
      value: 0.838953622964735
      name: Cosine Recall
    - type: cosine_ap
      value: 0.7112611713824615
      name: Cosine Ap
    - type: dot_accuracy
      value: 0.7399583457304374
      name: Dot Accuracy
    - type: dot_accuracy_threshold
      value: 153.5009765625
      name: Dot Accuracy Threshold
    - type: dot_f1
      value: 0.6710917251406536
      name: Dot F1
    - type: dot_f1_threshold
      value: 133.23265075683594
      name: Dot F1 Threshold
    - type: dot_precision
      value: 0.5683387761657477
      name: Dot Precision
    - type: dot_recall
      value: 0.8191990122694652
      name: Dot Recall
    - type: dot_ap
      value: 0.6542447011722929
      name: Dot Ap
    - type: manhattan_accuracy
      value: 0.7665197046333613
      name: Manhattan Accuracy
    - type: manhattan_accuracy_threshold
      value: 176.4288787841797
      name: Manhattan Accuracy Threshold
    - type: manhattan_f1
      value: 0.6972882533068157
      name: Manhattan F1
    - type: manhattan_f1_threshold
      value: 218.96762084960938
      name: Manhattan F1 Threshold
    - type: manhattan_precision
      value: 0.590020301314243
      name: Manhattan Precision
    - type: manhattan_recall
      value: 0.8522262520256193
      name: Manhattan Recall
    - type: manhattan_ap
      value: 0.7109056366977289
      name: Manhattan Ap
    - type: euclidean_accuracy
      value: 0.7665197046333613
      name: Euclidean Accuracy
    - type: euclidean_accuracy_threshold
      value: 8.092199325561523
      name: Euclidean Accuracy Threshold
    - type: euclidean_f1
      value: 0.6970045347129081
      name: Euclidean F1
    - type: euclidean_f1_threshold
      value: 9.794208526611328
      name: Euclidean F1 Threshold
    - type: euclidean_precision
      value: 0.5945518932171071
      name: Euclidean Precision
    - type: euclidean_recall
      value: 0.8421174473338993
      name: Euclidean Recall
    - type: euclidean_ap
      value: 0.7109417385930392
      name: Euclidean Ap
    - type: max_accuracy
      value: 0.7665197046333613
      name: Max Accuracy
    - type: max_accuracy_threshold
      value: 176.4288787841797
      name: Max Accuracy Threshold
    - type: max_f1
      value: 0.6972882533068157
      name: Max F1
    - type: max_f1_threshold
      value: 218.96762084960938
      name: Max F1 Threshold
    - type: max_precision
      value: 0.5945532101060921
      name: Max Precision
    - type: max_recall
      value: 0.8522262520256193
      name: Max Recall
    - type: max_ap
      value: 0.7112611713824615
      name: Max Ap
---

# SentenceTransformer based on google-bert/bert-base-uncased

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
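As one illustration of the paraphrase-mining use case mentioned above, the `paraphrase_mining` utility from `sentence_transformers.util` can score all sentence pairs in a collection with this model. This is a minimal sketch, not part of the original card; the example questions are drawn from the training samples shown later in this card.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import paraphrase_mining

model = SentenceTransformer("gavinqiangli/my-awesome-bi-encoder")

questions = [
    "How do I become a good lawyer? What are the qualities of a good lawyer?",
    "How can someone become a successful lawyer?",
    "Why is China going to the Moon?",
    "What does China want with the moon?",
]

# paraphrase_mining returns [score, i, j] triples,
# sorted by decreasing cosine similarity.
for score, i, j in paraphrase_mining(model, questions):
    print(f"{score:.3f} | {questions[i]} | {questions[j]}")
```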
## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased)
- **Maximum Sequence Length:** 128 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("gavinqiangli/my-awesome-bi-encoder")
# Run inference
sentences = [
    "How can the drive from Edmonton to Auckland be described, and how do these cities' attractions compare to those in Vancouver?",
    'How can the drive from Edmonton to Auckland be described, and how does the history of these cities compare and contrast to the history of Vancouver?',
    'Which optional subjects can I choose for the IAS exam?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

## Evaluation

### Metrics

#### Binary Classification

* Evaluated with [BinaryClassificationEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)

| Metric                       | Value      |
|:-----------------------------|:-----------|
| cosine_accuracy              | 0.7644     |
| cosine_accuracy_threshold    | 0.8147     |
| cosine_f1                    | 0.6959     |
| cosine_f1_threshold          | 0.7402     |
| cosine_precision             | 0.5946     |
| cosine_recall                | 0.839      |
| cosine_ap                    | 0.7113     |
| dot_accuracy                 | 0.74       |
| dot_accuracy_threshold       | 153.501    |
| dot_f1                       | 0.6711     |
| dot_f1_threshold             | 133.2327   |
| dot_precision                | 0.5683     |
| dot_recall                   | 0.8192     |
| dot_ap                       | 0.6542     |
| manhattan_accuracy           | 0.7665     |
| manhattan_accuracy_threshold | 176.4289   |
| manhattan_f1                 | 0.6973     |
| manhattan_f1_threshold       | 218.9676   |
| manhattan_precision          | 0.59       |
| manhattan_recall             | 0.8522     |
| manhattan_ap                 | 0.7109     |
| euclidean_accuracy           | 0.7665     |
| euclidean_accuracy_threshold | 8.0922     |
| euclidean_f1                 | 0.697      |
| euclidean_f1_threshold       | 9.7942     |
| euclidean_precision          | 0.5946     |
| euclidean_recall             | 0.8421     |
| euclidean_ap                 | 0.7109     |
| max_accuracy                 | 0.7665     |
| max_accuracy_threshold       | 176.4289   |
| max_f1                       | 0.6973     |
| max_f1_threshold             | 218.9676   |
| max_precision                | 0.5946     |
| max_recall                   | 0.8522     |
| **max_ap**                   | **0.7113** |
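The thresholds above are operating points selected by the evaluator: for example, pairs whose cosine similarity exceeds the F1-optimal threshold (about 0.7402) are predicted to be duplicates. Below is a minimal sketch of applying that threshold at inference time; the example pairs are hypothetical and the threshold is simply the value reported in the table.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("gavinqiangli/my-awesome-bi-encoder")

# F1-optimal cosine threshold reported by the evaluator above.
THRESHOLD = 0.7402

pairs = [
    ("What are best IAS coaching institutes in Mumbai?",
     "What are some of the best IAS coaching institutions in Mumbai?"),
    ("Do vampires really exist?",
     "Which optional subjects can I choose for the IAS exam?"),
]

for s1, s2 in pairs:
    emb = model.encode([s1, s2])
    # model.similarity returns a 1x1 matrix for these two embeddings
    score = float(model.similarity(emb[0:1], emb[1:2])[0][0])
    label = "duplicate" if score >= THRESHOLD else "not duplicate"
    print(f"{score:.3f} -> {label}")
```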
## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 103,663 training samples
* Columns: `sentence_0`, `sentence_1`, and `label`
* Approximate statistics based on the first 1000 samples:

  |         | sentence_0 | sentence_1 | label |
  |:--------|:-----------|:-----------|:------|
  | type    | string     | string     | int   |
  | details |            |            |       |

* Samples:

  | sentence_0 | sentence_1 | label |
  |:-----------|:-----------|:------|
  | Are Jewish people the most intelligent in the universe? | Why are Jewish people so intelligent? | 1 |
  | How do I become a good lawyer? What are the qualities of a good lawyer? | How can someone become a successful lawyer? | 1 |
  | Why is China going to the Moon? | What does China want with the moon? | 1 |

* Loss: [MultipleNegativesRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:

  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```

### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `num_train_epochs`: 1
- `multi_dataset_batch_sampler`: round_robin
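Taken together, the loss and the non-default hyperparameters above correspond to a training setup along the following lines. This is a hedged sketch, not the author's actual script: the stand-in dataset is hypothetical, and since MultipleNegativesRankingLoss consumes (anchor, positive) pairs and treats the other in-batch examples as negatives, presumably only the duplicate (label = 1) pairs feed this loss. The `eval_strategy="steps"` setting is omitted here because it additionally requires a dev set or evaluator (sketched after the full hyperparameter list below).

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

model = SentenceTransformer("google-bert/bert-base-uncased")

# Hypothetical stand-in for the actual (unnamed) 103,663-pair dataset.
train_dataset = Dataset.from_dict({
    "sentence_0": ["Why is China going to the Moon?"],
    "sentence_1": ["What does China want with the moon?"],
})

# scale=20.0 and cosine similarity are the parameters reported above
# (they are also the defaults for this loss).
loss = losses.MultipleNegativesRankingLoss(model, scale=20.0)

args = SentenceTransformerTrainingArguments(
    output_dir="my-awesome-bi-encoder",  # hypothetical output path
    num_train_epochs=1,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```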
#### All Hyperparameters

<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `eval_use_gather_object`: False
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin

</details>
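The `max_ap` column in the log below comes from the binary-classification evaluator being run periodically during training. A minimal sketch of constructing such an evaluator follows; the dev pairs and the `"dev"` name are hypothetical, since the actual dev set is not named in this card.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import BinaryClassificationEvaluator

model = SentenceTransformer("gavinqiangli/my-awesome-bi-encoder")

# Hypothetical dev pairs: 1 = duplicate questions, 0 = unrelated.
dev_evaluator = BinaryClassificationEvaluator(
    sentences1=["Why is China going to the Moon?",
                "Do vampires really exist?"],
    sentences2=["What does China want with the moon?",
                "What do most women feel during sex?"],
    labels=[1, 0],
    name="dev",
)

# Computes the accuracy/F1/precision/recall/AP metrics reported
# in the Evaluation section above.
results = dev_evaluator(model)
```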
### Training Logs

| Epoch  | Step | Training Loss | max_ap |
|:------:|:----:|:-------------:|:------:|
| 0.0772 | 500  | 0.0796        | -      |
| 0.1543 | 1000 | 0.0205        | 0.6878 |
| 0.2315 | 1500 | 0.0197        | -      |
| 0.3087 | 2000 | 0.0201        | 0.6864 |
| 0.3859 | 2500 | 0.0185        | -      |
| 0.4630 | 3000 | 0.0161        | 0.6933 |
| 0.5402 | 3500 | 0.0163        | -      |
| 0.6174 | 4000 | 0.0172        | 0.7089 |
| 0.6946 | 4500 | 0.0172        | -      |
| 0.7717 | 5000 | 0.0143        | 0.7072 |
| 0.8489 | 5500 | 0.0129        | -      |
| 0.9261 | 6000 | 0.0124        | 0.7112 |
| 1.0    | 6479 | -             | 0.7113 |

### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.2.1
- Transformers: 4.44.2
- PyTorch: 2.5.0+cu121
- Accelerate: 0.34.2
- Datasets: 3.1.0
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```