SentenceTransformer based on google-bert/bert-base-uncased
This is a sentence-transformers model finetuned from google-bert/bert-base-uncased. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: google-bert/bert-base-uncased
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
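For reference, cosine similarity scores a pair of embeddings by the angle between them, ignoring magnitude, so two sentences get a high score when their 768-dimensional vectors point in a similar direction. A minimal NumPy sketch of the computation (illustrative only; random vectors stand in for real sentence embeddings):

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    # Dot product divided by the product of the norms, yielding a score in [-1, 1]
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Random 768-dimensional stand-ins for sentence embeddings
rng = np.random.default_rng(0)
u, v = rng.normal(size=768), rng.normal(size=768)
print(cosine_similarity(u, v))
```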
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
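The Pooling module above averages the token embeddings produced by BertModel (mean pooling, per `pooling_mode_mean_tokens: True`) to obtain a single 768-dimensional sentence vector. As a rough sketch, the equivalent computation with plain transformers would look like this, assuming the checkpoint's transformer weights load with AutoModel (standard for Sentence Transformers repositories):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("ryanhoangt/bert-base-uncased-mnli-cosine")
model = AutoModel.from_pretrained("ryanhoangt/bert-base-uncased-mnli-cosine")

encoded = tokenizer(
    ["The river plays a central role in all visits to Paris."],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = model(**encoded).last_hidden_state  # [batch, seq_len, 768]

# Mean pooling: average the token embeddings, masking out padding positions
mask = encoded["attention_mask"].unsqueeze(-1).float()
sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
print(sentence_embedding.shape)  # torch.Size([1, 768])
```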
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("ryanhoangt/bert-base-uncased-mnli-cosine")
# Run inference
sentences = [
'The river plays a central role in all visits to Paris.',
'The river is central to all vacations to Paris.',
'Trauma is the leading cause of alcohol abuse.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
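The similarity matrix is symmetric with ones on the diagonal, so it can be used directly for ranking. Continuing the example above, a quick way to find which of the other sentences is closest to the first:

```python
# Compare the first sentence against the other two and pick the best match
scores = similarities[0, 1:]
best = int(scores.argmax())
print(sentences[best + 1], float(scores[best]))
# Expected: the paraphrase about Paris scores higher than the unrelated sentence
```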
Evaluation
Metrics
Semantic Similarity
- Evaluated with EmbeddingSimilarityEvaluator
| Metric | Value |
|---|---|
| pearson_cosine | 0.7302 |
| spearman_cosine | 0.7323 |
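These scores are produced by Sentence Transformers' EmbeddingSimilarityEvaluator, which encodes sentence pairs and reports the Pearson and Spearman correlations between the embeddings' cosine similarities and the gold scores. A minimal sketch on toy pairs (the evaluation split behind the table above is not specified in this card):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("ryanhoangt/bert-base-uncased-mnli-cosine")

# Toy pairs with gold similarity labels in [0, 1]; real evaluation uses a held-out split
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=[
        "The river plays a central role in all visits to Paris.",
        "A man is playing a guitar.",
        "The weather is cold today.",
    ],
    sentences2=[
        "The river is central to all vacations to Paris.",
        "Someone is cooking dinner.",
        "It is chilly outside today.",
    ],
    scores=[1.0, 0.0, 0.9],
    name="toy-sts",
)
results = evaluator(model)
print(results)  # e.g. 'toy-sts_pearson_cosine', 'toy-sts_spearman_cosine'
```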
Training Details
Training Dataset
Unnamed Dataset
- Size: 50,000 training samples
- Columns: sentence1, sentence2, and label
- Approximate statistics based on the first 1000 samples:
| | sentence1 | sentence2 | label |
|---|---|---|---|
| type | string | string | float |
| min | 4 tokens | 5 tokens | 0.0 |
| mean | 26.95 tokens | 14.11 tokens | 0.34 |
| max | 189 tokens | 49 tokens | 1.0 |
- Samples:

| sentence1 | sentence2 | label |
|---|---|---|
| Conceptually cream skimming has two basic dimensions - product and geography. | Product and geography are what make cream skimming work. | 0.0 |
| you know during the season and i guess at at your level uh you lose them to the next level if if they decide to recall the the parent team the Braves decide to call to recall a guy from triple A then a double A guy goes up to replace him and a single A guy goes up to replace him | You lose the things to the following level if the people recall. | 1.0 |
| One of our number will carry out your instructions minutely. | A member of my team will execute your orders with immense precision. | 1.0 |

- Loss: CosineSimilarityLoss with these parameters:

{
  "loss_fct": "torch.nn.modules.loss.MSELoss"
}
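Conceptually, CosineSimilarityLoss computes the cosine similarity between the two sentence embeddings and regresses it against the gold label with the configured loss_fct (here MSE). A minimal sketch of that computation in plain PyTorch:

```python
import torch
import torch.nn.functional as F

def cosine_similarity_loss(emb1: torch.Tensor, emb2: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # Cosine similarity between paired sentence embeddings, one score per pair
    cos_scores = F.cosine_similarity(emb1, emb2, dim=-1)
    # Regress the predicted similarities onto the gold labels (loss_fct = MSELoss)
    return F.mse_loss(cos_scores, labels)
```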
Training Hyperparameters
Non-Default Hyperparameters
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- num_train_epochs: 1
- warmup_steps: 100
- fp16: True
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 100
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
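Putting the pieces together, a training run with this configuration might look like the sketch below (toy data; the actual 50,000-pair dataset is not published with this card):

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import CosineSimilarityLoss

# Plain BERT is wrapped with a default mean-pooling head, matching the architecture above
model = SentenceTransformer("google-bert/bert-base-uncased")

# Toy stand-in for the 50,000-sample (sentence1, sentence2, label) dataset
train_dataset = Dataset.from_dict({
    "sentence1": ["One of our number will carry out your instructions minutely."] * 2,
    "sentence2": [
        "A member of my team will execute your orders with immense precision.",
        "Product and geography are what make cream skimming work.",
    ],
    "label": [1.0, 0.0],
})

# Non-default hyperparameters from this card
args = SentenceTransformerTrainingArguments(
    output_dir="bert-base-uncased-mnli-cosine",
    num_train_epochs=1,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    warmup_steps=100,
    fp16=True,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=CosineSimilarityLoss(model),
)
trainer.train()
```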
Training Logs
| Epoch | Step | Training Loss | spearman_cosine |
|---|---|---|---|
| 0.0320 | 50 | 0.2752 | - |
| 0.0640 | 100 | 0.1898 | - |
| 0.0960 | 150 | 0.1733 | - |
| 0.1280 | 200 | 0.1679 | - |
| 0.1599 | 250 | 0.1743 | - |
| 0.1919 | 300 | 0.1703 | - |
| 0.2239 | 350 | 0.1599 | - |
| 0.2559 | 400 | 0.1614 | - |
| 0.2879 | 450 | 0.149 | - |
| 0.3199 | 500 | 0.1555 | - |
| 0.3519 | 550 | 0.1631 | - |
| 0.3839 | 600 | 0.1537 | - |
| 0.4159 | 650 | 0.1497 | - |
| 0.4479 | 700 | 0.1512 | - |
| 0.4798 | 750 | 0.157 | - |
| 0.5118 | 800 | 0.1544 | - |
| 0.5438 | 850 | 0.1502 | - |
| 0.5758 | 900 | 0.1459 | - |
| 0.6078 | 950 | 0.1476 | - |
| 0.6398 | 1000 | 0.1439 | - |
| 0.6718 | 1050 | 0.1508 | - |
| 0.7038 | 1100 | 0.1444 | - |
| 0.7358 | 1150 | 0.1457 | - |
| 0.7678 | 1200 | 0.1486 | - |
| 0.7997 | 1250 | 0.1485 | - |
| 0.8317 | 1300 | 0.1419 | - |
| 0.8637 | 1350 | 0.1406 | - |
| 0.8957 | 1400 | 0.1407 | - |
| 0.9277 | 1450 | 0.1434 | - |
| 0.9597 | 1500 | 0.1365 | - |
| 0.9917 | 1550 | 0.1465 | - |
| -1 | -1 | - | 0.7323 |
Framework Versions
- Python: 3.11.12
- Sentence Transformers: 4.1.0
- Transformers: 4.52.2
- PyTorch: 2.6.0+cu124
- Accelerate: 1.7.0
- Datasets: 3.2.0
- Tokenizers: 0.21.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}