SentenceTransformer based on sentence-transformers/multi-qa-MiniLM-L6-dot-v1
This is a sentence-transformers model finetuned from sentence-transformers/multi-qa-MiniLM-L6-dot-v1. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/multi-qa-MiniLM-L6-dot-v1
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Dot Product
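These properties can be checked directly from the loaded model; a minimal sketch using the public SentenceTransformer API (the printed values should match the card):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Trelis/multi-qa-MiniLM-L6-dot-v1-ft-pairs-4-cst-epoch-s1")
print(model.get_max_seq_length())                # 512
print(model.get_sentence_embedding_dimension())  # 384
print(model.similarity_fn_name)                  # expected "dot", per the card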
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
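In plain transformers terms, the two modules above run a BertModel and keep the CLS token embedding (pooling_mode_cls_token=True). Here is a minimal sketch of that equivalence, assuming the checkpoint loads as a standard BERT encoder:

import torch
from transformers import AutoModel, AutoTokenizer

repo = "Trelis/multi-qa-MiniLM-L6-dot-v1-ft-pairs-4-cst-epoch-s1"
tokenizer = AutoTokenizer.from_pretrained(repo)
encoder = AutoModel.from_pretrained(repo)

batch = tokenizer("a penalty tap must be performed without delay",
                  return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    hidden = encoder(**batch).last_hidden_state  # shape (1, seq_len, 384)
embedding = hidden[:, 0]  # CLS pooling, per the Pooling module above
print(embedding.shape)    # torch.Size([1, 384])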
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Trelis/multi-qa-MiniLM-L6-dot-v1-ft-pairs-4-cst-epoch-s1")
# Run inference
sentences = [
'What is the consequence if a defending team is penalized three times in their seven-meter zone during a single possession?',
'18. 5 the mark must be indicated by the referee before a penalty tap is taken. 18. 6 the penalty tap must be performed without delay after the referee indicates the mark. ruling = a penalty to the non - offending team at the point of infringement. 18. 7 a player may perform a rollball instead of a penalty tap and the player who receives the ball does not become the half. 18. 8 if the defending team is penalised three ( 3 ) times upon entering their seven metre zone during a single possession, the last offending player will be given an exclusion until the end of that possession. 18. 9 a penalty try is awarded if any action by a player, team official or spectator, deemed by the referee to be contrary to the rules or spirit of the game clearly prevents the attacking team from scoring a try. fit playing rules - 5th edition copyright © touch football australia 2020 15 19 advantage 19. 1 where a defending team player is offside at a tap or rollball and attempts to interfere with play, the referee will allow advantage or award a penalty, whichever is of greater advantage to the attacking team.',
'5th edition rules touch football tion rules touch football touch football australia ( tfa ) undertook an extensive internal review of their domestic playing rules throughout 2018 and 2019. the review was led by an vastly experienced group of current and past players, coaches, referees and administrators of the sport from community competitions to the elite international game. this group consulted broadly within the australian community to develop a set of playing rules that could be applied across all levels of the sport. the result was the tfa 8th edition playing rules. at the federation of international touch paris convention held in october 2019 touch football australia presented the tfa 8th edition playing rules and subsequently offered fit and all national touch associations ( ntas ) royalty free rights to use the newly developed rules. consequently, the fit board resolved to adopt the tfa 8th edition playing rules as the 5th edition fit playing rules to be used across all levels of the game internationally. fit and its members acknowledge and thank touch football australia for the rights to use these rules. whilst consistency in the application of the rules of the game is important, fit encourages its members to offer features in local competition rules to ensure that all participants enjoy a high quality experience.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
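Because the similarity function is dot product, the scores are unnormalized dot products, which suits retrieval directly. A short follow-on sketch that ranks the two rule passages against the query:

# Rank the two passages against the query by dot-product score.
query_embedding = model.encode([sentences[0]])
passage_embeddings = model.encode(sentences[1:])
scores = model.similarity(query_embedding, passage_embeddings)  # shape [1, 2]
best = int(scores.argmax())
print(f"Best passage: index {best}, score {float(scores[0, best]):.4f}")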
Training Details
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- learning_rate: 2e-05
- num_train_epochs: 4
- lr_scheduler_type: constant
- warmup_ratio: 0.3
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 4
- max_steps: -1
- lr_scheduler_type: constant
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.3
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
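For context, the MultipleNegativesRankingLoss cited below, combined with the hyperparameters above, suggests a training setup along the following lines. This is a hedged sketch rather than the original script: the (anchor, positive) rows, the eval rows, and output_dir are hypothetical stand-ins, and the dot-product scoring in the loss is an assumption made to match this model's similarity function.

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    util,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

# Start from the base checkpoint named at the top of this card.
model = SentenceTransformer("sentence-transformers/multi-qa-MiniLM-L6-dot-v1")

# Hypothetical (question, passage) pairs; the original training data is not shown here.
train_dataset = Dataset.from_dict({
    "anchor": ["What happens after a third penalty in the seven metre zone?"],
    "positive": ["18.8 if the defending team is penalised three times upon entering their seven metre zone ..."],
})
eval_dataset = Dataset.from_dict({
    "anchor": ["When must the penalty tap be taken?"],
    "positive": ["18.6 the penalty tap must be performed without delay ..."],
})

# Assumption: dot-product scoring with scale=1.0 to match the model's similarity
# function; the loss defaults to cosine similarity with scale=20.0.
loss = MultipleNegativesRankingLoss(model, scale=1.0, similarity_fct=util.dot_score)

# The non-default hyperparameters listed above; everything else keeps its default.
args = SentenceTransformerTrainingArguments(
    output_dir="output",  # hypothetical path
    eval_strategy="steps",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    learning_rate=2e-05,
    num_train_epochs=4,
    lr_scheduler_type="constant",
    warmup_ratio=0.3,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()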
Training Logs
| Epoch  | Step | Training Loss | Validation Loss |
|:------:|:----:|:-------------:|:---------------:|
| 0.3333 | 2    | 1.7279        | -               |
| 0.5    | 3    | -             | 1.3621          |
| 0.6667 | 4    | 1.4819        | -               |
| 1.0    | 6    | 1.5272        | 1.2755          |
| 1.3333 | 8    | 1.2528        | -               |
| 1.5    | 9    | -             | 1.2600          |
| 1.6667 | 10   | 1.421         | -               |
| 2.0    | 12   | 1.1836        | 1.2422          |
| 2.3333 | 14   | 1.2527        | -               |
| 2.5    | 15   | -             | 1.2317          |
| 2.6667 | 16   | 1.485         | -               |
| 3.0    | 18   | 0.8239        | 1.1883          |
| 3.3333 | 20   | 1.1028        | -               |
| 3.5    | 21   | -             | 1.1533          |
| 3.6667 | 22   | 0.9746        | -               |
| 4.0    | 24   | 0.816         | 1.1237          |
Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.0.1
- Transformers: 4.42.3
- PyTorch: 2.1.1+cu121
- Accelerate: 0.31.0
- Datasets: 2.17.1
- Tokenizers: 0.19.1
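To approximate this environment, the versions above can be pinned at install time (a convenience suggestion, not part of the original card; PyTorch 2.1.1+cu121 is a CUDA build and should be installed from the matching PyTorch wheel index for your platform):

pip install sentence-transformers==3.0.1 transformers==4.42.3 accelerate==0.31.0 datasets==2.17.1 tokenizers==0.19.1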
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}