SentenceTransformer based on answerdotai/ModernBERT-base

This is a sentence-transformers model finetuned from answerdotai/ModernBERT-base on the all-nli-pair, all-nli-pair-class, all-nli-pair-score, all-nli-triplet, stsb, quora and natural-questions datasets. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: answerdotai/ModernBERT-base
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Datasets: all-nli-pair, all-nli-pair-class, all-nli-pair-score, all-nli-triplet, stsb, quora, natural-questions

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
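
The Pooling block above applies mean pooling over token embeddings (pooling_mode_mean_tokens: True). For readers using plain transformers instead of Sentence Transformers, here is a minimal sketch of the equivalent encode step; it assumes a transformers version with ModernBERT support, and the example sentence is only illustrative:

import torch
from transformers import AutoModel, AutoTokenizer

model_id = "nickprock/modernbert-base-all-nli-stsb-quora-nq"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

inputs = tokenizer(["A person is outdoors, on a horse."], padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    token_embeddings = model(**inputs).last_hidden_state  # [batch, seq_len, 768]

# Mean pooling: average the token embeddings, ignoring padding positions
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
print(embeddings.shape)  # torch.Size([1, 768])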

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("nickprock/modernbert-base-all-nli-stsb-quora-nq")
# Run inference
sentences = [
    'There is a very full description of the various types of hormone rooting compound here.',
    'It is meant to stimulate root growth - in particular to stimulate the creation of roots.',
    "The least that can be said is that we must be born with the ability and 'knowledge' to learn.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
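
The same embeddings also support semantic search, one of the use cases listed above. A short illustrative sketch with the util.semantic_search helper (the corpus and query are made-up examples):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("nickprock/modernbert-base-all-nli-stsb-quora-nq")

# Toy corpus and query, purely illustrative
corpus = [
    "A person is outdoors, on a horse.",
    "The boy does a skateboarding trick.",
    "There are children present.",
]
query = "Someone riding a horse outside"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Retrieve the top 2 corpus entries by cosine similarity
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], round(hit["score"], 4))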

Training Details

Training Datasets

all-nli-pair

  • Dataset: all-nli-pair at d482672
  • Size: 10,000 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min: 5, mean: 17.29, max: 64 tokens
    • positive (string): min: 5, mean: 9.7, max: 31 tokens
  • Samples:
    • anchor: A person on a horse jumps over a broken down airplane. | positive: A person is outdoors, on a horse.
    • anchor: Children smiling and waving at camera | positive: There are children present
    • anchor: A boy is jumping on skateboard in the middle of a red bridge. | positive: The boy does a skateboarding trick.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
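
As an illustrative sketch (not the exact training script), this loss can be constructed with the reported parameters like so:

from sentence_transformers import SentenceTransformer, util
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("answerdotai/ModernBERT-base")
# scale and similarity_fct mirror the parameters reported above
loss = MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)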
    
all-nli-pair-class

  • Dataset: all-nli-pair-class at d482672
  • Size: 10,000 training samples
  • Columns: premise, hypothesis, and label
  • Approximate statistics based on the first 1000 samples:
    • premise (string): min: 6, mean: 17.6, max: 51 tokens
    • hypothesis (string): min: 5, mean: 10.8, max: 33 tokens
    • label (int): 0: ~33.40%, 1: ~33.30%, 2: ~33.30%
  • Samples:
    • premise: A person on a horse jumps over a broken down airplane. | hypothesis: A person is training his horse for a competition. | label: 1
    • premise: A person on a horse jumps over a broken down airplane. | hypothesis: A person is at a diner, ordering an omelette. | label: 2
    • premise: A person on a horse jumps over a broken down airplane. | hypothesis: A person is outdoors, on a horse. | label: 0
  • Loss: SoftmaxLoss
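
SoftmaxLoss feeds pairs of sentence embeddings into a small classifier head that predicts the NLI label. An illustrative construction sketch (assuming the current Sentence Transformers API):

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import SoftmaxLoss

model = SentenceTransformer("answerdotai/ModernBERT-base")
loss = SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),  # 768
    num_labels=3,  # matches the 0/1/2 label distribution above
)
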
all-nli-pair-score

  • Dataset: all-nli-pair-score at d482672
  • Size: 10,000 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min: 6, mean: 17.6, max: 51 tokens
    • sentence2 (string): min: 5, mean: 10.8, max: 33 tokens
    • score (float): min: 0.0, mean: 0.5, max: 1.0
  • Samples:
    • sentence1: A person on a horse jumps over a broken down airplane. | sentence2: A person is training his horse for a competition. | score: 0.5
    • sentence1: A person on a horse jumps over a broken down airplane. | sentence2: A person is at a diner, ordering an omelette. | score: 0.0
    • sentence1: A person on a horse jumps over a broken down airplane. | sentence2: A person is outdoors, on a horse. | score: 1.0
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
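
CoSENTLoss optimizes the ranking of pairwise cosine similarities against the float scores above. An illustrative construction sketch with the reported parameters:

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CoSENTLoss

model = SentenceTransformer("answerdotai/ModernBERT-base")
# pairwise_cos_sim is CoSENTLoss's default similarity_fct
loss = CoSENTLoss(model, scale=20.0)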
    
all-nli-triplet

  • Dataset: all-nli-triplet at d482672
  • Size: 10,000 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min: 7, mean: 10.46, max: 46 tokens
    • positive (string): min: 6, mean: 12.91, max: 40 tokens
    • negative (string): min: 5, mean: 13.49, max: 51 tokens
  • Samples:
    • anchor: A person on a horse jumps over a broken down airplane. | positive: A person is outdoors, on a horse. | negative: A person is at a diner, ordering an omelette.
    • anchor: Children smiling and waving at camera | positive: There are children present | negative: The kids are frowning
    • anchor: A boy is jumping on skateboard in the middle of a red bridge. | positive: The boy does a skateboarding trick. | negative: The boy skates down the sidewalk.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    
stsb

  • Dataset: stsb at ab7a5ac
  • Size: 5,749 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min: 6, mean: 10.16, max: 28 tokens
    • sentence2 (string): min: 6, mean: 10.12, max: 25 tokens
    • score (float): min: 0.0, mean: 0.45, max: 1.0
  • Samples:
    • sentence1: A plane is taking off. | sentence2: An air plane is taking off. | score: 1.0
    • sentence1: A man is playing a large flute. | sentence2: A man is playing a flute. | score: 0.76
    • sentence1: A man is spreading shreded cheese on a pizza. | sentence2: A man is spreading shredded cheese on an uncooked pizza. | score: 0.76
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
    
quora

  • Dataset: quora at 451a485
  • Size: 10,000 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min: 6, mean: 13.91, max: 45 tokens
    • positive (string): min: 6, mean: 14.09, max: 44 tokens
  • Samples:
    • anchor: Astrology: I am a Capricorn Sun Cap moon and cap rising...what does that say about me? | positive: I'm a triple Capricorn (Sun, Moon and ascendant in Capricorn) What does this say about me?
    • anchor: How can I be a good geologist? | positive: What should I do to be a great geologist?
    • anchor: How do I read and find my YouTube comments? | positive: How can I see all my Youtube comments?
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    
natural-questions

  • Dataset: natural-questions at f9e894e
  • Size: 10,000 training samples
  • Columns: query and answer
  • Approximate statistics based on the first 1000 samples:
    • query (string): min: 10, mean: 12.47, max: 23 tokens
    • answer (string): min: 17, mean: 138.32, max: 556 tokens
  • Samples:
    • query: when did richmond last play in a preliminary final | answer: Richmond Football Club Richmond began 2017 with 5 straight wins, a feat it had not achieved since 1995. A series of close losses hampered the Tigers throughout the middle of the season, including a 5-point loss to the Western Bulldogs, 2-point loss to Fremantle, and a 3-point loss to the Giants. Richmond ended the season strongly with convincing victories over Fremantle and St Kilda in the final two rounds, elevating the club to 3rd on the ladder. Richmond's first final of the season against the Cats at the MCG attracted a record qualifying final crowd of 95,028; the Tigers won by 51 points. Having advanced to the first preliminary finals for the first time since 2001, Richmond defeated Greater Western Sydney by 36 points in front of a crowd of 94,258 to progress to the Grand Final against Adelaide, their first Grand Final appearance since 1982. The attendance was 100,021, the largest crowd to a grand final since 1986. The Crows led at quarter time and led by as many as 13, but the Tig...
    • query: who sang what in the world's come over you | answer: Jack Scott (singer) At the beginning of 1960, Scott again changed record labels, this time to Top Rank Records.[1] He then recorded four Billboard Hot 100 hits – "What in the World's Come Over You" (#5), "Burning Bridges" (#3) b/w "Oh Little One" (#34), and "It Only Happened Yesterday" (#38).[1] "What in the World's Come Over You" was Scott's second gold disc winner.[6] Scott continued to record and perform during the 1960s and 1970s.[1] His song "You're Just Gettin' Better" reached the country charts in 1974.[1] In May 1977, Scott recorded a Peel session for BBC Radio 1 disc jockey, John Peel.
    • query: who produces the most wool in the world | answer: Wool Global wool production is about 2 million tonnes per year, of which 60% goes into apparel. Wool comprises ca 3% of the global textile market, but its value is higher owing to dying and other modifications of the material.[1] Australia is a leading producer of wool which is mostly from Merino sheep but has been eclipsed by China in terms of total weight.[30] New Zealand (2016) is the third-largest producer of wool, and the largest producer of crossbred wool. Breeds such as Lincoln, Romney, Drysdale, and Elliotdale produce coarser fibers, and wool from these sheep is usually used for making carpets.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
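
Since the seven training datasets use different losses, Sentence Transformers accepts both as dictionaries keyed by dataset name. A hedged sketch of that wiring (only two datasets shown; the Hub dataset identifiers are assumptions based on the names in this card):

from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer, util
from sentence_transformers.losses import CoSENTLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("answerdotai/ModernBERT-base")

# Illustrative loading of two of the seven datasets; the rest follow the same pattern
train_dataset = {
    "all-nli-pair": load_dataset("sentence-transformers/all-nli", "pair", split="train[:10000]"),
    "stsb": load_dataset("sentence-transformers/stsb", split="train"),
}
loss = {
    "all-nli-pair": MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim),
    "stsb": CoSENTLoss(model, scale=20.0),
}

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()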
    

Evaluation Datasets

all-nli-triplet

  • Dataset: all-nli-triplet at d482672
  • Size: 6,584 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min: 6, mean: 18.25, max: 69 tokens
    • positive (string): min: 5, mean: 9.88, max: 30 tokens
    • negative (string): min: 5, mean: 10.48, max: 29 tokens
  • Samples:
    • anchor: Two women are embracing while holding to go packages. | positive: Two woman are holding packages. | negative: The men are fighting outside a deli.
    • anchor: Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink. | positive: Two kids in numbered jerseys wash their hands. | negative: Two kids in jackets walk to school.
    • anchor: A man selling donuts to a customer during a world exhibition event held in the city of Angeles | positive: A man selling donuts to a customer. | negative: A woman drinks her coffee in a small cafe.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    
stsb

  • Dataset: stsb at ab7a5ac
  • Size: 1,500 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min: 5, mean: 15.11, max: 44 tokens
    • sentence2 (string): min: 6, mean: 15.1, max: 50 tokens
    • score (float): min: 0.0, mean: 0.42, max: 1.0
  • Samples:
    • sentence1: A man with a hard hat is dancing. | sentence2: A man wearing a hard hat is dancing. | score: 1.0
    • sentence1: A young child is riding a horse. | sentence2: A child is riding a horse. | score: 0.95
    • sentence1: A man is feeding a mouse to a snake. | sentence2: The man is feeding a mouse to the snake. | score: 1.0
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
    
quora

  • Dataset: quora at 451a485
  • Size: 1,000 evaluation samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min: 6, mean: 14.01, max: 63 tokens
    • positive (string): min: 6, mean: 14.04, max: 46 tokens
  • Samples:
    • anchor: What is your New Year resolution? | positive: What can be my new year resolution for 2017?
    • anchor: Should I buy the IPhone 6s or Samsung Galaxy s7? | positive: Which is better: the iPhone 6S Plus or the Samsung Galaxy S7 Edge?
    • anchor: What are the differences between transgression and regression? | positive: What is the difference between transgression and regression?
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    
natural-questions

  • Dataset: natural-questions at f9e894e
  • Size: 1,000 evaluation samples
  • Columns: query and answer
  • Approximate statistics based on the first 1000 samples:
    • query (string): min: 9, mean: 12.51, max: 26 tokens
    • answer (string): min: 19, mean: 140.84, max: 585 tokens
  • Samples:
    • query: where does the waikato river begin and end | answer: Waikato River The Waikato River is the longest river in New Zealand, running for 425 kilometres (264 mi) through the North Island. It rises in the eastern slopes of Mount Ruapehu, joining the Tongariro River system and flowing through Lake Taupo, New Zealand's largest lake. It then drains Taupo at the lake's northeastern edge, creates the Huka Falls, and flows northwest through the Waikato Plains. It empties into the Tasman Sea south of Auckland, at Port Waikato. It gives its name to the Waikato Region that surrounds the Waikato Plains. The present course of the river was largely formed about 17,000 years ago. Contributing factors were climate warming, forest being reestablished in the river headwaters and the deepening, rather than widening, of the existing river channel. The channel was gradually eroded as far up river as Piarere, leaving the old Hinuera channel high and dry.[2] The remains of the old river path can be clearly seen at Hinuera where the cliffs mark the ancient river ...
    • query: what type of gas is produced during fermentation | answer: Fermentation Fermentation reacts NADH with an endogenous, organic electron acceptor.[1] Usually this is pyruvate formed from sugar through glycolysis. The reaction produces NAD+ and an organic product, typical examples being ethanol, lactic acid, carbon dioxide, and hydrogen gas (H2). However, more exotic compounds can be produced by fermentation, such as butyric acid and acetone. Fermentation products contain chemical energy (they are not fully oxidized), but are considered waste products, since they cannot be metabolized further without the use of oxygen.
    • query: why was star wars episode iv released first | answer: Star Wars (film) Star Wars (later retitled Star Wars: Episode IV – A New Hope) is a 1977 American epic space opera film written and directed by George Lucas. It is the first film in the original Star Wars trilogy and the beginning of the Star Wars franchise. Starring Mark Hamill, Harrison Ford, Carrie Fisher, Peter Cushing, Alec Guinness, David Prowse, James Earl Jones, Anthony Daniels, Kenny Baker, and Peter Mayhew, the film's plot focuses on the Rebel Alliance, led by Princess Leia (Fisher), and its attempt to destroy the Galactic Empire's space station, the Death Star. This conflict disrupts the isolated life of farmhand Luke Skywalker (Hamill), who inadvertently acquires two droids that possess stolen architectural plans for the Death Star. When the Empire begins a destructive search for the missing droids, Skywalker accompanies Jedi Master Obi-Wan Kenobi (Guinness) on a mission to return the plans to the Rebel Alliance and rescue Leia from her imprisonment by the Empire.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • warmup_ratio: 0.1
  • fp16: True
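
These non-default values map directly onto SentenceTransformerTrainingArguments; a minimal sketch (output_dir is a placeholder):

from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="modernbert-base-all-nli-stsb-quora-nq",  # placeholder
    eval_strategy="steps",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=4,
    warmup_ratio=0.1,
    fp16=True,
)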

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch | Step | Training Loss | all-nli-triplet loss | stsb loss | quora loss | natural-questions loss
0.0243 100 2.8163 2.6011 4.6235 1.6762 2.2254
0.0487 200 2.6522 2.0674 4.5288 1.0381 1.7565
0.0730 300 2.5478 1.1872 5.1274 0.0883 0.8453
0.0973 400 2.3013 0.9126 5.3516 0.0443 0.6953
0.1217 500 1.9177 0.8462 5.6431 0.0343 0.5612
0.1460 600 1.7186 0.7144 5.8698 0.0264 0.3991
0.1703 700 2.0748 0.7219 5.2972 0.0255 0.2856
0.1946 800 1.9132 0.6691 5.3757 0.0196 0.2245
0.2190 900 1.8559 0.6198 5.5028 0.0185 0.1659
0.2433 1000 2.1453 0.5851 5.8587 0.0177 0.1280
0.2676 1100 2.0303 0.6331 5.1522 0.0222 0.1381
0.2920 1200 1.8612 0.5579 5.7026 0.0156 0.1016
0.3163 1300 1.8465 0.6045 5.0309 0.0187 0.1062
0.3406 1400 1.7208 0.5491 5.5651 0.0174 0.0864
0.3650 1500 1.5479 0.5337 5.9317 0.0170 0.0809
0.3893 1600 1.5605 0.5604 5.4574 0.0210 0.0765
0.4136 1700 1.7457 0.5528 5.2572 0.0188 0.0750
0.4380 1800 1.6724 0.4923 5.6488 0.0169 0.0790
0.4623 1900 1.4122 0.4718 5.3825 0.0163 0.0647
0.4866 2000 1.848 0.4594 5.6606 0.0189 0.0658
0.5109 2100 2.0782 0.5167 4.9055 0.0210 0.0712
0.5353 2200 1.5413 0.4396 5.3588 0.0210 0.0580
0.5596 2300 1.6705 0.4588 5.5433 0.0192 0.0550
0.5839 2400 1.5674 0.4351 5.3304 0.0180 0.0582
0.6083 2500 1.5238 0.4812 5.2534 0.0163 0.0530
0.6326 2600 1.4025 0.4470 5.4626 0.0156 0.0513
0.6569 2700 1.5916 0.4489 5.5590 0.0159 0.0513
0.6813 2800 1.6206 0.4611 5.1904 0.0156 0.0536
0.7056 2900 1.7873 0.4742 5.1292 0.0153 0.0472
0.7299 3000 1.9452 0.4752 4.9931 0.0163 0.0542
0.7543 3100 1.563 0.4722 5.3862 0.0175 0.0513
0.7786 3200 1.3493 0.4525 5.4255 0.0163 0.0423
0.8029 3300 1.606 0.4657 5.3005 0.0179 0.0431
0.8273 3400 1.6305 0.4466 5.5017 0.0163 0.0432
0.8516 3500 1.3496 0.4144 5.3454 0.0170 0.0440
0.8759 3600 1.5866 0.4014 5.8260 0.0167 0.0481
0.9002 3700 1.495 0.4094 5.5550 0.0173 0.0454
0.9246 3800 1.2604 0.4125 5.9704 0.0179 0.0376
0.9489 3900 1.6432 0.4223 5.1097 0.0176 0.0450
0.9732 4000 1.6194 0.4322 5.1807 0.0166 0.0400
0.9976 4100 1.3006 0.4209 5.3493 0.0176 0.0412
1.0219 4200 1.3557 0.4080 5.5556 0.0167 0.0395
1.0462 4300 1.2346 0.3944 5.6652 0.0164 0.0395
1.0706 4400 1.6212 0.4036 5.6948 0.0157 0.0407
1.0949 4500 1.7511 0.3909 5.5846 0.0159 0.0410
1.1192 4600 1.1087 0.3827 5.7067 0.0175 0.0384
1.1436 4700 1.1356 0.3947 6.0833 0.0181 0.0412
1.1679 4800 1.4649 0.3816 5.6926 0.0187 0.0407
1.1922 4900 1.2354 0.4000 5.8187 0.0181 0.0401
1.2165 5000 1.2099 0.3967 5.8184 0.0183 0.0428
1.2409 5100 1.279 0.3784 5.8931 0.0176 0.0418
1.2652 5200 1.0431 0.3845 5.8284 0.0167 0.0395
1.2895 5300 1.2217 0.3883 5.6984 0.0195 0.0380
1.3139 5400 1.6192 0.3858 5.7183 0.0192 0.0381
1.3382 5500 1.5792 0.3704 5.8270 0.0196 0.0437
1.3625 5600 1.4467 0.3885 5.7460 0.0179 0.0411
1.3869 5700 1.217 0.3778 5.6724 0.0185 0.0407
1.4112 5800 1.3599 0.3824 5.8521 0.0155 0.0392
1.4355 5900 1.3571 0.3674 6.0293 0.0158 0.0379
1.4599 6000 1.4408 0.3667 5.9265 0.0140 0.0379
1.4842 6100 1.1629 0.3612 5.6663 0.0151 0.0367
1.5085 6200 1.21 0.3765 5.7513 0.0176 0.0407
1.5328 6300 1.4469 0.3722 5.8795 0.0162 0.0431
1.5572 6400 1.8419 0.3687 5.6081 0.0145 0.0382
1.5815 6500 1.4978 0.3739 5.6302 0.0156 0.0372
1.6058 6600 1.3954 0.3658 5.9182 0.0160 0.0405
1.6302 6700 1.262 0.3702 5.6119 0.0158 0.0370
1.6545 6800 0.9204 0.3723 5.7449 0.0147 0.0378
1.6788 6900 1.0658 0.3738 5.7127 0.0132 0.0410
1.7032 7000 1.286 0.3740 5.7997 0.0143 0.0405
1.7275 7100 1.3771 0.3650 5.7853 0.0142 0.0411
1.7518 7200 1.205 0.3728 5.8454 0.0149 0.0423
1.7762 7300 0.9881 0.3691 5.7261 0.0147 0.0461
1.8005 7400 1.3962 0.3751 5.6620 0.0135 0.0427
1.8248 7500 1.1804 0.3812 5.6814 0.0136 0.0396
1.8491 7600 1.4312 0.3722 5.7919 0.0141 0.0368
1.8735 7700 1.1161 0.3700 5.7718 0.0140 0.0397
1.8978 7800 1.389 0.3815 5.8770 0.0127 0.0415
1.9221 7900 1.5896 0.3726 5.6467 0.0132 0.0382
1.9465 8000 1.6873 0.3706 5.5875 0.0132 0.0380
1.9708 8100 1.513 0.3658 5.6106 0.0130 0.0371
1.9951 8200 0.9243 0.3611 5.7932 0.0135 0.0378
2.0195 8300 1.1086 0.3510 5.8341 0.0133 0.0386
2.0438 8400 0.7918 0.3715 6.0229 0.0138 0.0382
2.0681 8500 1.1291 0.3708 6.0243 0.0146 0.0397
2.0925 8600 0.9846 0.3775 6.0437 0.0139 0.0380
2.1168 8700 0.7928 0.3732 6.1154 0.0145 0.0408
2.1411 8800 1.0726 0.3786 5.9249 0.0151 0.0387
2.1655 8900 1.3123 0.3720 6.0072 0.0146 0.0395
2.1898 9000 0.752 0.3741 6.1952 0.0148 0.0411
2.2141 9100 1.1021 0.3708 6.0910 0.0140 0.0391
2.2384 9200 0.8425 0.3646 6.1572 0.0150 0.0398
2.2628 9300 1.0123 0.3582 6.2371 0.0146 0.0399
2.2871 9400 1.0528 0.3742 6.2364 0.0142 0.0412
2.3114 9500 0.7329 0.3674 6.1969 0.0141 0.0439
2.3358 9600 1.2522 0.3667 6.2403 0.0140 0.0431
2.3601 9700 1.1872 0.3634 6.0391 0.0143 0.0430
2.3844 9800 1.0789 0.3698 6.0625 0.0132 0.0404
2.4088 9900 0.9211 0.3623 6.1184 0.0133 0.0421
2.4331 10000 0.957 0.3704 6.0958 0.0136 0.0412
2.4574 10100 1.0247 0.3665 6.0707 0.0131 0.0465
2.4818 10200 0.868 0.3684 6.0532 0.0130 0.0466
2.5061 10300 1.0651 0.3752 6.1146 0.0134 0.0463
2.5304 10400 0.8479 0.3751 6.1622 0.0132 0.0449
2.5547 10500 1.3458 0.3629 6.0291 0.0141 0.0449
2.5791 10600 1.0735 0.3683 5.9601 0.0139 0.0446
2.6034 10700 1.0609 0.3547 5.9667 0.0143 0.0410
2.6277 10800 0.8736 0.3676 6.0968 0.0137 0.0411
2.6521 10900 0.8848 0.3702 6.1259 0.0139 0.0406
2.6764 11000 0.8544 0.3751 6.1025 0.0142 0.0399
2.7007 11100 0.8619 0.3733 6.1460 0.0146 0.0388
2.7251 11200 0.8889 0.3770 6.1766 0.0148 0.0395
2.7494 11300 1.0385 0.3781 6.1172 0.0140 0.0405
2.7737 11400 0.811 0.3918 6.2225 0.0138 0.0389
2.7981 11500 0.9761 0.3834 6.1362 0.0142 0.0372
2.8224 11600 0.994 0.3791 6.2333 0.0139 0.0398
2.8467 11700 0.9336 0.3634 6.1495 0.0142 0.0397
2.8710 11800 0.9836 0.3719 6.1206 0.0141 0.0399
2.8954 11900 0.9395 0.3702 6.1925 0.0140 0.0413
2.9197 12000 1.0279 0.3718 6.1865 0.0138 0.0412
2.9440 12100 0.9084 0.3683 6.1300 0.0139 0.0423
2.9684 12200 0.7663 0.3692 6.2223 0.0140 0.0400
2.9927 12300 1.0803 0.3629 6.1623 0.0147 0.0413
3.0170 12400 0.6931 0.3709 6.2628 0.0151 0.0436
3.0414 12500 0.7655 0.3712 6.3208 0.0150 0.0428
3.0657 12600 0.7602 0.3779 6.4310 0.0139 0.0438
3.0900 12700 0.6897 0.3703 6.2320 0.0147 0.0427
3.1144 12800 0.7364 0.3815 6.3647 0.0147 0.0429
3.1387 12900 0.9105 0.3859 6.4185 0.0147 0.0429
3.1630 13000 0.5886 0.3845 6.3379 0.0149 0.0441
3.1873 13100 0.7225 0.3848 6.4305 0.0150 0.0455
3.2117 13200 0.771 0.3772 6.4205 0.0150 0.0452
3.2360 13300 0.7322 0.3790 6.3979 0.0148 0.0442
3.2603 13400 0.753 0.3744 6.4105 0.0152 0.0441
3.2847 13500 0.5427 0.3771 6.4288 0.0150 0.0459
3.3090 13600 0.7725 0.3727 6.3567 0.0152 0.0454
3.3333 13700 0.8041 0.3755 6.3754 0.0147 0.0456
3.3577 13800 0.6132 0.3804 6.4203 0.0151 0.0458
3.3820 13900 0.8572 0.3812 6.4300 0.0149 0.0461
3.4063 14000 0.5685 0.3845 6.4947 0.0147 0.0459
3.4307 14100 0.7893 0.3812 6.4488 0.0151 0.0468
3.4550 14200 0.6362 0.3857 6.4628 0.0153 0.0456
3.4793 14300 0.7303 0.3845 6.4720 0.0150 0.0462
3.5036 14400 0.5845 0.3881 6.4713 0.0149 0.0464
3.5280 14500 0.6069 0.3877 6.5055 0.0151 0.0454
3.5523 14600 0.6865 0.3816 6.4564 0.0149 0.0452
3.5766 14700 0.7699 0.3833 6.4560 0.0156 0.0462
3.6010 14800 0.923 0.3822 6.4682 0.0157 0.0464
3.6253 14900 0.737 0.3806 6.4656 0.0154 0.0462
3.6496 15000 0.7309 0.3853 6.4923 0.0152 0.0456
3.6740 15100 0.6811 0.3837 6.5052 0.0153 0.0458
3.6983 15200 0.5556 0.3848 6.5081 0.0151 0.0456
3.7226 15300 0.6696 0.3860 6.5200 0.0152 0.0459
3.7470 15400 0.6366 0.3864 6.5324 0.0150 0.0448
3.7713 15500 0.7848 0.3879 6.5547 0.0150 0.0448
3.7956 15600 0.8423 0.3861 6.5463 0.0151 0.0450
3.8200 15700 0.6599 0.3849 6.5421 0.0150 0.0451
3.8443 15800 0.5292 0.3851 6.5450 0.0150 0.0452
3.8686 15900 0.5983 0.3841 6.5396 0.0149 0.0450
3.8929 16000 0.5917 0.3823 6.5236 0.0149 0.0449
3.9173 16100 0.762 0.3825 6.5278 0.0150 0.0451
3.9416 16200 0.7396 0.3832 6.5380 0.0150 0.0453
3.9659 16300 0.574 0.3835 6.5399 0.0151 0.0452
3.9903 16400 0.5849 0.3835 6.5374 0.0151 0.0452

Framework Versions

  • Python: 3.10.10
  • Sentence Transformers: 3.4.0.dev0
  • Transformers: 4.49.0.dev0
  • PyTorch: 2.2.1+cu121
  • Accelerate: 1.3.0
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers and SoftmaxLoss

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}