SentenceTransformer based on answerdotai/ModernBERT-base

This is a sentence-transformers model finetuned from answerdotai/ModernBERT-base on the all-nli-pair, all-nli-pair-class, all-nli-pair-score, all-nli-triplet, stsb, quora and natural-questions datasets. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: answerdotai/ModernBERT-base
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Datasets: all-nli-pair, all-nli-pair-class, all-nli-pair-score, all-nli-triplet, stsb, quora, natural-questions

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
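
The Pooling block above applies mean pooling over token embeddings (pooling_mode_mean_tokens: True). For readers using plain transformers instead of Sentence Transformers, here is a minimal sketch of the equivalent encode step; it assumes a transformers version with ModernBERT support, and the example sentence is only illustrative:

import torch
from transformers import AutoModel, AutoTokenizer

model_id = "nickprock/modernbert-base-all-nli-stsb-quora-nq"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

inputs = tokenizer(["A person is outdoors, on a horse."], padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    token_embeddings = model(**inputs).last_hidden_state  # [batch, seq_len, 768]

# Mean pooling: average the token embeddings, ignoring padding positions
mask = inputs["attention_mask"].unsqueeze(-1).float()
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
print(embeddings.shape)  # torch.Size([1, 768])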

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("nickprock/modernbert-base-all-nli-stsb-quora-nq")
# Run inference
sentences = [
    'There is a very full description of the various types of hormone rooting compound here.',
    'It is meant to stimulate root growth - in particular to stimulate the creation of roots.',
    "The least that can be said is that we must be born with the ability and 'knowledge' to learn.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
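
The same embeddings also support semantic search, one of the use cases listed above. A short illustrative sketch with the util.semantic_search helper (the corpus and query are made-up examples):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("nickprock/modernbert-base-all-nli-stsb-quora-nq")

# Toy corpus and query, purely illustrative
corpus = [
    "A person is outdoors, on a horse.",
    "The boy does a skateboarding trick.",
    "There are children present.",
]
query = "Someone riding a horse outside"

corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Retrieve the top 2 corpus entries by cosine similarity
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], round(hit["score"], 4))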

Training Details

Training Datasets

all-nli-pair

  • Dataset: all-nli-pair at d482672
  • Size: 10,000 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min: 5, mean: 17.29, max: 64 tokens
    • positive (string): min: 5, mean: 9.7, max: 31 tokens
  • Samples:
    • anchor: A person on a horse jumps over a broken down airplane. | positive: A person is outdoors, on a horse.
    • anchor: Children smiling and waving at camera | positive: There are children present
    • anchor: A boy is jumping on skateboard in the middle of a red bridge. | positive: The boy does a skateboarding trick.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
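
As an illustrative sketch (not the exact training script), this loss can be constructed with the reported parameters like so:

from sentence_transformers import SentenceTransformer, util
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("answerdotai/ModernBERT-base")
# scale and similarity_fct mirror the parameters reported above
loss = MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)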
    
all-nli-pair-class

  • Dataset: all-nli-pair-class at d482672
  • Size: 10,000 training samples
  • Columns: premise, hypothesis, and label
  • Approximate statistics based on the first 1000 samples:
    • premise (string): min: 6, mean: 17.6, max: 51 tokens
    • hypothesis (string): min: 5, mean: 10.8, max: 33 tokens
    • label (int): 0: ~33.40%, 1: ~33.30%, 2: ~33.30%
  • Samples:
    • premise: A person on a horse jumps over a broken down airplane. | hypothesis: A person is training his horse for a competition. | label: 1
    • premise: A person on a horse jumps over a broken down airplane. | hypothesis: A person is at a diner, ordering an omelette. | label: 2
    • premise: A person on a horse jumps over a broken down airplane. | hypothesis: A person is outdoors, on a horse. | label: 0
  • Loss: SoftmaxLoss
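
SoftmaxLoss feeds pairs of sentence embeddings into a small classifier head that predicts the NLI label. An illustrative construction sketch (assuming the current Sentence Transformers API):

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import SoftmaxLoss

model = SentenceTransformer("answerdotai/ModernBERT-base")
loss = SoftmaxLoss(
    model=model,
    sentence_embedding_dimension=model.get_sentence_embedding_dimension(),  # 768
    num_labels=3,  # matches the 0/1/2 label distribution above
)
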
all-nli-pair-score

  • Dataset: all-nli-pair-score at d482672
  • Size: 10,000 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min: 6, mean: 17.6, max: 51 tokens
    • sentence2 (string): min: 5, mean: 10.8, max: 33 tokens
    • score (float): min: 0.0, mean: 0.5, max: 1.0
  • Samples:
    • sentence1: A person on a horse jumps over a broken down airplane. | sentence2: A person is training his horse for a competition. | score: 0.5
    • sentence1: A person on a horse jumps over a broken down airplane. | sentence2: A person is at a diner, ordering an omelette. | score: 0.0
    • sentence1: A person on a horse jumps over a broken down airplane. | sentence2: A person is outdoors, on a horse. | score: 1.0
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
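
CoSENTLoss optimizes the ranking of pairwise cosine similarities against the float scores above. An illustrative construction sketch with the reported parameters:

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CoSENTLoss

model = SentenceTransformer("answerdotai/ModernBERT-base")
# pairwise_cos_sim is CoSENTLoss's default similarity_fct
loss = CoSENTLoss(model, scale=20.0)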
    
all-nli-triplet

  • Dataset: all-nli-triplet at d482672
  • Size: 10,000 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min: 7, mean: 10.46, max: 46 tokens
    • positive (string): min: 6, mean: 12.91, max: 40 tokens
    • negative (string): min: 5, mean: 13.49, max: 51 tokens
  • Samples:
    • anchor: A person on a horse jumps over a broken down airplane. | positive: A person is outdoors, on a horse. | negative: A person is at a diner, ordering an omelette.
    • anchor: Children smiling and waving at camera | positive: There are children present | negative: The kids are frowning
    • anchor: A boy is jumping on skateboard in the middle of a red bridge. | positive: The boy does a skateboarding trick. | negative: The boy skates down the sidewalk.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    
stsb

  • Dataset: stsb at ab7a5ac
  • Size: 5,749 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min: 6, mean: 10.16, max: 28 tokens
    • sentence2 (string): min: 6, mean: 10.12, max: 25 tokens
    • score (float): min: 0.0, mean: 0.45, max: 1.0
  • Samples:
    • sentence1: A plane is taking off. | sentence2: An air plane is taking off. | score: 1.0
    • sentence1: A man is playing a large flute. | sentence2: A man is playing a flute. | score: 0.76
    • sentence1: A man is spreading shreded cheese on a pizza. | sentence2: A man is spreading shredded cheese on an uncooked pizza. | score: 0.76
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
    
quora

  • Dataset: quora at 451a485
  • Size: 10,000 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min: 6, mean: 13.91, max: 45 tokens
    • positive (string): min: 6, mean: 14.09, max: 44 tokens
  • Samples:
    • anchor: Astrology: I am a Capricorn Sun Cap moon and cap rising...what does that say about me? | positive: I'm a triple Capricorn (Sun, Moon and ascendant in Capricorn) What does this say about me?
    • anchor: How can I be a good geologist? | positive: What should I do to be a great geologist?
    • anchor: How do I read and find my YouTube comments? | positive: How can I see all my Youtube comments?
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    
natural-questions

  • Dataset: natural-questions at f9e894e
  • Size: 10,000 training samples
  • Columns: query and answer
  • Approximate statistics based on the first 1000 samples:
    • query (string): min: 10, mean: 12.47, max: 23 tokens
    • answer (string): min: 17, mean: 138.32, max: 556 tokens
  • Samples:
    • query: when did richmond last play in a preliminary final | answer: Richmond Football Club Richmond began 2017 with 5 straight wins, a feat it had not achieved since 1995. A series of close losses hampered the Tigers throughout the middle of the season, including a 5-point loss to the Western Bulldogs, 2-point loss to Fremantle, and a 3-point loss to the Giants. Richmond ended the season strongly with convincing victories over Fremantle and St Kilda in the final two rounds, elevating the club to 3rd on the ladder. Richmond's first final of the season against the Cats at the MCG attracted a record qualifying final crowd of 95,028; the Tigers won by 51 points. Having advanced to the first preliminary finals for the first time since 2001, Richmond defeated Greater Western Sydney by 36 points in front of a crowd of 94,258 to progress to the Grand Final against Adelaide, their first Grand Final appearance since 1982. The attendance was 100,021, the largest crowd to a grand final since 1986. The Crows led at quarter time and led by as many as 13, but the Tig...
    • query: who sang what in the world's come over you | answer: Jack Scott (singer) At the beginning of 1960, Scott again changed record labels, this time to Top Rank Records.[1] He then recorded four Billboard Hot 100 hits – "What in the World's Come Over You" (#5), "Burning Bridges" (#3) b/w "Oh Little One" (#34), and "It Only Happened Yesterday" (#38).[1] "What in the World's Come Over You" was Scott's second gold disc winner.[6] Scott continued to record and perform during the 1960s and 1970s.[1] His song "You're Just Gettin' Better" reached the country charts in 1974.[1] In May 1977, Scott recorded a Peel session for BBC Radio 1 disc jockey, John Peel.
    • query: who produces the most wool in the world | answer: Wool Global wool production is about 2 million tonnes per year, of which 60% goes into apparel. Wool comprises ca 3% of the global textile market, but its value is higher owing to dying and other modifications of the material.[1] Australia is a leading producer of wool which is mostly from Merino sheep but has been eclipsed by China in terms of total weight.[30] New Zealand (2016) is the third-largest producer of wool, and the largest producer of crossbred wool. Breeds such as Lincoln, Romney, Drysdale, and Elliotdale produce coarser fibers, and wool from these sheep is usually used for making carpets.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
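
Since the seven training datasets use different losses, Sentence Transformers accepts both as dictionaries keyed by dataset name. A hedged sketch of that wiring (only two datasets shown; the Hub dataset identifiers are assumptions based on the names in this card):

from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer, util
from sentence_transformers.losses import CoSENTLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("answerdotai/ModernBERT-base")

# Illustrative loading of two of the seven datasets; the rest follow the same pattern
train_dataset = {
    "all-nli-pair": load_dataset("sentence-transformers/all-nli", "pair", split="train[:10000]"),
    "stsb": load_dataset("sentence-transformers/stsb", split="train"),
}
loss = {
    "all-nli-pair": MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim),
    "stsb": CoSENTLoss(model, scale=20.0),
}

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()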
    

Evaluation Datasets

all-nli-triplet

  • Dataset: all-nli-triplet at d482672
  • Size: 6,584 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min: 6, mean: 18.25, max: 69 tokens
    • positive (string): min: 5, mean: 9.88, max: 30 tokens
    • negative (string): min: 5, mean: 10.48, max: 29 tokens
  • Samples:
    • anchor: Two women are embracing while holding to go packages. | positive: Two woman are holding packages. | negative: The men are fighting outside a deli.
    • anchor: Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink. | positive: Two kids in numbered jerseys wash their hands. | negative: Two kids in jackets walk to school.
    • anchor: A man selling donuts to a customer during a world exhibition event held in the city of Angeles | positive: A man selling donuts to a customer. | negative: A woman drinks her coffee in a small cafe.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    
stsb

  • Dataset: stsb at ab7a5ac
  • Size: 1,500 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    • sentence1 (string): min: 5, mean: 15.11, max: 44 tokens
    • sentence2 (string): min: 6, mean: 15.1, max: 50 tokens
    • score (float): min: 0.0, mean: 0.42, max: 1.0
  • Samples:
    • sentence1: A man with a hard hat is dancing. | sentence2: A man wearing a hard hat is dancing. | score: 1.0
    • sentence1: A young child is riding a horse. | sentence2: A child is riding a horse. | score: 0.95
    • sentence1: A man is feeding a mouse to a snake. | sentence2: The man is feeding a mouse to the snake. | score: 1.0
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
    
quora

  • Dataset: quora at 451a485
  • Size: 1,000 evaluation samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min: 6, mean: 14.01, max: 63 tokens
    • positive (string): min: 6, mean: 14.04, max: 46 tokens
  • Samples:
    • anchor: What is your New Year resolution? | positive: What can be my new year resolution for 2017?
    • anchor: Should I buy the IPhone 6s or Samsung Galaxy s7? | positive: Which is better: the iPhone 6S Plus or the Samsung Galaxy S7 Edge?
    • anchor: What are the differences between transgression and regression? | positive: What is the difference between transgression and regression?
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    
natural-questions

  • Dataset: natural-questions at f9e894e
  • Size: 1,000 evaluation samples
  • Columns: query and answer
  • Approximate statistics based on the first 1000 samples:
    • query (string): min: 9, mean: 12.51, max: 26 tokens
    • answer (string): min: 19, mean: 140.84, max: 585 tokens
  • Samples:
    • query: where does the waikato river begin and end | answer: Waikato River The Waikato River is the longest river in New Zealand, running for 425 kilometres (264 mi) through the North Island. It rises in the eastern slopes of Mount Ruapehu, joining the Tongariro River system and flowing through Lake Taupo, New Zealand's largest lake. It then drains Taupo at the lake's northeastern edge, creates the Huka Falls, and flows northwest through the Waikato Plains. It empties into the Tasman Sea south of Auckland, at Port Waikato. It gives its name to the Waikato Region that surrounds the Waikato Plains. The present course of the river was largely formed about 17,000 years ago. Contributing factors were climate warming, forest being reestablished in the river headwaters and the deepening, rather than widening, of the existing river channel. The channel was gradually eroded as far up river as Piarere, leaving the old Hinuera channel high and dry.[2] The remains of the old river path can be clearly seen at Hinuera where the cliffs mark the ancient river ...
    • query: what type of gas is produced during fermentation | answer: Fermentation Fermentation reacts NADH with an endogenous, organic electron acceptor.[1] Usually this is pyruvate formed from sugar through glycolysis. The reaction produces NAD+ and an organic product, typical examples being ethanol, lactic acid, carbon dioxide, and hydrogen gas (H2). However, more exotic compounds can be produced by fermentation, such as butyric acid and acetone. Fermentation products contain chemical energy (they are not fully oxidized), but are considered waste products, since they cannot be metabolized further without the use of oxygen.
    • query: why was star wars episode iv released first | answer: Star Wars (film) Star Wars (later retitled Star Wars: Episode IV – A New Hope) is a 1977 American epic space opera film written and directed by George Lucas. It is the first film in the original Star Wars trilogy and the beginning of the Star Wars franchise. Starring Mark Hamill, Harrison Ford, Carrie Fisher, Peter Cushing, Alec Guinness, David Prowse, James Earl Jones, Anthony Daniels, Kenny Baker, and Peter Mayhew, the film's plot focuses on the Rebel Alliance, led by Princess Leia (Fisher), and its attempt to destroy the Galactic Empire's space station, the Death Star. This conflict disrupts the isolated life of farmhand Luke Skywalker (Hamill), who inadvertently acquires two droids that possess stolen architectural plans for the Death Star. When the Empire begins a destructive search for the missing droids, Skywalker accompanies Jedi Master Obi-Wan Kenobi (Guinness) on a mission to return the plans to the Rebel Alliance and rescue Leia from her imprisonment by the Empire.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • warmup_ratio: 0.1
  • fp16: True
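
These non-default values map directly onto SentenceTransformerTrainingArguments; a minimal sketch (output_dir is a placeholder):

from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="modernbert-base-all-nli-stsb-quora-nq",  # placeholder
    eval_strategy="steps",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=4,
    warmup_ratio=0.1,
    fp16=True,
)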

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch | Step | Training Loss | all-nli-triplet loss | stsb loss | quora loss | natural-questions loss
0.0243 100 2.8163 2.6011 4.6235 1.6762 2.2254
0.0487 200 2.6522 2.0674 4.5288 1.0381 1.7565
0.0730 300 2.5478 1.1872 5.1274 0.0883 0.8453
0.0973 400 2.3013 0.9126 5.3516 0.0443 0.6953
0.1217 500 1.9177 0.8462 5.6431 0.0343 0.5612
0.1460 600 1.7186 0.7144 5.8698 0.0264 0.3991
0.1703 700 2.0748 0.7219 5.2972 0.0255 0.2856
0.1946 800 1.9132 0.6691 5.3757 0.0196 0.2245
0.2190 900 1.8559 0.6198 5.5028 0.0185 0.1659
0.2433 1000 2.1453 0.5851 5.8587 0.0177 0.1280
0.2676 1100 2.0303 0.6331 5.1522 0.0222 0.1381
0.2920 1200 1.8612 0.5579 5.7026 0.0156 0.1016
0.3163 1300 1.8465 0.6045 5.0309 0.0187 0.1062
0.3406 1400 1.7208 0.5491 5.5651 0.0174 0.0864
0.3650 1500 1.5479 0.5337 5.9317 0.0170 0.0809
0.3893 1600 1.5605 0.5604 5.4574 0.0210 0.0765
0.4136 1700 1.7457 0.5528 5.2572 0.0188 0.0750
0.4380 1800 1.6724 0.4923 5.6488 0.0169 0.0790
0.4623 1900 1.4122 0.4718 5.3825 0.0163 0.0647
0.4866 2000 1.848 0.4594 5.6606 0.0189 0.0658
0.5109 2100 2.0782 0.5167 4.9055 0.0210 0.0712
0.5353 2200 1.5413 0.4396 5.3588 0.0210 0.0580
0.5596 2300 1.6705 0.4588 5.5433 0.0192 0.0550
0.5839 2400 1.5674 0.4351 5.3304 0.0180 0.0582
0.6083 2500 1.5238 0.4812 5.2534 0.0163 0.0530
0.6326 2600 1.4025 0.4470 5.4626 0.0156 0.0513
0.6569 2700 1.5916 0.4489 5.5590 0.0159 0.0513
0.6813 2800 1.6206 0.4611 5.1904 0.0156 0.0536
0.7056 2900 1.7873 0.4742 5.1292 0.0153 0.0472
0.7299 3000 1.9452 0.4752 4.9931 0.0163 0.0542
0.7543 3100 1.563 0.4722 5.3862 0.0175 0.0513
0.7786 3200 1.3493 0.4525 5.4255 0.0163 0.0423
0.8029 3300 1.606 0.4657 5.3005 0.0179 0.0431
0.8273 3400 1.6305 0.4466 5.5017 0.0163 0.0432
0.8516 3500 1.3496 0.4144 5.3454 0.0170 0.0440
0.8759 3600 1.5866 0.4014 5.8260 0.0167 0.0481
0.9002 3700 1.495 0.4094 5.5550 0.0173 0.0454
0.9246 3800 1.2604 0.4125 5.9704 0.0179 0.0376
0.9489 3900 1.6432 0.4223 5.1097 0.0176 0.0450
0.9732 4000 1.6194 0.4322 5.1807 0.0166 0.0400
0.9976 4100 1.3006 0.4209 5.3493 0.0176 0.0412
1.0219 4200 1.3557 0.4080 5.5556 0.0167 0.0395
1.0462 4300 1.2346 0.3944 5.6652 0.0164 0.0395
1.0706 4400 1.6212 0.4036 5.6948 0.0157 0.0407
1.0949 4500 1.7511 0.3909 5.5846 0.0159 0.0410
1.1192 4600 1.1087 0.3827 5.7067 0.0175 0.0384
1.1436 4700 1.1356 0.3947 6.0833 0.0181 0.0412
1.1679 4800 1.4649 0.3816 5.6926 0.0187 0.0407
1.1922 4900 1.2354 0.4000 5.8187 0.0181 0.0401
1.2165 5000 1.2099 0.3967 5.8184 0.0183 0.0428
1.2409 5100 1.279 0.3784 5.8931 0.0176 0.0418
1.2652 5200 1.0431 0.3845 5.8284 0.0167 0.0395
1.2895 5300 1.2217 0.3883 5.6984 0.0195 0.0380
1.3139 5400 1.6192 0.3858 5.7183 0.0192 0.0381
1.3382 5500 1.5792 0.3704 5.8270 0.0196 0.0437
1.3625 5600 1.4467 0.3885 5.7460 0.0179 0.0411
1.3869 5700 1.217 0.3778 5.6724 0.0185 0.0407
1.4112 5800 1.3599 0.3824 5.8521 0.0155 0.0392
1.4355 5900 1.3571 0.3674 6.0293 0.0158 0.0379
1.4599 6000 1.4408 0.3667 5.9265 0.0140 0.0379
1.4842 6100 1.1629 0.3612 5.6663 0.0151 0.0367
1.5085 6200 1.21 0.3765 5.7513 0.0176 0.0407
1.5328 6300 1.4469 0.3722 5.8795 0.0162 0.0431
1.5572 6400 1.8419 0.3687 5.6081 0.0145 0.0382
1.5815 6500 1.4978 0.3739 5.6302 0.0156 0.0372
1.6058 6600 1.3954 0.3658 5.9182 0.0160 0.0405
1.6302 6700 1.262 0.3702 5.6119 0.0158 0.0370
1.6545 6800 0.9204 0.3723 5.7449 0.0147 0.0378
1.6788 6900 1.0658 0.3738 5.7127 0.0132 0.0410
1.7032 7000 1.286 0.3740 5.7997 0.0143 0.0405
1.7275 7100 1.3771 0.3650 5.7853 0.0142 0.0411
1.7518 7200 1.205 0.3728 5.8454 0.0149 0.0423
1.7762 7300 0.9881 0.3691 5.7261 0.0147 0.0461
1.8005 7400 1.3962 0.3751 5.6620 0.0135 0.0427
1.8248 7500 1.1804 0.3812 5.6814 0.0136 0.0396
1.8491 7600 1.4312 0.3722 5.7919 0.0141 0.0368
1.8735 7700 1.1161 0.3700 5.7718 0.0140 0.0397
1.8978 7800 1.389 0.3815 5.8770 0.0127 0.0415
1.9221 7900 1.5896 0.3726 5.6467 0.0132 0.0382
1.9465 8000 1.6873 0.3706 5.5875 0.0132 0.0380
1.9708 8100 1.513 0.3658 5.6106 0.0130 0.0371
1.9951 8200 0.9243 0.3611 5.7932 0.0135 0.0378
2.0195 8300 1.1086 0.3510 5.8341 0.0133 0.0386
2.0438 8400 0.7918 0.3715 6.0229 0.0138 0.0382
2.0681 8500 1.1291 0.3708 6.0243 0.0146 0.0397
2.0925 8600 0.9846 0.3775 6.0437 0.0139 0.0380
2.1168 8700 0.7928 0.3732 6.1154 0.0145 0.0408
2.1411 8800 1.0726 0.3786 5.9249 0.0151 0.0387
2.1655 8900 1.3123 0.3720 6.0072 0.0146 0.0395
2.1898 9000 0.752 0.3741 6.1952 0.0148 0.0411
2.2141 9100 1.1021 0.3708 6.0910 0.0140 0.0391
2.2384 9200 0.8425 0.3646 6.1572 0.0150 0.0398
2.2628 9300 1.0123 0.3582 6.2371 0.0146 0.0399
2.2871 9400 1.0528 0.3742 6.2364 0.0142 0.0412
2.3114 9500 0.7329 0.3674 6.1969 0.0141 0.0439
2.3358 9600 1.2522 0.3667 6.2403 0.0140 0.0431
2.3601 9700 1.1872 0.3634 6.0391 0.0143 0.0430
2.3844 9800 1.0789 0.3698 6.0625 0.0132 0.0404
2.4088 9900 0.9211 0.3623 6.1184 0.0133 0.0421
2.4331 10000 0.957 0.3704 6.0958 0.0136 0.0412
2.4574 10100 1.0247 0.3665 6.0707 0.0131 0.0465
2.4818 10200 0.868 0.3684 6.0532 0.0130 0.0466
2.5061 10300 1.0651 0.3752 6.1146 0.0134 0.0463
2.5304 10400 0.8479 0.3751 6.1622 0.0132 0.0449
2.5547 10500 1.3458 0.3629 6.0291 0.0141 0.0449
2.5791 10600 1.0735 0.3683 5.9601 0.0139 0.0446
2.6034 10700 1.0609 0.3547 5.9667 0.0143 0.0410
2.6277 10800 0.8736 0.3676 6.0968 0.0137 0.0411
2.6521 10900 0.8848 0.3702 6.1259 0.0139 0.0406
2.6764 11000 0.8544 0.3751 6.1025 0.0142 0.0399
2.7007 11100 0.8619 0.3733 6.1460 0.0146 0.0388
2.7251 11200 0.8889 0.3770 6.1766 0.0148 0.0395
2.7494 11300 1.0385 0.3781 6.1172 0.0140 0.0405
2.7737 11400 0.811 0.3918 6.2225 0.0138 0.0389
2.7981 11500 0.9761 0.3834 6.1362 0.0142 0.0372
2.8224 11600 0.994 0.3791 6.2333 0.0139 0.0398
2.8467 11700 0.9336 0.3634 6.1495 0.0142 0.0397
2.8710 11800 0.9836 0.3719 6.1206 0.0141 0.0399
2.8954 11900 0.9395 0.3702 6.1925 0.0140 0.0413
2.9197 12000 1.0279 0.3718 6.1865 0.0138 0.0412
2.9440 12100 0.9084 0.3683 6.1300 0.0139 0.0423
2.9684 12200 0.7663 0.3692 6.2223 0.0140 0.0400
2.9927 12300 1.0803 0.3629 6.1623 0.0147 0.0413
3.0170 12400 0.6931 0.3709 6.2628 0.0151 0.0436
3.0414 12500 0.7655 0.3712 6.3208 0.0150 0.0428
3.0657 12600 0.7602 0.3779 6.4310 0.0139 0.0438
3.0900 12700 0.6897 0.3703 6.2320 0.0147 0.0427
3.1144 12800 0.7364 0.3815 6.3647 0.0147 0.0429
3.1387 12900 0.9105 0.3859 6.4185 0.0147 0.0429
3.1630 13000 0.5886 0.3845 6.3379 0.0149 0.0441
3.1873 13100 0.7225 0.3848 6.4305 0.0150 0.0455
3.2117 13200 0.771 0.3772 6.4205 0.0150 0.0452
3.2360 13300 0.7322 0.3790 6.3979 0.0148 0.0442
3.2603 13400 0.753 0.3744 6.4105 0.0152 0.0441
3.2847 13500 0.5427 0.3771 6.4288 0.0150 0.0459
3.3090 13600 0.7725 0.3727 6.3567 0.0152 0.0454
3.3333 13700 0.8041 0.3755 6.3754 0.0147 0.0456
3.3577 13800 0.6132 0.3804 6.4203 0.0151 0.0458
3.3820 13900 0.8572 0.3812 6.4300 0.0149 0.0461
3.4063 14000 0.5685 0.3845 6.4947 0.0147 0.0459
3.4307 14100 0.7893 0.3812 6.4488 0.0151 0.0468
3.4550 14200 0.6362 0.3857 6.4628 0.0153 0.0456
3.4793 14300 0.7303 0.3845 6.4720 0.0150 0.0462
3.5036 14400 0.5845 0.3881 6.4713 0.0149 0.0464
3.5280 14500 0.6069 0.3877 6.5055 0.0151 0.0454
3.5523 14600 0.6865 0.3816 6.4564 0.0149 0.0452
3.5766 14700 0.7699 0.3833 6.4560 0.0156 0.0462
3.6010 14800 0.923 0.3822 6.4682 0.0157 0.0464
3.6253 14900 0.737 0.3806 6.4656 0.0154 0.0462
3.6496 15000 0.7309 0.3853 6.4923 0.0152 0.0456
3.6740 15100 0.6811 0.3837 6.5052 0.0153 0.0458
3.6983 15200 0.5556 0.3848 6.5081 0.0151 0.0456
3.7226 15300 0.6696 0.3860 6.5200 0.0152 0.0459
3.7470 15400 0.6366 0.3864 6.5324 0.0150 0.0448
3.7713 15500 0.7848 0.3879 6.5547 0.0150 0.0448
3.7956 15600 0.8423 0.3861 6.5463 0.0151 0.0450
3.8200 15700 0.6599 0.3849 6.5421 0.0150 0.0451
3.8443 15800 0.5292 0.3851 6.5450 0.0150 0.0452
3.8686 15900 0.5983 0.3841 6.5396 0.0149 0.0450
3.8929 16000 0.5917 0.3823 6.5236 0.0149 0.0449
3.9173 16100 0.762 0.3825 6.5278 0.0150 0.0451
3.9416 16200 0.7396 0.3832 6.5380 0.0150 0.0453
3.9659 16300 0.574 0.3835 6.5399 0.0151 0.0452
3.9903 16400 0.5849 0.3835 6.5374 0.0151 0.0452

Framework Versions

  • Python: 3.10.10
  • Sentence Transformers: 3.4.0.dev0
  • Transformers: 4.49.0.dev0
  • PyTorch: 2.2.1+cu121
  • Accelerate: 1.3.0
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers and SoftmaxLoss

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}