SentenceTransformer based on thenlper/gte-base

This is a sentence-transformers model finetuned from thenlper/gte-base. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: thenlper/gte-base
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
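
This architecture can be reproduced from standard sentence-transformers building blocks. A minimal sketch, assuming the sentence_transformers.models API (mean pooling over the 768-dimensional BERT token embeddings, followed by L2 normalization):

from sentence_transformers import SentenceTransformer, models

# Transformer backbone with the same 512-token limit as listed above
word_embedding = models.Transformer("thenlper/gte-base", max_seq_length=512)
# Mean pooling over token embeddings, matching the Pooling(...) module above
pooling = models.Pooling(word_embedding.get_word_embedding_dimension(), pooling_mode="mean")
# L2-normalize the sentence embeddings, matching the Normalize() module
normalize = models.Normalize()
model = SentenceTransformer(modules=[word_embedding, pooling, normalize])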

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("neel2306/gte-cp-base")
# Run inference
sentences = [
    'Mineral Fuels, Lubricants Etc.',
    'Crude oil',
    'Coal',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
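
The same embeddings can also be used for semantic search. A minimal sketch, assuming a small hypothetical corpus of commodity descriptions (illustrative only):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("neel2306/gte-cp-base")

corpus = ["Crude oil", "Coal", "Ceramic mosaic tiles", "Relay switches"]
query = "Mineral Fuels, Lubricants Etc."

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode(query)

# Rank corpus entries by cosine similarity to the query
scores = model.similarity(query_embedding, corpus_embeddings)  # shape: [1, 4]
for idx in scores[0].argsort(descending=True).tolist():
    print(corpus[idx], float(scores[0, idx]))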

Training Details

Training Dataset

Unnamed Dataset

  • Size: 10,932 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min: 3 tokens, mean: 9.91 tokens, max: 48 tokens
    • positive: string; min: 3 tokens, mean: 6.05 tokens, max: 17 tokens
    • negative: string; min: 3 tokens, mean: 5.08 tokens, max: 14 tokens
  • Samples:
    • anchor: Clay Floor And Wall Tile, Glazed And Unglazed (Including Quarry Tile And Ceramic Mosaic Tile); positive: Ceramic mosaic tiles; negative: Natural stone tiles
    • anchor: Electrical Relay/Conductor; positive: Relay switches; negative: Electrical insulators
    • anchor: Plasterer (Kelowna, British Columbia 5 13) (Union Rate); positive: Labor costs for plasterers; negative: Painting supplies
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    
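
A minimal sketch of constructing this loss with the parameters above; with (anchor, positive, negative) columns, each anchor is pulled toward its positive while the explicit negative and all other in-batch positives and negatives are pushed away:

from sentence_transformers import SentenceTransformer, losses, util

model = SentenceTransformer("thenlper/gte-base")
# scale=20.0 and cosine similarity match the parameters listed above
loss = losses.MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)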

Evaluation Dataset

Unnamed Dataset

  • Size: 2,733 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min: 3 tokens, mean: 10.09 tokens, max: 53 tokens
    • positive: string; min: 3 tokens, mean: 6.06 tokens, max: 21 tokens
    • negative: string; min: 3 tokens, mean: 4.95 tokens, max: 14 tokens
  • Samples:
    • anchor: Asphalt Paving Mixture and Block Manufacturing; positive: Recycled asphalt pavement (RAP); negative: Asphalt shingles
    • anchor: Air Conditioning Plant; positive: Refrigerant gases; negative: Heating elements
    • anchor: Oak Lumber; positive: Oak plywood; negative: Pine lumber
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 6e-05
  • num_train_epochs: 10
  • warmup_ratio: 0.1
  • optim: adamw_hf
  • batch_sampler: no_duplicates
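
A minimal training sketch wiring these non-default hyperparameters into SentenceTransformerTrainingArguments; the inline triplets are placeholders taken from the sample rows above, not the full dataset:

from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer, losses
from sentence_transformers.training_args import BatchSamplers, SentenceTransformerTrainingArguments

model = SentenceTransformer("thenlper/gte-base")
loss = losses.MultipleNegativesRankingLoss(model)

# Placeholder triplets; the real train/eval sets hold 10,932 / 2,733 samples
train_dataset = Dataset.from_dict({
    "anchor": ["Electrical Relay/Conductor", "Oak Lumber"],
    "positive": ["Relay switches", "Oak plywood"],
    "negative": ["Electrical insulators", "Pine lumber"],
})
eval_dataset = Dataset.from_dict({
    "anchor": ["Air Conditioning Plant"],
    "positive": ["Refrigerant gases"],
    "negative": ["Heating elements"],
})

args = SentenceTransformerTrainingArguments(
    output_dir="gte-cp-base",
    num_train_epochs=10,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=6e-5,
    warmup_ratio=0.1,
    optim="adamw_hf",
    eval_strategy="steps",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()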

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 6e-05
  • weight_decay: 0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_hf
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • eval_use_gather_object: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss
0.0731 50 1.9026 1.5169
0.1462 100 1.5479 1.0813
0.2193 150 1.0239 0.7291
0.2924 200 0.6914 0.6372
0.3655 250 0.653 0.5887
0.4386 300 0.5469 0.5605
0.5117 350 0.5312 0.5408
0.5848 400 0.4996 0.5100
0.6579 450 0.4445 0.4830
0.7310 500 0.5092 0.4734
0.8041 550 0.532 0.4476
0.8772 600 0.4147 0.4714
0.9503 650 0.477 0.4400
1.0234 700 0.4243 0.4466
1.0965 750 0.485 0.4172
1.1696 800 0.3717 0.4271
1.2427 850 0.3716 0.4369
1.3158 900 0.3742 0.4104
1.3889 950 0.3157 0.4436
1.4620 1000 0.3035 0.4444
1.5351 1050 0.2797 0.4558
1.6082 1100 0.2639 0.4248
1.6813 1150 0.2286 0.4308
1.7544 1200 0.2753 0.4098
1.8275 1250 0.1904 0.4415
1.9006 1300 0.2175 0.4503
1.9737 1350 0.1806 0.4245
2.0468 1400 0.1826 0.4418
2.1199 1450 0.1952 0.4138
2.1930 1500 0.1612 0.4061
2.2661 1550 0.1604 0.3910
2.3392 1600 0.1199 0.3852
2.4123 1650 0.1439 0.4082
2.4854 1700 0.1402 0.4352
2.5585 1750 0.1116 0.4338
2.6316 1800 0.1113 0.4189
2.7047 1850 0.1159 0.4013
2.7778 1900 0.1241 0.3853
2.8509 1950 0.0977 0.3919
2.9240 2000 0.0953 0.4022
2.9971 2050 0.1159 0.4073
3.0702 2100 0.0923 0.3903
3.1433 2150 0.0958 0.3833
3.2164 2200 0.0787 0.3875
3.2895 2250 0.083 0.3807
3.3626 2300 0.0714 0.3806
3.4357 2350 0.0748 0.3997
3.5088 2400 0.0779 0.4027
3.5819 2450 0.0709 0.3921
3.6550 2500 0.0482 0.3905
3.7281 2550 0.0784 0.3760
3.8012 2600 0.0694 0.3809
3.8743 2650 0.0725 0.3957
3.9474 2700 0.0718 0.3897
4.0205 2750 0.05 0.3894
4.0936 2800 0.0597 0.4014
4.1667 2850 0.0445 0.3929
4.2398 2900 0.039 0.3856
4.3129 2950 0.0405 0.3723
4.3860 3000 0.0456 0.3764
4.4591 3050 0.0493 0.3876
4.5322 3100 0.036 0.3866
4.6053 3150 0.0517 0.3791
4.6784 3200 0.0383 0.3724
4.7515 3250 0.0453 0.3886
4.8246 3300 0.0469 0.3897
4.8977 3350 0.0385 0.3940
4.9708 3400 0.0427 0.3877
5.0439 3450 0.0212 0.3914
5.1170 3500 0.0452 0.3899
5.1901 3550 0.0252 0.3925
5.2632 3600 0.0228 0.3895
5.3363 3650 0.0219 0.3792
5.4094 3700 0.0275 0.3882
5.4825 3750 0.0246 0.3892
5.5556 3800 0.0226 0.3895
5.6287 3850 0.0219 0.3912
5.7018 3900 0.027 0.3800
5.7749 3950 0.0268 0.3667
5.8480 4000 0.0313 0.3687
5.9211 4050 0.0233 0.3675
5.9942 4100 0.0201 0.3649
6.0673 4150 0.0207 0.3727
6.1404 4200 0.0175 0.3802
6.2135 4250 0.0117 0.3760
6.2865 4300 0.0124 0.3731
6.3596 4350 0.0164 0.3713
6.4327 4400 0.0149 0.3782
6.5058 4450 0.0127 0.3747
6.5789 4500 0.013 0.3746
6.6520 4550 0.0078 0.3756
6.7251 4600 0.0171 0.3741
6.7982 4650 0.0211 0.3680
6.8713 4700 0.0186 0.3686
6.9444 4750 0.0213 0.3688
7.0175 4800 0.0107 0.3647
7.0906 4850 0.011 0.3677
7.1637 4900 0.0098 0.3671
7.2368 4950 0.0091 0.3708
7.3099 5000 0.0074 0.3673
7.3830 5050 0.0101 0.3672
7.4561 5100 0.0115 0.3676
7.5292 5150 0.0054 0.3656
7.6023 5200 0.0076 0.3657
7.6754 5250 0.0054 0.3639
7.7485 5300 0.0115 0.3600
7.8216 5350 0.0105 0.3657
7.8947 5400 0.0175 0.3649
7.9678 5450 0.0091 0.3634
8.0409 5500 0.0043 0.3646
8.1140 5550 0.0078 0.3650
8.1871 5600 0.004 0.3683
8.2602 5650 0.0045 0.3669
8.3333 5700 0.005 0.3661
8.4064 5750 0.0074 0.3652
8.4795 5800 0.0042 0.3662
8.5526 5850 0.0039 0.3696
8.6257 5900 0.004 0.3724
8.6988 5950 0.008 0.3714
8.7719 6000 0.0057 0.3711
8.8450 6050 0.0045 0.3702
8.9181 6100 0.0122 0.3715
8.9912 6150 0.0064 0.3703
9.0643 6200 0.0039 0.3689
9.1374 6250 0.0034 0.3680
9.2105 6300 0.0022 0.3680
9.2836 6350 0.0021 0.3684
9.3567 6400 0.0025 0.3685
9.4298 6450 0.0041 0.3679
9.5029 6500 0.0018 0.3679
9.5760 6550 0.0039 0.3686
9.6491 6600 0.0021 0.3691
9.7222 6650 0.0056 0.3689
9.7953 6700 0.0025 0.3691
9.8684 6750 0.0063 0.3692
9.9415 6800 0.0074 0.3692

Framework Versions

  • Python: 3.12.6
  • Sentence Transformers: 3.1.0
  • Transformers: 4.44.2
  • PyTorch: 2.4.1+cpu
  • Accelerate: 0.34.2
  • Datasets: 3.0.0
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}