BGE base Financial Matryoshka

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: BAAI/bge-base-en-v1.5
Maximum Sequence Length: 512 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity
Language: en
License: apache-2.0

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
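The three modules above can be mimicked on toy numbers. The Pooling module with pooling_mode_cls_token set keeps only the first ([CLS]) token vector, and Normalize() rescales that vector to unit L2 length, so cosine similarity reduces to a plain dot product. The following is a minimal pure-Python sketch. It assumes made-up 4-dimensional token vectors in place of BertModel's 768-dimensional outputs, and the helper names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cls_pool(token_embeddings: List[Vector]) -> Vector:
    """CLS pooling: keep only the first token's vector (the [CLS] position)."""
    return list(token_embeddings[0])


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit Euclidean length, as the Normalize() module does."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Hypothetical token embeddings for two short inputs (rows = tokens, row 0 = [CLS]).
doc_a = [[0.2, -1.0, 0.5, 3.0], [9.0, 9.0, 9.0, 9.0]]
doc_b = [[0.1, -0.8, 0.9, 2.5], [-4.0, 0.0, 1.0, 2.0]]

emb_a = l2_normalize(cls_pool(doc_a))
emb_b = l2_normalize(cls_pool(doc_b))

# On unit-length vectors, the dot product and cosine similarity coincide.
print(round(dot(emb_a, emb_b), 6), round(cosine(emb_a, emb_b), 6))
```

Only the [CLS] row influences the result. The second token of each input is ignored entirely, which is what pooling_mode_cls_token: True implies.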

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("joshuapb/fine-tuned-matryoshka")
# Run inference
sentences = [
    'Verbalized number or word (e.g. “lowest”, “low”, “medium”, “high”, “highest”), such as "Confidence: 60% / Medium".\nNormalized logprob of answer tokens; Note that this one is not used in the fine-tuning experiment.\nLogprob of an indirect "True/False" token after the raw answer.\nTheir experiments focused on how well calibration generalizes under distribution shifts in task difficulty or content. Each fine-tuning datapoint is a question, the model’s answer (possibly incorrect), and a calibrated confidence. Verbalized probability generalizes well to both cases, while all setups are doing well on multiply-divide task shift.  Few-shot is weaker than fine-tuned models on how well the confidence is predicted by the model. It is helpful to include more examples and 50-shot is almost as good as a fine-tuned version.',
    'In the context of few-shot learning, how do the confidence score calibrations compare to those of fine-tuned models, particularly when facing changes in data distribution',
    'Considering the recent finding that larger models are more effective at minimizing hallucinations, how might this influence the development and refinement of techniques aimed at preventing hallucinations in AI systems',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
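The model name and the dim_768 to dim_64 evaluations in the Evaluation section point to Matryoshka training, so embeddings can presumably be shortened by keeping a prefix of their dimensions. The sketch below shows the arithmetic in pure Python: keep the first k values, then re-normalize so that cosine similarity (a dot product on unit vectors) still applies. The vectors are made up. With the real model you would apply the same steps to the rows returned by model.encode. Sentence Transformers 3.x also accepts a truncate_dim argument when constructing a SentenceTransformer, which keeps only the first truncate_dim values of each embedding.

```python
import math
from typing import List

Vector = List[float]


def truncate_and_normalize(v: Vector, k: int) -> Vector:
    """Keep the first k dimensions (Matryoshka prefix), then rescale to unit length."""
    if not 0 < k <= len(v):
        raise ValueError(f"k must be in 1..{len(v)}, got {k}")
    prefix = v[:k]
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Toy 8-dimensional "embeddings" standing in for the model's 768-dimensional output.
query = [0.5, 0.1, -0.3, 0.7, 0.05, -0.02, 0.01, 0.03]
doc = [0.4, 0.2, -0.2, 0.6, -0.10, 0.04, 0.02, -0.05]

for k in (8, 4, 2):
    q_k = truncate_and_normalize(query, k)
    d_k = truncate_and_normalize(doc, k)
    print(k, round(cosine(q_k, d_k), 4))
```

Re-normalizing does not change cosine scores. It matters only if the shortened vectors are later compared with a raw dot product, for example in a vector index configured for inner-product search.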

Evaluation

Metrics

Information Retrieval

Dataset: dim_768 (full 768-dimensional embeddings)
Evaluated with InformationRetrievalEvaluator

Metric	Value
cosine_accuracy@1	0.9208
cosine_accuracy@3	0.995
cosine_accuracy@5	0.995
cosine_accuracy@10	1.0
cosine_precision@1	0.9208
cosine_precision@3	0.3317
cosine_precision@5	0.199
cosine_precision@10	0.1
cosine_recall@1	0.9208
cosine_recall@3	0.995
cosine_recall@5	0.995
cosine_recall@10	1.0
cosine_ndcg@10	0.9694
cosine_mrr@10	0.9587
cosine_map@100	0.9587
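The metrics in these tables are tightly linked when each query has exactly one relevant document. In that case accuracy@k equals recall@k, and precision@k equals accuracy@k divided by k (for example, 0.995 / 3 ≈ 0.3317). MRR@10 also equals MAP@100 whenever the relevant document always lands in the top 10, as it does here (accuracy@10 = 1.0). The one-relevant-document property is inferred from these numbers, not stated in the card. The sketch below computes the metrics from the 1-based rank of each query's relevant document using textbook definitions. It is not the InformationRetrievalEvaluator source. The ranks are hypothetical: 202 queries with the relevant document at rank 1 for 186 of them, rank 2 for 15, and rank 6 for one. This is one assignment that reproduces the dim_768 table to four decimals, not data from the card.

```python
import math
from typing import Dict, List, Tuple


def ir_metrics(ranks: List[int], ks: Tuple[int, ...] = (1, 3, 5, 10)) -> Dict[str, float]:
    """Retrieval metrics when each query has exactly one relevant document.

    ranks[i] is the 1-based position of query i's relevant document in the
    ranked results.
    """
    n = len(ranks)
    out: Dict[str, float] = {}
    for k in ks:
        hits = sum(1 for r in ranks if r <= k)
        out[f"accuracy@{k}"] = hits / n
        out[f"precision@{k}"] = hits / (n * k)  # at most 1 relevant doc among k slots
        out[f"recall@{k}"] = hits / n  # only 1 relevant doc to recall
    # Binary relevance: DCG = 1 / log2(rank + 1) and the ideal DCG is 1.
    out["ndcg@10"] = sum(1 / math.log2(r + 1) for r in ranks if r <= 10) / n
    out["mrr@10"] = sum(1 / r for r in ranks if r <= 10) / n
    # With a single relevant document, average precision is simply 1 / rank.
    out["map@100"] = sum(1 / r for r in ranks if r <= 100) / n
    return out


# Hypothetical ranks for 202 queries, consistent with the dim_768 table.
ranks = [1] * 186 + [2] * 15 + [6]
for name, value in ir_metrics(ranks).items():
    print(f"cosine_{name}\t{value:.4f}")
```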

Information Retrieval

Dataset: dim_512 (embeddings truncated to the first 512 dimensions)
Evaluated with InformationRetrievalEvaluator

Metric	Value
cosine_accuracy@1	0.9257
cosine_accuracy@3	0.995
cosine_accuracy@5	1.0
cosine_accuracy@10	1.0
cosine_precision@1	0.9257
cosine_precision@3	0.3317
cosine_precision@5	0.2
cosine_precision@10	0.1
cosine_recall@1	0.9257
cosine_recall@3	0.995
cosine_recall@5	1.0
cosine_recall@10	1.0
cosine_ndcg@10	0.9716
cosine_mrr@10	0.9616
cosine_map@100	0.9616

Information Retrieval

Dataset: dim_256 (embeddings truncated to the first 256 dimensions)
Evaluated with InformationRetrievalEvaluator

Metric	Value
cosine_accuracy@1	0.9158
cosine_accuracy@3	1.0
cosine_accuracy@5	1.0
cosine_accuracy@10	1.0
cosine_precision@1	0.9158
cosine_precision@3	0.3333
cosine_precision@5	0.2
cosine_precision@10	0.1
cosine_recall@1	0.9158
cosine_recall@3	1.0
cosine_recall@5	1.0
cosine_recall@10	1.0
cosine_ndcg@10	0.9676
cosine_mrr@10	0.9563
cosine_map@100	0.9563

Information Retrieval

Dataset: dim_128 (embeddings truncated to the first 128 dimensions)
Evaluated with InformationRetrievalEvaluator

Metric	Value
cosine_accuracy@1	0.9158
cosine_accuracy@3	0.995
cosine_accuracy@5	1.0
cosine_accuracy@10	1.0
cosine_precision@1	0.9158
cosine_precision@3	0.3317
cosine_precision@5	0.2
cosine_precision@10	0.1
cosine_recall@1	0.9158
cosine_recall@3	0.995
cosine_recall@5	1.0
cosine_recall@10	1.0
cosine_ndcg@10	0.9677
cosine_mrr@10	0.9564
cosine_map@100	0.9564

Information Retrieval

Dataset: dim_64 (embeddings truncated to the first 64 dimensions)
Evaluated with InformationRetrievalEvaluator

Metric	Value
cosine_accuracy@1	0.901
cosine_accuracy@3	1.0
cosine_accuracy@5	1.0
cosine_accuracy@10	1.0
cosine_precision@1	0.901
cosine_precision@3	0.3333
cosine_precision@5	0.2
cosine_precision@10	0.1
cosine_recall@1	0.901
cosine_recall@3	1.0
cosine_recall@5	1.0
cosine_recall@10	1.0
cosine_ndcg@10	0.9622
cosine_mrr@10	0.9488
cosine_map@100	0.9488
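Read together, the five tables describe a size/quality trade-off. The sketch below takes the cosine_ndcg@10 values reported above and compares each embedding size with the full 768 dimensions. It assumes float32 storage (4 bytes per value), which is an illustrative assumption, not something the card specifies.

```python
# cosine_ndcg@10 per embedding size, copied from the tables above.
ndcg_at_10 = {768: 0.9694, 512: 0.9716, 256: 0.9676, 128: 0.9677, 64: 0.9622}

BYTES_PER_FLOAT32 = 4  # assumption: embeddings stored as float32
full = ndcg_at_10[768]

for dim, ndcg in ndcg_at_10.items():
    size = dim * BYTES_PER_FLOAT32
    print(f"dim={dim:>3}  bytes/vector={size:>4}  "
          f"ndcg@10={ndcg:.4f}  relative to 768: {ndcg / full:.1%}")
```

With these numbers, the 64-dimensional prefix keeps about 99.3% of the full NDCG@10 at one twelfth of the storage (256 vs 3072 bytes per vector). The 512-dimensional score is even slightly above the full-size one.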

Training Details

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: epoch
per_device_eval_batch_size: 16
learning_rate: 2e-05
num_train_epochs: 5
lr_scheduler_type: cosine
warmup_ratio: 0.1
load_best_model_at_end: True
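With lr_scheduler_type: cosine and warmup_ratio: 0.1, the learning rate rises linearly from 0 to 2e-05 over the first 10% of optimizer steps and then follows a half cosine down to 0. The sketch below reproduces that shape in pure Python for the 1135 steps shown in the training logs (227 steps per epoch × 5 epochs). It re-implements the formula used by the Transformers cosine-with-warmup schedule, with warmup steps computed as ceil(ratio × total steps). It is not the library code itself.

```python
import math

BASE_LR = 2e-05          # learning_rate
WARMUP_RATIO = 0.1       # warmup_ratio
TOTAL_STEPS = 227 * 5    # steps per epoch (from the training logs) x num_train_epochs
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)  # ceil(113.5) = 114


def lr_at(step: int) -> float:
    """Linear warmup followed by a single half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 57, 114, 227, 624, 908, 1135):
    print(f"step {step:>4}: lr = {lr_at(step):.2e}")
```

The peak learning rate is reached halfway through the first epoch, and by the last epoch the rate is already a small fraction of 2e-05.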

All Hyperparameters


overwrite_output_dir: False
do_predict: False
eval_strategy: epoch
prediction_loss_only: True
per_device_train_batch_size: 8
per_device_eval_batch_size: 16
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 5
max_steps: -1
lr_scheduler_type: cosine
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: True
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: False
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional

Training Logs


Epoch	Step	Training Loss	dim_128_cosine_map@100	dim_256_cosine_map@100	dim_512_cosine_map@100	dim_64_cosine_map@100	dim_768_cosine_map@100
0.0220	5	6.6173	-	-	-	-	-
0.0441	10	5.5321	-	-	-	-	-
0.0661	15	5.656	-	-	-	-	-
0.0881	20	4.9256	-	-	-	-	-
0.1101	25	5.0757	-	-	-	-	-
0.1322	30	5.2047	-	-	-	-	-
0.1542	35	5.1307	-	-	-	-	-
0.1762	40	4.9219	-	-	-	-	-
0.1982	45	5.1957	-	-	-	-	-
0.2203	50	5.36	-	-	-	-	-
0.2423	55	3.0865	-	-	-	-	-
0.2643	60	3.7054	-	-	-	-	-
0.2863	65	2.9541	-	-	-	-	-
0.3084	70	3.5521	-	-	-	-	-
0.3304	75	3.5665	-	-	-	-	-
0.3524	80	2.9532	-	-	-	-	-
0.3744	85	2.5121	-	-	-	-	-
0.3965	90	3.1269	-	-	-	-	-
0.4185	95	3.4048	-	-	-	-	-
0.4405	100	2.8126	-	-	-	-	-
0.4626	105	1.6847	-	-	-	-	-
0.4846	110	1.3331	-	-	-	-	-
0.5066	115	2.4799	-	-	-	-	-
0.5286	120	2.1176	-	-	-	-	-
0.5507	125	2.4249	-	-	-	-	-
0.5727	130	3.3705	-	-	-	-	-
0.5947	135	1.551	-	-	-	-	-
0.6167	140	1.328	-	-	-	-	-
0.6388	145	1.9353	-	-	-	-	-
0.6608	150	2.4254	-	-	-	-	-
0.6828	155	1.8436	-	-	-	-	-
0.7048	160	1.1937	-	-	-	-	-
0.7269	165	2.164	-	-	-	-	-
0.7489	170	2.2921	-	-	-	-	-
0.7709	175	2.4385	-	-	-	-	-
0.7930	180	1.2392	-	-	-	-	-
0.8150	185	1.0472	-	-	-	-	-
0.8370	190	1.5844	-	-	-	-	-
0.8590	195	1.2492	-	-	-	-	-
0.8811	200	1.6774	-	-	-	-	-
0.9031	205	2.485	-	-	-	-	-
0.9251	210	2.4781	-	-	-	-	-
0.9471	215	2.4476	-	-	-	-	-
0.9692	220	2.6243	-	-	-	-	-
0.9912	225	1.3651	-	-	-	-	-
1.0	227	-	0.9066	0.9112	0.9257	0.8906	0.9182
1.0132	230	1.0575	-	-	-	-	-
1.0352	235	1.4499	-	-	-	-	-
1.0573	240	1.4333	-	-	-	-	-
1.0793	245	1.1148	-	-	-	-	-
1.1013	250	1.259	-	-	-	-	-
1.1233	255	0.873	-	-	-	-	-
1.1454	260	1.646	-	-	-	-	-
1.1674	265	1.7583	-	-	-	-	-
1.1894	270	1.2268	-	-	-	-	-
1.2115	275	1.3792	-	-	-	-	-
1.2335	280	2.5662	-	-	-	-	-
1.2555	285	1.5021	-	-	-	-	-
1.2775	290	1.1399	-	-	-	-	-
1.2996	295	1.3307	-	-	-	-	-
1.3216	300	0.7458	-	-	-	-	-
1.3436	305	1.1029	-	-	-	-	-
1.3656	310	1.0205	-	-	-	-	-
1.3877	315	1.0998	-	-	-	-	-
1.4097	320	0.8304	-	-	-	-	-
1.4317	325	1.3673	-	-	-	-	-
1.4537	330	2.4445	-	-	-	-	-
1.4758	335	2.8757	-	-	-	-	-
1.4978	340	1.7879	-	-	-	-	-
1.5198	345	1.1255	-	-	-	-	-
1.5419	350	1.6743	-	-	-	-	-
1.5639	355	1.3803	-	-	-	-	-
1.5859	360	1.1998	-	-	-	-	-
1.6079	365	1.2129	-	-	-	-	-
1.6300	370	1.6588	-	-	-	-	-
1.6520	375	0.9827	-	-	-	-	-
1.6740	380	0.605	-	-	-	-	-
1.6960	385	1.2934	-	-	-	-	-
1.7181	390	1.1776	-	-	-	-	-
1.7401	395	1.445	-	-	-	-	-
1.7621	400	0.6393	-	-	-	-	-
1.7841	405	0.9303	-	-	-	-	-
1.8062	410	0.7541	-	-	-	-	-
1.8282	415	0.5413	-	-	-	-	-
1.8502	420	1.5258	-	-	-	-	-
1.8722	425	1.4257	-	-	-	-	-
1.8943	430	1.3111	-	-	-	-	-
1.9163	435	1.6604	-	-	-	-	-
1.9383	440	1.4004	-	-	-	-	-
1.9604	445	2.7186	-	-	-	-	-
1.9824	450	2.2757	-	-	-	-	-
2.0	454	-	0.9401	0.9433	0.9387	0.9386	0.9416
2.0044	455	0.9345	-	-	-	-	-
2.0264	460	0.9325	-	-	-	-	-
2.0485	465	1.2434	-	-	-	-	-
2.0705	470	1.5161	-	-	-	-	-
2.0925	475	2.6011	-	-	-	-	-
2.1145	480	1.8276	-	-	-	-	-
2.1366	485	1.5005	-	-	-	-	-
2.1586	490	0.8618	-	-	-	-	-
2.1806	495	2.1422	-	-	-	-	-
2.2026	500	1.3922	-	-	-	-	-
2.2247	505	1.5939	-	-	-	-	-
2.2467	510	1.3021	-	-	-	-	-
2.2687	515	1.0825	-	-	-	-	-
2.2907	520	0.9066	-	-	-	-	-
2.3128	525	0.7717	-	-	-	-	-
2.3348	530	1.1484	-	-	-	-	-
2.3568	535	1.6513	-	-	-	-	-
2.3789	540	1.7267	-	-	-	-	-
2.4009	545	0.7659	-	-	-	-	-
2.4229	550	2.0213	-	-	-	-	-
2.4449	555	0.5329	-	-	-	-	-
2.4670	560	1.2083	-	-	-	-	-
2.4890	565	1.5432	-	-	-	-	-
2.5110	570	0.5423	-	-	-	-	-
2.5330	575	0.2613	-	-	-	-	-
2.5551	580	0.7985	-	-	-	-	-
2.5771	585	0.3003	-	-	-	-	-
2.5991	590	2.2234	-	-	-	-	-
2.6211	595	0.4772	-	-	-	-	-
2.6432	600	1.0158	-	-	-	-	-
2.6652	605	2.6385	-	-	-	-	-
2.6872	610	0.7042	-	-	-	-	-
2.7093	615	1.1469	-	-	-	-	-
2.7313	620	1.4092	-	-	-	-	-
2.7533	625	0.6487	-	-	-	-	-
2.7753	630	1.218	-	-	-	-	-
2.7974	635	1.1509	-	-	-	-	-
2.8194	640	1.1524	-	-	-	-	-
2.8414	645	0.6477	-	-	-	-	-
2.8634	650	0.6295	-	-	-	-	-
2.8855	655	1.3026	-	-	-	-	-
2.9075	660	1.9196	-	-	-	-	-
2.9295	665	1.3743	-	-	-	-	-
2.9515	670	0.8934	-	-	-	-	-
2.9736	675	1.1801	-	-	-	-	-
2.9956	680	1.2952	-	-	-	-	-
3.0	681	-	0.9538	0.9513	0.9538	0.9414	0.9435
3.0176	685	0.3324	-	-	-	-	-
3.0396	690	0.9551	-	-	-	-	-
3.0617	695	0.9315	-	-	-	-	-
3.0837	700	1.3611	-	-	-	-	-
3.1057	705	1.4406	-	-	-	-	-
3.1278	710	0.5888	-	-	-	-	-
3.1498	715	0.9149	-	-	-	-	-
3.1718	720	0.5627	-	-	-	-	-
3.1938	725	1.6876	-	-	-	-	-
3.2159	730	1.1366	-	-	-	-	-
3.2379	735	1.3571	-	-	-	-	-
3.2599	740	1.5227	-	-	-	-	-
3.2819	745	2.5139	-	-	-	-	-
3.3040	750	0.3735	-	-	-	-	-
3.3260	755	1.4386	-	-	-	-	-
3.3480	760	0.3838	-	-	-	-	-
3.3700	765	0.3973	-	-	-	-	-
3.3921	770	1.4972	-	-	-	-	-
3.4141	775	1.5118	-	-	-	-	-
3.4361	780	0.478	-	-	-	-	-
3.4581	785	1.5982	-	-	-	-	-
3.4802	790	0.6209	-	-	-	-	-
3.5022	795	0.5902	-	-	-	-	-
3.5242	800	1.0877	-	-	-	-	-
3.5463	805	0.9553	-	-	-	-	-
3.5683	810	0.3054	-	-	-	-	-
3.5903	815	1.2229	-	-	-	-	-
3.6123	820	0.7434	-	-	-	-	-
3.6344	825	1.5447	-	-	-	-	-
3.6564	830	1.0751	-	-	-	-	-
3.6784	835	0.8161	-	-	-	-	-
3.7004	840	0.4382	-	-	-	-	-
3.7225	845	1.3547	-	-	-	-	-
3.7445	850	1.7112	-	-	-	-	-
3.7665	855	0.5362	-	-	-	-	-
3.7885	860	0.9309	-	-	-	-	-
3.8106	865	1.8301	-	-	-	-	-
3.8326	870	1.5554	-	-	-	-	-
3.8546	875	1.4035	-	-	-	-	-
3.8767	880	1.5814	-	-	-	-	-
3.8987	885	0.7283	-	-	-	-	-
3.9207	890	1.8549	-	-	-	-	-
3.9427	895	0.196	-	-	-	-	-
3.9648	900	1.2072	-	-	-	-	-
3.9868	905	0.83	-	-	-	-	-
4.0	908	-	0.9564	0.9587	0.9612	0.9488	0.9563
4.0088	910	1.7222	-	-	-	-	-
4.0308	915	0.6728	-	-	-	-	-
4.0529	920	0.9388	-	-	-	-	-
4.0749	925	0.7998	-	-	-	-	-
4.0969	930	1.1561	-	-	-	-	-
4.1189	935	2.4315	-	-	-	-	-
4.1410	940	1.3263	-	-	-	-	-
4.1630	945	1.2374	-	-	-	-	-
4.1850	950	1.1307	-	-	-	-	-
4.2070	955	0.5512	-	-	-	-	-
4.2291	960	1.3266	-	-	-	-	-
4.2511	965	1.2306	-	-	-	-	-
4.2731	970	1.7083	-	-	-	-	-
4.2952	975	0.7028	-	-	-	-	-
4.3172	980	1.2987	-	-	-	-	-
4.3392	985	1.545	-	-	-	-	-
4.3612	990	1.004	-	-	-	-	-
4.3833	995	0.8276	-	-	-	-	-
4.4053	1000	1.4694	-	-	-	-	-
4.4273	1005	0.4914	-	-	-	-	-
4.4493	1010	0.9894	-	-	-	-	-
4.4714	1015	0.8855	-	-	-	-	-
4.4934	1020	1.1339	-	-	-	-	-
4.5154	1025	1.0786	-	-	-	-	-
4.5374	1030	1.2547	-	-	-	-	-
4.5595	1035	0.5312	-	-	-	-	-
4.5815	1040	1.4938	-	-	-	-	-
4.6035	1045	0.8124	-	-	-	-	-
4.6256	1050	1.2401	-	-	-	-	-
4.6476	1055	1.1902	-	-	-	-	-
4.6696	1060	1.4183	-	-	-	-	-
4.6916	1065	1.0718	-	-	-	-	-
4.7137	1070	1.2203	-	-	-	-	-
4.7357	1075	0.8535	-	-	-	-	-
4.7577	1080	1.2454	-	-	-	-	-
4.7797	1085	0.4216	-	-	-	-	-
4.8018	1090	0.8327	-	-	-	-	-
4.8238	1095	1.2371	-	-	-	-	-
4.8458	1100	1.0949	-	-	-	-	-
4.8678	1105	1.2177	-	-	-	-	-
4.8899	1110	0.6236	-	-	-	-	-
4.9119	1115	0.646	-	-	-	-	-
4.9339	1120	1.1822	-	-	-	-	-
4.9559	1125	1.0471	-	-	-	-	-
4.9780	1130	0.7626	-	-	-	-	-
5.0	1135	0.9794	0.9564	0.9563	0.9616	0.9488	0.9587

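The bold row denotes the saved checkpoint. Bold formatting is not preserved here, but the final row (epoch 5.0, step 1135) reproduces the cosine_map@100 values reported in the Evaluation section.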
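The per-epoch evaluation rows can also be compared directly. The sketch below copies the cosine_map@100 values from the epoch 1.0 to 5.0 rows of the training log. For each dimension it reports the epoch with the highest score, and it also prints a simple mean across dimensions. Averaging across dimensions is a hypothetical, illustrative criterion. The card does not state which metric load_best_model_at_end used to select the saved checkpoint.

```python
# cosine_map@100 at the end of each epoch, copied from the training log rows.
map_at_100 = {
    1: {"dim_128": 0.9066, "dim_256": 0.9112, "dim_512": 0.9257, "dim_64": 0.8906, "dim_768": 0.9182},
    2: {"dim_128": 0.9401, "dim_256": 0.9433, "dim_512": 0.9387, "dim_64": 0.9386, "dim_768": 0.9416},
    3: {"dim_128": 0.9538, "dim_256": 0.9513, "dim_512": 0.9538, "dim_64": 0.9414, "dim_768": 0.9435},
    4: {"dim_128": 0.9564, "dim_256": 0.9587, "dim_512": 0.9612, "dim_64": 0.9488, "dim_768": 0.9563},
    5: {"dim_128": 0.9564, "dim_256": 0.9563, "dim_512": 0.9616, "dim_64": 0.9488, "dim_768": 0.9587},
}

for dim in map_at_100[1]:
    # max() returns the earliest epoch when scores tie.
    best_epoch = max(map_at_100, key=lambda e: map_at_100[e][dim])
    print(f"{dim}: best at epoch {best_epoch} ({map_at_100[best_epoch][dim]:.4f})")

# Hypothetical aggregate: unweighted mean over the five dimensions.
mean_by_epoch = {e: sum(row.values()) / len(row) for e, row in map_at_100.items()}
print({e: round(m, 4) for e, m in mean_by_epoch.items()})
```

Epochs 4 and 5 are nearly indistinguishable. Epoch 4 scores higher for dim_256, epoch 5 scores higher for dim_512 and dim_768, and the two epochs tie for dim_128 and dim_64. Under the mean criterion, epoch 5 edges ahead (0.9564 vs 0.9563).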

Framework Versions

Python: 3.10.12
Sentence Transformers: 3.0.1
Transformers: 4.42.4
PyTorch: 2.3.1+cu121
Accelerate: 0.32.1
Datasets: 2.21.0
Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
