dpokhrel commited on
Commit
ddcb154
1 Parent(s): e3a3e72

Add new SentenceTransformer model.

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "word_embedding_dimension": 768,
+ "pooling_mode_cls_token": true,
+ "pooling_mode_mean_tokens": false,
+ "pooling_mode_max_tokens": false,
+ "pooling_mode_mean_sqrt_len_tokens": false,
+ "pooling_mode_weightedmean_tokens": false,
+ "pooling_mode_lasttoken": false,
+ "include_prompt": true
+ }
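
For orientation, a minimal sketch (not part of the committed files) of how this configuration maps onto the `sentence_transformers.models.Pooling` module; the library normally builds it automatically from `modules.json`:

```python
from sentence_transformers import models

# CLS-token pooling over 768-dimensional token embeddings, as specified above.
pooling = models.Pooling(
    word_embedding_dimension=768,
    pooling_mode_cls_token=True,
    pooling_mode_mean_tokens=False,
    pooling_mode_max_tokens=False,
)
print(pooling.get_pooling_mode_str())  # expected: "cls"
```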
README.md ADDED
@@ -0,0 +1,817 @@
+ ---
+ base_model: BAAI/bge-base-en-v1.5
+ datasets: []
+ language:
+ - en
+ library_name: sentence-transformers
+ license: apache-2.0
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ pipeline_tag: sentence-similarity
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:6300
+ - loss:MatryoshkaLoss
+ - loss:MultipleNegativesRankingLoss
+ widget:
+ - source_sentence: AutoZone, Inc. began operations in 1979.
+ sentences:
+ - What types of products and markets does the company cater to in the semiconductor
+ industry?
+ - When did AutoZone, Inc. begin its operations?
+ - How much did general and administrative expenses related to merger, acquisition,
+ and other costs change from 2022 to 2023?
+ - source_sentence: See Note 14 to the consolidated financial statements in Item 8
+ of this Annual regarding legal proceedings.
+ sentences:
+ - What is the source to find detailed information about legal proceedings in the
+ Annual Report?
+ - Where in the Annual Report can one find a description of certain legal matters
+ and their impact on the company?
+ - What strategic actions is Hershey taking to maintain its leadership in the U.S.
+ confectionery market?
+ - source_sentence: ICE Bonds focuses on increasing efficiency in fixed income markets
+ by offering electronic markets that support trading protocols including click-to-trade,
+ request for quotation (RFQ), and auctions.
+ sentences:
+ - What services does the ICE Bonds platform provide and what is its focus?
+ - What was the percentage increase in the generic dispensing rate of the Health
+ Services segment from 2022 to 2023?
+ - How many shares of Class A common stock were repurchased and retired in 2023,
+ and what was the total cost including excise tax accruals?
+ - source_sentence: Subject to various United States and foreign laws and regulations,
+ including those related to intellectual property, data privacy and security, cybersecurity,
+ tax, employment, competition and antitrust, anti-corruption, anti-bribery, and
+ AI. Compliance with these laws has no current material adverse impact on capital
+ expenditures, results of operations or competitive position.
+ sentences:
+ - How much did the total loans and lending commitments amount to as of December
+ 2023?
+ - What types of laws and regulations does the company need to comply with?
+ - Where are the consolidated financial statements listed in the Annual Report on
+ Form 10-K located?
+ - source_sentence: CMS made significant changes to the structure of the hierarchical
+ condition category model in version 28, which may impact risk adjustment factor
+ scores for a larger percentage of Medicare Advantage beneficiaries and could result
+ in changes to beneficiary RAF scores with or without a change in the patient’s
+ health status.
+ sentences:
+ - How does Tesla reduce costs and promote renewable power at their Supercharger
+ stations?
+ - What is the primary method by which the company manages its cash, cash equivalents,
+ and marketable securities?
+ - What significant regulatory change did CMS make to the hierarchical condition
+ category model in its version 28?
+ model-index:
+ - name: BGE base Financial Matryoshka
+ results:
+ - task:
+ type: information-retrieval
+ name: Information Retrieval
+ dataset:
+ name: dim 768
+ type: dim_768
+ metrics:
+ - type: cosine_accuracy@1
+ value: 0.6985714285714286
+ name: Cosine Accuracy@1
+ - type: cosine_accuracy@3
+ value: 0.8442857142857143
+ name: Cosine Accuracy@3
+ - type: cosine_accuracy@5
+ value: 0.8814285714285715
+ name: Cosine Accuracy@5
+ - type: cosine_accuracy@10
+ value: 0.9271428571428572
+ name: Cosine Accuracy@10
+ - type: cosine_precision@1
+ value: 0.6985714285714286
+ name: Cosine Precision@1
+ - type: cosine_precision@3
+ value: 0.2814285714285714
+ name: Cosine Precision@3
+ - type: cosine_precision@5
+ value: 0.17628571428571424
+ name: Cosine Precision@5
+ - type: cosine_precision@10
+ value: 0.09271428571428571
+ name: Cosine Precision@10
+ - type: cosine_recall@1
+ value: 0.6985714285714286
+ name: Cosine Recall@1
+ - type: cosine_recall@3
+ value: 0.8442857142857143
+ name: Cosine Recall@3
+ - type: cosine_recall@5
+ value: 0.8814285714285715
+ name: Cosine Recall@5
+ - type: cosine_recall@10
+ value: 0.9271428571428572
+ name: Cosine Recall@10
+ - type: cosine_ndcg@10
+ value: 0.8156553778675095
+ name: Cosine Ndcg@10
+ - type: cosine_mrr@10
+ value: 0.7796054421768707
+ name: Cosine Mrr@10
+ - type: cosine_map@100
+ value: 0.7822282461868646
+ name: Cosine Map@100
+ - task:
+ type: information-retrieval
+ name: Information Retrieval
+ dataset:
+ name: dim 512
+ type: dim_512
+ metrics:
+ - type: cosine_accuracy@1
+ value: 0.71
+ name: Cosine Accuracy@1
+ - type: cosine_accuracy@3
+ value: 0.8457142857142858
+ name: Cosine Accuracy@3
+ - type: cosine_accuracy@5
+ value: 0.8785714285714286
+ name: Cosine Accuracy@5
+ - type: cosine_accuracy@10
+ value: 0.9271428571428572
+ name: Cosine Accuracy@10
+ - type: cosine_precision@1
+ value: 0.71
+ name: Cosine Precision@1
+ - type: cosine_precision@3
+ value: 0.2819047619047619
+ name: Cosine Precision@3
+ - type: cosine_precision@5
+ value: 0.17571428571428568
+ name: Cosine Precision@5
+ - type: cosine_precision@10
+ value: 0.09271428571428571
+ name: Cosine Precision@10
+ - type: cosine_recall@1
+ value: 0.71
+ name: Cosine Recall@1
+ - type: cosine_recall@3
+ value: 0.8457142857142858
+ name: Cosine Recall@3
+ - type: cosine_recall@5
+ value: 0.8785714285714286
+ name: Cosine Recall@5
+ - type: cosine_recall@10
+ value: 0.9271428571428572
+ name: Cosine Recall@10
+ - type: cosine_ndcg@10
+ value: 0.8194766272347418
+ name: Cosine Ndcg@10
+ - type: cosine_mrr@10
+ value: 0.7848673469387758
+ name: Cosine Mrr@10
+ - type: cosine_map@100
+ value: 0.7873446316370609
+ name: Cosine Map@100
+ - task:
+ type: information-retrieval
+ name: Information Retrieval
+ dataset:
+ name: dim 256
+ type: dim_256
+ metrics:
+ - type: cosine_accuracy@1
+ value: 0.7085714285714285
+ name: Cosine Accuracy@1
+ - type: cosine_accuracy@3
+ value: 0.8342857142857143
+ name: Cosine Accuracy@3
+ - type: cosine_accuracy@5
+ value: 0.8642857142857143
+ name: Cosine Accuracy@5
+ - type: cosine_accuracy@10
+ value: 0.9142857142857143
+ name: Cosine Accuracy@10
+ - type: cosine_precision@1
+ value: 0.7085714285714285
+ name: Cosine Precision@1
+ - type: cosine_precision@3
+ value: 0.27809523809523806
+ name: Cosine Precision@3
+ - type: cosine_precision@5
+ value: 0.17285714285714282
+ name: Cosine Precision@5
+ - type: cosine_precision@10
+ value: 0.09142857142857141
+ name: Cosine Precision@10
+ - type: cosine_recall@1
+ value: 0.7085714285714285
+ name: Cosine Recall@1
+ - type: cosine_recall@3
+ value: 0.8342857142857143
+ name: Cosine Recall@3
+ - type: cosine_recall@5
+ value: 0.8642857142857143
+ name: Cosine Recall@5
+ - type: cosine_recall@10
+ value: 0.9142857142857143
+ name: Cosine Recall@10
+ - type: cosine_ndcg@10
+ value: 0.8116052646620258
+ name: Cosine Ndcg@10
+ - type: cosine_mrr@10
+ value: 0.77881462585034
+ name: Cosine Mrr@10
+ - type: cosine_map@100
+ value: 0.7821002568762089
+ name: Cosine Map@100
+ - task:
+ type: information-retrieval
+ name: Information Retrieval
+ dataset:
+ name: dim 128
+ type: dim_128
+ metrics:
+ - type: cosine_accuracy@1
+ value: 0.69
+ name: Cosine Accuracy@1
+ - type: cosine_accuracy@3
+ value: 0.8271428571428572
+ name: Cosine Accuracy@3
+ - type: cosine_accuracy@5
+ value: 0.86
+ name: Cosine Accuracy@5
+ - type: cosine_accuracy@10
+ value: 0.91
+ name: Cosine Accuracy@10
+ - type: cosine_precision@1
+ value: 0.69
+ name: Cosine Precision@1
+ - type: cosine_precision@3
+ value: 0.2757142857142857
+ name: Cosine Precision@3
+ - type: cosine_precision@5
+ value: 0.172
+ name: Cosine Precision@5
+ - type: cosine_precision@10
+ value: 0.09099999999999998
+ name: Cosine Precision@10
+ - type: cosine_recall@1
+ value: 0.69
+ name: Cosine Recall@1
+ - type: cosine_recall@3
+ value: 0.8271428571428572
+ name: Cosine Recall@3
+ - type: cosine_recall@5
+ value: 0.86
+ name: Cosine Recall@5
+ - type: cosine_recall@10
+ value: 0.91
+ name: Cosine Recall@10
+ - type: cosine_ndcg@10
+ value: 0.8013750432226047
+ name: Cosine Ndcg@10
+ - type: cosine_mrr@10
+ value: 0.7664954648526079
+ name: Cosine Mrr@10
+ - type: cosine_map@100
+ value: 0.7698726210622817
+ name: Cosine Map@100
+ - task:
+ type: information-retrieval
+ name: Information Retrieval
+ dataset:
+ name: dim 64
+ type: dim_64
+ metrics:
+ - type: cosine_accuracy@1
+ value: 0.6657142857142857
+ name: Cosine Accuracy@1
+ - type: cosine_accuracy@3
+ value: 0.79
+ name: Cosine Accuracy@3
+ - type: cosine_accuracy@5
+ value: 0.8285714285714286
+ name: Cosine Accuracy@5
+ - type: cosine_accuracy@10
+ value: 0.8857142857142857
+ name: Cosine Accuracy@10
+ - type: cosine_precision@1
+ value: 0.6657142857142857
+ name: Cosine Precision@1
+ - type: cosine_precision@3
+ value: 0.2633333333333333
+ name: Cosine Precision@3
+ - type: cosine_precision@5
+ value: 0.1657142857142857
+ name: Cosine Precision@5
+ - type: cosine_precision@10
+ value: 0.08857142857142855
+ name: Cosine Precision@10
+ - type: cosine_recall@1
+ value: 0.6657142857142857
+ name: Cosine Recall@1
+ - type: cosine_recall@3
+ value: 0.79
+ name: Cosine Recall@3
+ - type: cosine_recall@5
+ value: 0.8285714285714286
+ name: Cosine Recall@5
+ - type: cosine_recall@10
+ value: 0.8857142857142857
+ name: Cosine Recall@10
+ - type: cosine_ndcg@10
+ value: 0.7732501027431213
+ name: Cosine Ndcg@10
+ - type: cosine_mrr@10
+ value: 0.7375017006802721
+ name: Cosine Mrr@10
+ - type: cosine_map@100
+ value: 0.7416822153678694
+ name: Cosine Map@100
+ ---
+
+ # BGE base Financial Matryoshka
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ - **Language:** en
+ - **License:** apache-2.0
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+ (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
+ (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ (2): Normalize()
+ )
+ ```
+
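
For readers using plain `transformers`, a rough sketch of what these three modules do (BERT encoder, CLS-token pooling, L2 normalization); the Sentence Transformers usage below remains the supported path:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

repo = "dpokhrel/bge-base-financial-matryoshka"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModel.from_pretrained(repo)

batch = tokenizer(
    ["When did AutoZone, Inc. begin its operations?"],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = model(**batch).last_hidden_state  # (batch, seq_len, 768)

cls_embedding = token_embeddings[:, 0]               # (1) CLS-token pooling
embedding = F.normalize(cls_embedding, p=2, dim=1)   # (2) L2 normalization
print(embedding.shape)  # torch.Size([1, 768])
```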
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("dpokhrel/bge-base-financial-matryoshka")
+ # Run inference
+ sentences = [
+ 'CMS made significant changes to the structure of the hierarchical condition category model in version 28, which may impact risk adjustment factor scores for a larger percentage of Medicare Advantage beneficiaries and could result in changes to beneficiary RAF scores with or without a change in the patient’s health status.',
+ 'What significant regulatory change did CMS make to the hierarchical condition category model in its version 28?',
+ 'What is the primary method by which the company manages its cash, cash equivalents, and marketable securities?',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
+
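
Because the model was trained with `MatryoshkaLoss` at dimensions 768/512/256/128/64 (see the tags and the evaluation below), embeddings can also be truncated to a smaller size. A minimal sketch, assuming sentence-transformers >= 2.7 for the `truncate_dim` argument:

```python
from sentence_transformers import SentenceTransformer

# Truncate embeddings to one of the trained Matryoshka dimensions.
model = SentenceTransformer("dpokhrel/bge-base-financial-matryoshka", truncate_dim=256)

sentences = [
    "AutoZone, Inc. began operations in 1979.",
    "When did AutoZone, Inc. begin its operations?",
]
embeddings = model.encode(sentences, normalize_embeddings=True)
print(embeddings.shape)  # (2, 256)
print(model.similarity(embeddings, embeddings))  # 2x2 cosine similarity matrix
```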
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Information Retrieval
+ * Dataset: `dim_768`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1 | 0.6986 |
+ | cosine_accuracy@3 | 0.8443 |
+ | cosine_accuracy@5 | 0.8814 |
+ | cosine_accuracy@10 | 0.9271 |
+ | cosine_precision@1 | 0.6986 |
+ | cosine_precision@3 | 0.2814 |
+ | cosine_precision@5 | 0.1763 |
+ | cosine_precision@10 | 0.0927 |
+ | cosine_recall@1 | 0.6986 |
+ | cosine_recall@3 | 0.8443 |
+ | cosine_recall@5 | 0.8814 |
+ | cosine_recall@10 | 0.9271 |
+ | cosine_ndcg@10 | 0.8157 |
+ | cosine_mrr@10 | 0.7796 |
+ | **cosine_map@100** | **0.7822** |
+
+ #### Information Retrieval
+ * Dataset: `dim_512`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1 | 0.71 |
+ | cosine_accuracy@3 | 0.8457 |
+ | cosine_accuracy@5 | 0.8786 |
+ | cosine_accuracy@10 | 0.9271 |
+ | cosine_precision@1 | 0.71 |
+ | cosine_precision@3 | 0.2819 |
+ | cosine_precision@5 | 0.1757 |
+ | cosine_precision@10 | 0.0927 |
+ | cosine_recall@1 | 0.71 |
+ | cosine_recall@3 | 0.8457 |
+ | cosine_recall@5 | 0.8786 |
+ | cosine_recall@10 | 0.9271 |
+ | cosine_ndcg@10 | 0.8195 |
+ | cosine_mrr@10 | 0.7849 |
+ | **cosine_map@100** | **0.7873** |
+
+ #### Information Retrieval
+ * Dataset: `dim_256`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1 | 0.7086 |
+ | cosine_accuracy@3 | 0.8343 |
+ | cosine_accuracy@5 | 0.8643 |
+ | cosine_accuracy@10 | 0.9143 |
+ | cosine_precision@1 | 0.7086 |
+ | cosine_precision@3 | 0.2781 |
+ | cosine_precision@5 | 0.1729 |
+ | cosine_precision@10 | 0.0914 |
+ | cosine_recall@1 | 0.7086 |
+ | cosine_recall@3 | 0.8343 |
+ | cosine_recall@5 | 0.8643 |
+ | cosine_recall@10 | 0.9143 |
+ | cosine_ndcg@10 | 0.8116 |
+ | cosine_mrr@10 | 0.7788 |
+ | **cosine_map@100** | **0.7821** |
+
+ #### Information Retrieval
+ * Dataset: `dim_128`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1 | 0.69 |
+ | cosine_accuracy@3 | 0.8271 |
+ | cosine_accuracy@5 | 0.86 |
+ | cosine_accuracy@10 | 0.91 |
+ | cosine_precision@1 | 0.69 |
+ | cosine_precision@3 | 0.2757 |
+ | cosine_precision@5 | 0.172 |
+ | cosine_precision@10 | 0.091 |
+ | cosine_recall@1 | 0.69 |
+ | cosine_recall@3 | 0.8271 |
+ | cosine_recall@5 | 0.86 |
+ | cosine_recall@10 | 0.91 |
+ | cosine_ndcg@10 | 0.8014 |
+ | cosine_mrr@10 | 0.7665 |
+ | **cosine_map@100** | **0.7699** |
+
+ #### Information Retrieval
+ * Dataset: `dim_64`
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric | Value |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1 | 0.6657 |
+ | cosine_accuracy@3 | 0.79 |
+ | cosine_accuracy@5 | 0.8286 |
+ | cosine_accuracy@10 | 0.8857 |
+ | cosine_precision@1 | 0.6657 |
+ | cosine_precision@3 | 0.2633 |
+ | cosine_precision@5 | 0.1657 |
+ | cosine_precision@10 | 0.0886 |
+ | cosine_recall@1 | 0.6657 |
+ | cosine_recall@3 | 0.79 |
+ | cosine_recall@5 | 0.8286 |
+ | cosine_recall@10 | 0.8857 |
+ | cosine_ndcg@10 | 0.7733 |
+ | cosine_mrr@10 | 0.7375 |
+ | **cosine_map@100** | **0.7417** |
+
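
The held-out evaluation set behind these tables is not included in the repository, so the exact numbers cannot be reproduced from this commit alone. The sketch below only illustrates how `InformationRetrievalEvaluator` is typically wired up, with placeholder queries and corpus:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Placeholder data: query ids -> queries, doc ids -> passages,
# and each query id -> set of relevant doc ids.
queries = {"q1": "When did AutoZone, Inc. begin its operations?"}
corpus = {
    "d1": "AutoZone, Inc. began operations in 1979.",
    "d2": "In 2023, $505 million was utilized for common stock repurchases.",
}
relevant_docs = {"q1": {"d1"}}

model = SentenceTransformer("dpokhrel/bge-base-financial-matryoshka")
evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="demo")
print(evaluator(model))  # accuracy@k, precision@k, recall@k, ndcg@10, mrr@10, map@100
```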
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+
+ * Size: 6,300 training samples
+ * Columns: <code>positive</code> and <code>anchor</code>
+ * Approximate statistics based on the first 1000 samples:
+ | | positive | anchor |
+ |:--------|:------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
+ | type | string | string |
+ | details | <ul><li>min: 10 tokens</li><li>mean: 46.37 tokens</li><li>max: 248 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 20.57 tokens</li><li>max: 51 tokens</li></ul> |
+ * Samples:
+ | positive | anchor |
+ |:------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------|
+ | <code>Scenario analysis is used to quantify the impact of a specified event, including how the event impacts multiple risk factors simultaneously. For example, for sovereign stress testing, it calculates potential exposure related to sovereign positions as well as the corresponding debt, equity, and currency exposures that may be impacted by sovereign distress.</code> | <code>How does Goldman Sachs utilize scenario analysis in its risk management strategy?</code> |
+ | <code>The company is involved in various other legal proceedings incidental to the conduct of our business, including, but not limited to, claims and allegations related to wage and hour violations, unlawful termination, employment practices, product liability, privacy and cybersecurity, environmental matters, and intellectual property rights or regulatory compliance.</code> | <code>What types of legal proceedings is the company currently involved in?</code> |
+ | <code>In 2023, $505 million was utilized for common stock repurchases.</code> | <code>How much cash was utilized for common stock repurchases in the year ended December 31, 2023?</code> |
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
+ ```json
+ {
+ "loss": "MultipleNegativesRankingLoss",
+ "matryoshka_dims": [
+ 768,
+ 512,
+ 256,
+ 128,
+ 64
+ ],
+ "matryoshka_weights": [
+ 1,
+ 1,
+ 1,
+ 1,
+ 1
+ ],
+ "n_dims_per_step": -1
+ }
+ ```
+
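
A sketch of how a loss with these parameters is typically constructed in Sentence Transformers (the training script itself is not part of this repository):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("BAAI/bge-base-en-v1.5")
inner_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(
    model,
    inner_loss,
    matryoshka_dims=[768, 512, 256, 128, 64],
    matryoshka_weights=[1, 1, 1, 1, 1],
    n_dims_per_step=-1,
)
```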
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: epoch
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 16
+ - `gradient_accumulation_steps`: 16
+ - `learning_rate`: 2e-05
+ - `num_train_epochs`: 4
+ - `lr_scheduler_type`: cosine
+ - `warmup_ratio`: 0.1
+ - `bf16`: True
+ - `half_precision_backend`: cpu_amp
+ - `load_best_model_at_end`: True
+ - `optim`: adamw_torch_fused
+ - `batch_sampler`: no_duplicates
+
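
A sketch of how these non-default values map onto `SentenceTransformerTrainingArguments`; `output_dir` is a placeholder, and `save_strategy="epoch"` is an assumption added so that `load_best_model_at_end` is valid:

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="bge-base-financial-matryoshka",  # placeholder
    eval_strategy="epoch",
    save_strategy="epoch",  # assumption: required for load_best_model_at_end
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    num_train_epochs=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    load_best_model_at_end=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```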
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: epoch
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 32
+ - `per_device_eval_batch_size`: 16
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 16
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 2e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 4
+ - `max_steps`: -1
+ - `lr_scheduler_type`: cosine
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: True
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: cpu_amp
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: True
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch_fused
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `eval_use_gather_object`: False
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
+ ### Training Logs
+ | Epoch | Step | Training Loss | dim_128_cosine_map@100 | dim_256_cosine_map@100 | dim_512_cosine_map@100 | dim_64_cosine_map@100 | dim_768_cosine_map@100 |
+ |:----------:|:------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|:----------------------:|
+ | 0.8122 | 10 | 1.5241 | - | - | - | - | - |
+ | 0.9746 | 12 | - | 0.7486 | 0.7656 | 0.7662 | 0.7108 | 0.7679 |
+ | 1.6244 | 20 | 0.658 | - | - | - | - | - |
+ | 1.9492 | 24 | - | 0.7656 | 0.7793 | 0.7843 | 0.7348 | 0.7798 |
+ | 2.4365 | 30 | 0.4743 | - | - | - | - | - |
+ | 2.9239 | 36 | - | 0.7683 | 0.7814 | 0.7859 | 0.7400 | 0.7812 |
+ | 3.2487 | 40 | 0.4241 | - | - | - | - | - |
+ | **3.8985** | **48** | **-** | **0.7699** | **0.7821** | **0.7873** | **0.7417** | **0.7822** |
+
+ * The bold row denotes the saved checkpoint.
+
+ ### Framework Versions
+ - Python: 3.11.5
+ - Sentence Transformers: 3.0.1
+ - Transformers: 4.43.4
+ - PyTorch: 2.4.0.dev20240607+cu118
+ - Accelerate: 0.32.0
+ - Datasets: 2.20.0
+ - Tokenizers: 0.19.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+ author = "Reimers, Nils and Gurevych, Iryna",
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+ month = "11",
+ year = "2019",
+ publisher = "Association for Computational Linguistics",
+ url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MatryoshkaLoss
+ ```bibtex
+ @misc{kusupati2024matryoshka,
+ title={Matryoshka Representation Learning},
+ author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+ year={2024},
+ eprint={2205.13147},
+ archivePrefix={arXiv},
+ primaryClass={cs.LG}
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+ year={2017},
+ eprint={1705.00652},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+ "_name_or_path": "BAAI/bge-base-en-v1.5",
+ "architectures": [
+ "BertModel"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "classifier_dropout": null,
+ "gradient_checkpointing": false,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 768,
+ "id2label": {
+ "0": "LABEL_0"
+ },
+ "initializer_range": 0.02,
+ "intermediate_size": 3072,
+ "label2id": {
+ "LABEL_0": 0
+ },
+ "layer_norm_eps": 1e-12,
+ "max_position_embeddings": 512,
+ "model_type": "bert",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 12,
+ "pad_token_id": 0,
+ "position_embedding_type": "absolute",
+ "torch_dtype": "float32",
+ "transformers_version": "4.43.4",
+ "type_vocab_size": 2,
+ "use_cache": true,
+ "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "__version__": {
+ "sentence_transformers": "3.0.1",
+ "transformers": "4.43.4",
+ "pytorch": "2.4.0.dev20240607+cu118"
+ },
+ "prompts": {},
+ "default_prompt_name": null,
+ "similarity_fn_name": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:076041a1f6d0caead36542ba7d54a10391e27097e08b4fd9eff10f127a848618
+ size 437951328
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_Pooling",
+ "type": "sentence_transformers.models.Pooling"
+ },
+ {
+ "idx": 2,
+ "name": "2",
+ "path": "2_Normalize",
+ "type": "sentence_transformers.models.Normalize"
+ }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 512,
+ "do_lower_case": true
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+ "cls_token": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,57 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "[PAD]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "100": {
+ "content": "[UNK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "101": {
+ "content": "[CLS]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "102": {
+ "content": "[SEP]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "103": {
+ "content": "[MASK]",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "clean_up_tokenization_spaces": true,
+ "cls_token": "[CLS]",
+ "do_basic_tokenize": true,
+ "do_lower_case": true,
+ "mask_token": "[MASK]",
+ "model_max_length": 512,
+ "never_split": null,
+ "pad_token": "[PAD]",
+ "sep_token": "[SEP]",
+ "strip_accents": null,
+ "tokenize_chinese_chars": true,
+ "tokenizer_class": "BertTokenizer",
+ "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff