joshuapb committed
Commit 4e70f9f
Parent: 889debc

Add new SentenceTransformer model.
1_Pooling/config.json ADDED
{
  "word_embedding_dimension": 768,
  "pooling_mode_cls_token": true,
  "pooling_mode_mean_tokens": false,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
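This config enables pure CLS-token pooling (every other mode is disabled): the sentence embedding is the final hidden state of the `[CLS]` token. For reference, it corresponds to the following sentence-transformers module; a sketch of the equivalence, not something needed for normal loading:

```python
from sentence_transformers.models import Pooling

# CLS pooling: take the [CLS] token's final hidden state (768 dims,
# matching word_embedding_dimension) as the sentence embedding.
pooling = Pooling(word_embedding_dimension=768, pooling_mode="cls")
print(pooling.get_pooling_mode_str())            # 'cls'
print(pooling.get_sentence_embedding_dimension())  # 768
```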
README.md ADDED
---
base_model: BAAI/bge-base-en-v1.5
datasets: []
language:
- en
library_name: sentence-transformers
license: apache-2.0
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:200
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: 'Verbalized number or word (e.g. “lowest”, “low”, “medium”, “high”,
    “highest”), such as "Confidence: 60% / Medium".

    Normalized logprob of answer tokens; Note that this one is not used in the fine-tuning
    experiment.

    Logprob of an indirect "True/False" token after the raw answer.

    Their experiments focused on how well calibration generalizes under distribution
    shifts in task difficulty or content. Each fine-tuning datapoint is a question,
    the model’s answer (possibly incorrect), and a calibrated confidence. Verbalized
    probability generalizes well to both cases, while all setups are doing well on
    multiply-divide task shift. Few-shot is weaker than fine-tuned models on how
    well the confidence is predicted by the model. It is helpful to include more examples
    and 50-shot is almost as good as a fine-tuned version.'
  sentences:
  - What is the relationship between the calibration of AI models and the effectiveness
    of verbalized probabilities when applied to tasks of varying difficulty levels?
  - In what ways does the F1 @ K metric contribute to evaluating the factual accuracy
    and comprehensiveness of outputs generated by long-form language models?
  - What impact does the implementation of a pretrained query-document relevance model
    have on the process of document selection in research methodologies?
- source_sentence: 'Fig. 4. Overview of SAFE for factuality evaluation of long-form
    LLM generation. (Image source: Wei et al. 2024)

    The SAFE evaluation metric is F1 @ K. The motivation is that model response for
    long-form factuality should ideally hit both precision and recall, as the response
    should be both


    factual : measured by precision, the percentage of supported facts among all facts
    in the entire response.

    long : measured by recall, the percentage of provided facts among all relevant
    facts that should appear in the response. Therefore we want to consider the number
    of supported facts up to $K$.


    Given the model response $y$, the metric F1 @ K is defined as:'
  sentences:
  - What methodologies does the agreement model employ to identify discrepancies between
    the original and revised text, and how do these methodologies impact the overall
    editing workflow?
  - In what ways does the SAFE evaluation metric achieve a harmonious equilibrium
    between precision and recall in the context of evaluating the factual accuracy
    of long-form outputs generated by large language models?
  - In what ways does the inherently adversarial structure of TruthfulQA inquiries
    facilitate the detection of prevalent fallacies in human cognitive processes,
    and what implications does this have for understanding the constraints of expansive
    language models?
- source_sentence: 'Non-context LLM: Prompt LLM directly with <atomic-fact> True or
    False? without additional context.

    Retrieval→LLM: Prompt with $k$ related passages retrieved from the knowledge source
    as context.

    Nonparametric probability (NP)): Compute the average likelihood of tokens in the
    atomic fact by a masked LM and use that to make a prediction.

    Retrieval→LLM + NP: Ensemble of two methods.


    Some interesting observations on model hallucination behavior:


    Error rates are higher for rarer entities in the task of biography generation.

    Error rates are higher for facts mentioned later in the generation.

    Using retrieval to ground the model generation significantly helps reduce hallucination.'
  sentences:
  - In what ways does the Rethinking with Retrieval (RR) methodology leverage Chain-of-Thought
    (CoT) prompting to enhance the efficacy of external knowledge retrieval, and what
    implications does this have for the precision of predictive outcomes generated
    by models?
  - In what ways does the retrieval of related passages contribute to minimizing hallucinations
    in large language models, and what specific techniques can be employed to evaluate
    the impact of this approach?
  - What are the benefits of using retrieval methods in biography generation to minimize
    inaccuracies, especially when compared to traditional prompting techniques that
    lack context?
- source_sentence: 'Yin et al. (2023) studies the concept of self-knowledge, referring
    to whether language models know what they know or don’t know.

    SelfAware, containing 1,032 unanswerable questions across five categories and
    2,337 answerable questions. Unanswerable questions are sourced from online forums
    with human annotations while answerable questions are sourced from SQuAD, HotpotQA
    and TriviaQA based on text similarity with unanswerable questions. A question
    may be unanswerable due to various reasons, such as no scientific consensus, imaginations
    of the future, completely subjective, philosophical reasons that may yield multiple
    responses, etc. Considering separating answerable vs unanswerable questions as
    a binary classification task, we can measure F1-score or accuracy and the experiments
    showed that larger models can do better at this task.'
  sentences:
  - What is the relationship between model size and performance metrics, such as F1-score
    and accuracy, in the context of classifying questions into answerable and unanswerable
    categories?
  - How does the introduction of stochastic perturbations in synthetic training data
    contribute to the enhancement of editor model efficacy within LangChain frameworks?
  - How do the various output values linked to reflection tokens in the Self-RAG framework
    impact the generation process, and why are they important?
- source_sentence: 'Fig. 1. Knowledge categorization of close-book QA examples based
    on how likely the model outputs correct answers. (Image source: Gekhman et al.
    2024)

    Some interesting observations of the experiments, where dev set accuracy is considered
    a proxy for hallucinations.


    Unknown examples are fitted substantially slower than Known.

    The best dev performance is obtained when the LLM fits the majority of the Known
    training examples but only a few of the Unknown ones. The model starts to hallucinate
    when it learns most of the Unknown examples.

    Among Known examples, MaybeKnown cases result in better overall performance, more
    essential than HighlyKnown ones.'
  sentences:
  - In what ways does the fitting speed of examples that are not previously encountered
    differ from that of familiar examples, and how does this variation influence the
    overall accuracy of the model on the development set?
  - What role do reflection tokens play in enhancing the efficiency of document retrieval
    and generation within the Self-RAG framework?
  - How do the results presented by Gekhman et al. in their 2024 study inform our
    understanding of the reliability metrics associated with large language models
    (LLMs) when subjected to fine-tuning with novel datasets?
model-index:
- name: BGE base Financial Matryoshka
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 768
      type: dim_768
    metrics:
    - type: cosine_accuracy@1
      value: 0.8802083333333334
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.96875
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.984375
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.9947916666666666
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.8802083333333334
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.3229166666666667
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.196875
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09947916666666667
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.8802083333333334
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.96875
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.984375
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.9947916666666666
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.9433275174124347
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.9261284722222224
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.9264025950292397
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 512
      type: dim_512
    metrics:
    - type: cosine_accuracy@1
      value: 0.8697916666666666
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.9739583333333334
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.9739583333333334
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.9947916666666666
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.8697916666666666
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.3246527777777778
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.1947916666666666
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09947916666666667
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.8697916666666666
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.9739583333333334
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.9739583333333334
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.9947916666666666
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.939968526552219
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.9216269841269841
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.9220610119047619
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 256
      type: dim_256
    metrics:
    - type: cosine_accuracy@1
      value: 0.8697916666666666
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.9739583333333334
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.984375
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 1.0
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.8697916666666666
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.3246527777777778
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.196875
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09999999999999999
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.8697916666666666
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.9739583333333334
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.984375
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 1.0
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.9419747509776967
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.922676917989418
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.922676917989418
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 128
      type: dim_128
    metrics:
    - type: cosine_accuracy@1
      value: 0.8541666666666666
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.9583333333333334
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.96875
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.9947916666666666
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.8541666666666666
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.3194444444444445
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.19374999999999998
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09947916666666667
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.8541666666666666
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.9583333333333334
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.96875
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.9947916666666666
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.9306358745697197
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.9094328703703702
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.9098668981481483
      name: Cosine Map@100
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: dim 64
      type: dim_64
    metrics:
    - type: cosine_accuracy@1
      value: 0.7916666666666666
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.953125
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.9739583333333334
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.9895833333333334
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.7916666666666666
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.3177083333333333
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.1947916666666666
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.09895833333333333
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.7916666666666666
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.953125
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.9739583333333334
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.9895833333333334
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.9003914274568845
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.8705935846560847
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.8713150853775854
      name: Cosine Map@100
---

# BGE base Financial Matryoshka

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) <!-- at revision a5beb1e3e68b9ab74eb54cfd186867f64f240e1a -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
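Note that the final `Normalize()` module L2-normalizes every embedding, so the dot product of two embeddings equals their cosine similarity. A quick sketch to verify this (uses `numpy`, which sentence-transformers already depends on):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("joshuapb/fine-tuned-matryoshka-200")
emb = model.encode(["CLS pooling plus Normalize yields unit-length vectors."])

# Unit norm means dot product == cosine similarity for these embeddings.
print(np.linalg.norm(emb[0]))  # ~1.0
```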

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("joshuapb/fine-tuned-matryoshka-200")
# Run inference
sentences = [
    'Fig. 1. Knowledge categorization of close-book QA examples based on how likely the model outputs correct answers. (Image source: Gekhman et al. 2024)\nSome interesting observations of the experiments, where dev set accuracy is considered a proxy for hallucinations.\n\nUnknown examples are fitted substantially slower than Known.\nThe best dev performance is obtained when the LLM fits the majority of the Known training examples but only a few of the Unknown ones. The model starts to hallucinate when it learns most of the Unknown examples.\nAmong Known examples, MaybeKnown cases result in better overall performance, more essential than HighlyKnown ones.',
    'In what ways does the fitting speed of examples that are not previously encountered differ from that of familiar examples, and how does this variation influence the overall accuracy of the model on the development set?',
    'How do the results presented by Gekhman et al. in their 2024 study inform our understanding of the reliability metrics associated with large language models (LLMs) when subjected to fine-tuning with novel datasets?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
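Because the model was trained with `MatryoshkaLoss` (see the tags and the evaluation tables below), its embeddings can also be truncated to the smaller evaluated dimensions (512, 256, 128, or 64) for cheaper storage and search, at a modest accuracy cost. A sketch using the `truncate_dim` option available in recent Sentence Transformers releases:

```python
from sentence_transformers import SentenceTransformer

# Keep only the first 256 of the 768 embedding dimensions.
model = SentenceTransformer("joshuapb/fine-tuned-matryoshka-200", truncate_dim=256)

embeddings = model.encode([
    "Using retrieval to ground the model generation significantly helps reduce hallucination.",
    "How does retrieval grounding affect hallucination rates?",
])
print(embeddings.shape)  # (2, 256)

# The default cosine similarity re-normalizes internally, so the truncated
# (no longer unit-norm) embeddings remain directly comparable.
print(model.similarity(embeddings, embeddings))
```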

<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->

## Evaluation

### Metrics

#### Information Retrieval
* Dataset: `dim_768`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.8802     |
| cosine_accuracy@3   | 0.9688     |
| cosine_accuracy@5   | 0.9844     |
| cosine_accuracy@10  | 0.9948     |
| cosine_precision@1  | 0.8802     |
| cosine_precision@3  | 0.3229     |
| cosine_precision@5  | 0.1969     |
| cosine_precision@10 | 0.0995     |
| cosine_recall@1     | 0.8802     |
| cosine_recall@3     | 0.9688     |
| cosine_recall@5     | 0.9844     |
| cosine_recall@10    | 0.9948     |
| cosine_ndcg@10      | 0.9433     |
| cosine_mrr@10       | 0.9261     |
| **cosine_map@100**  | **0.9264** |

#### Information Retrieval
* Dataset: `dim_512`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.8698     |
| cosine_accuracy@3   | 0.974      |
| cosine_accuracy@5   | 0.974      |
| cosine_accuracy@10  | 0.9948     |
| cosine_precision@1  | 0.8698     |
| cosine_precision@3  | 0.3247     |
| cosine_precision@5  | 0.1948     |
| cosine_precision@10 | 0.0995     |
| cosine_recall@1     | 0.8698     |
| cosine_recall@3     | 0.974      |
| cosine_recall@5     | 0.974      |
| cosine_recall@10    | 0.9948     |
| cosine_ndcg@10      | 0.94       |
| cosine_mrr@10       | 0.9216     |
| **cosine_map@100**  | **0.9221** |

#### Information Retrieval
* Dataset: `dim_256`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.8698     |
| cosine_accuracy@3   | 0.974      |
| cosine_accuracy@5   | 0.9844     |
| cosine_accuracy@10  | 1.0        |
| cosine_precision@1  | 0.8698     |
| cosine_precision@3  | 0.3247     |
| cosine_precision@5  | 0.1969     |
| cosine_precision@10 | 0.1        |
| cosine_recall@1     | 0.8698     |
| cosine_recall@3     | 0.974      |
| cosine_recall@5     | 0.9844     |
| cosine_recall@10    | 1.0        |
| cosine_ndcg@10      | 0.942      |
| cosine_mrr@10       | 0.9227     |
| **cosine_map@100**  | **0.9227** |

#### Information Retrieval
* Dataset: `dim_128`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.8542     |
| cosine_accuracy@3   | 0.9583     |
| cosine_accuracy@5   | 0.9688     |
| cosine_accuracy@10  | 0.9948     |
| cosine_precision@1  | 0.8542     |
| cosine_precision@3  | 0.3194     |
| cosine_precision@5  | 0.1937     |
| cosine_precision@10 | 0.0995     |
| cosine_recall@1     | 0.8542     |
| cosine_recall@3     | 0.9583     |
| cosine_recall@5     | 0.9688     |
| cosine_recall@10    | 0.9948     |
| cosine_ndcg@10      | 0.9306     |
| cosine_mrr@10       | 0.9094     |
| **cosine_map@100**  | **0.9099** |

#### Information Retrieval
* Dataset: `dim_64`
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.7917     |
| cosine_accuracy@3   | 0.9531     |
| cosine_accuracy@5   | 0.974      |
| cosine_accuracy@10  | 0.9896     |
| cosine_precision@1  | 0.7917     |
| cosine_precision@3  | 0.3177     |
| cosine_precision@5  | 0.1948     |
| cosine_precision@10 | 0.099      |
| cosine_recall@1     | 0.7917     |
| cosine_recall@3     | 0.9531     |
| cosine_recall@5     | 0.974      |
| cosine_recall@10    | 0.9896     |
| cosine_ndcg@10      | 0.9004     |
| cosine_mrr@10       | 0.8706     |
| **cosine_map@100**  | **0.8713** |

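The tables above can be reproduced with the same evaluator class. A minimal sketch, in which the hypothetical `queries`, `corpus`, and `relevant_docs` mappings stand in for the actual held-out split (which is not published with this card):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("joshuapb/fine-tuned-matryoshka-200")

# Hypothetical evaluation data: id -> text, and query id -> relevant doc ids.
queries = {"q1": "How does retrieval grounding affect hallucination rates?"}
corpus = {"d1": "Using retrieval to ground the model generation significantly helps reduce hallucination."}
relevant_docs = {"q1": {"d1"}}

# truncate_dim scores a Matryoshka sub-dimension, mirroring the dim_256 table.
evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    name="dim_256",
    truncate_dim=256,
)
results = evaluator(model)
print(results["dim_256_cosine_map@100"])
```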
<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->

## Training Details

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: epoch
- `per_device_eval_batch_size`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 5
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `load_best_model_at_end`: True

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 8
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 5
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>
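Together with the `loss:MatryoshkaLoss` and `loss:MultipleNegativesRankingLoss` tags above, these hyperparameters correspond to a training setup along the following lines. This is a sketch, not the author's exact script; `train_dataset` is a placeholder for the 200 (anchor, positive) pairs:

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("BAAI/bge-base-en-v1.5")

# Placeholder for the 200 (anchor, positive) pairs used to train this model.
train_dataset = Dataset.from_dict({
    "anchor": ["How does retrieval grounding affect hallucination rates?"],
    "positive": ["Using retrieval to ground the model generation helps reduce hallucination."],
})

# In-batch negatives loss, wrapped so that the first 768/512/256/128/64
# dimensions of each embedding are all trained to work on their own.
loss = MatryoshkaLoss(
    model,
    MultipleNegativesRankingLoss(model),
    matryoshka_dims=[768, 512, 256, 128, 64],
)

args = SentenceTransformerTrainingArguments(
    output_dir="fine-tuned-matryoshka-200",  # hypothetical output path
    num_train_epochs=5,
    per_device_train_batch_size=8,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    # The card additionally evaluates every epoch and reloads the best
    # checkpoint (eval_strategy="epoch", load_best_model_at_end=True),
    # which also requires an eval dataset or evaluator to be supplied.
)

trainer = SentenceTransformerTrainer(
    model=model, args=args, train_dataset=train_dataset, loss=loss
)
trainer.train()
```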

### Training Logs
| Epoch   | Step    | Training Loss | dim_128_cosine_map@100 | dim_256_cosine_map@100 | dim_512_cosine_map@100 | dim_64_cosine_map@100 | dim_768_cosine_map@100 |
|:-------:|:-------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|:----------------------:|
| 0.2     | 5       | 5.2225        | -                      | -                      | -                      | -                     | -                      |
| 0.4     | 10      | 4.956         | -                      | -                      | -                      | -                     | -                      |
| 0.6     | 15      | 3.6388        | -                      | -                      | -                      | -                     | -                      |
| 0.8     | 20      | 3.1957        | -                      | -                      | -                      | -                     | -                      |
| 1.0     | 25      | 2.6928        | 0.8661                 | 0.8770                 | 0.8754                 | 0.8312                | 0.8871                 |
| 1.2     | 30      | 2.5565        | -                      | -                      | -                      | -                     | -                      |
| 1.4     | 35      | 1.5885        | -                      | -                      | -                      | -                     | -                      |
| 1.6     | 40      | 2.1406        | -                      | -                      | -                      | -                     | -                      |
| 1.8     | 45      | 2.193         | -                      | -                      | -                      | -                     | -                      |
| 2.0     | 50      | 1.326         | 0.8944                 | 0.9110                 | 0.9028                 | 0.8615                | 0.9037                 |
| 2.2     | 55      | 2.6832        | -                      | -                      | -                      | -                     | -                      |
| 2.4     | 60      | 1.0584        | -                      | -                      | -                      | -                     | -                      |
| 2.6     | 65      | 0.8853        | -                      | -                      | -                      | -                     | -                      |
| 2.8     | 70      | 1.7129        | -                      | -                      | -                      | -                     | -                      |
| 3.0     | 75      | 2.1856        | 0.9106                 | 0.9293                 | 0.9075                 | 0.8778                | 0.9266                 |
| 3.2     | 80      | 1.7658        | -                      | -                      | -                      | -                     | -                      |
| 3.4     | 85      | 1.9783        | -                      | -                      | -                      | -                     | -                      |
| 3.6     | 90      | 1.9583        | -                      | -                      | -                      | -                     | -                      |
| 3.8     | 95      | 1.2396        | -                      | -                      | -                      | -                     | -                      |
| 4.0     | 100     | 1.1901        | 0.9073                 | 0.9253                 | 0.9151                 | 0.8750                | 0.9312                 |
| 4.2     | 105     | 2.6547        | -                      | -                      | -                      | -                     | -                      |
| 4.4     | 110     | 1.3485        | -                      | -                      | -                      | -                     | -                      |
| 4.6     | 115     | 1.0767        | -                      | -                      | -                      | -                     | -                      |
| 4.8     | 120     | 0.6663        | -                      | -                      | -                      | -                     | -                      |
| **5.0** | **125** | **1.3869**    | **0.9099**             | **0.9227**             | **0.9221**             | **0.8713**            | **0.9264**             |

* The bold row denotes the saved checkpoint.

### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.0.1
- Transformers: 4.42.4
- PyTorch: 2.3.1+cu121
- Accelerate: 0.32.1
- Datasets: 2.21.0
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss
```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
config.json ADDED
{
  "_name_or_path": "fine-tuned-models/fine-tuned-matryoshka-200",
  "architectures": [
    "BertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "LABEL_0": 0
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
  "transformers_version": "4.42.4",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
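This is the standard BERT-base configuration (12 layers, 12 attention heads, hidden size 768). Combined with the CLS pooling and normalization declared in 1_Pooling/config.json and modules.json, inference with plain transformers reduces to the following sketch:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("joshuapb/fine-tuned-matryoshka-200")
model = AutoModel.from_pretrained("joshuapb/fine-tuned-matryoshka-200")

batch = tokenizer(
    ["An example sentence."],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, 768)

cls = hidden[:, 0]                   # CLS pooling, per pooling_mode_cls_token
emb = F.normalize(cls, p=2, dim=1)   # mirrors the Normalize() module
```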
config_sentence_transformers.json ADDED
{
  "__version__": {
    "sentence_transformers": "3.0.1",
    "transformers": "4.42.4",
    "pytorch": "2.3.1+cu121"
  },
  "prompts": {},
  "default_prompt_name": null,
  "similarity_fn_name": null
}
model.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:f8f444921efaebd00d38ea538f488f678edc38328e7382766665e882e6f00156
size 437951328
modules.json ADDED
[
  {
    "idx": 0,
    "name": "0",
    "path": "",
    "type": "sentence_transformers.models.Transformer"
  },
  {
    "idx": 1,
    "name": "1",
    "path": "1_Pooling",
    "type": "sentence_transformers.models.Pooling"
  },
  {
    "idx": 2,
    "name": "2",
    "path": "2_Normalize",
    "type": "sentence_transformers.models.Normalize"
  }
]
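modules.json wires the saved folders into the three-stage pipeline shown under "Full Model Architecture" in the README: the transformer at the repository root (path ""), CLS pooling from 1_Pooling, and L2 normalization. The same pipeline can be assembled by hand, e.g. when rebuilding from the base checkpoint; a sketch:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.models import Normalize, Pooling, Transformer

# idx 0: the BERT encoder (stored at the repository root)
transformer = Transformer("BAAI/bge-base-en-v1.5", max_seq_length=512)
# idx 1: CLS pooling, mirroring 1_Pooling/config.json
pooling = Pooling(transformer.get_word_embedding_dimension(), pooling_mode="cls")
# idx 2: L2 normalization
normalize = Normalize()

model = SentenceTransformer(modules=[transformer, pooling, normalize])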
sentence_bert_config.json ADDED
{
  "max_seq_length": 512,
  "do_lower_case": true
}
special_tokens_map.json ADDED
{
  "cls_token": {
    "content": "[CLS]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "[MASK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "[PAD]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "[SEP]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "[UNK]",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json ADDED
The diff for this file is too large to render.
 
tokenizer_config.json ADDED
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": true,
  "cls_token": "[CLS]",
  "do_basic_tokenize": true,
  "do_lower_case": true,
  "mask_token": "[MASK]",
  "max_length": 512,
  "model_max_length": 512,
  "never_split": null,
  "pad_to_multiple_of": null,
  "pad_token": "[PAD]",
  "pad_token_type_id": 0,
  "padding_side": "right",
  "sep_token": "[SEP]",
  "stride": 0,
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "truncation_side": "right",
  "truncation_strategy": "longest_first",
  "unk_token": "[UNK]"
}
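The tokenizer is a standard lowercasing BertTokenizer capped at 512 tokens, using the special tokens defined above. A quick sketch of what these settings mean in practice:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("joshuapb/fine-tuned-matryoshka-200")

# do_lower_case=True lowercases input before WordPiece tokenization,
# and every sequence is wrapped as [CLS] ... [SEP].
ids = tokenizer("Retrieval REDUCES hallucination.")["input_ids"]
print(tokenizer.convert_ids_to_tokens(ids))  # lowercased, wrapped in [CLS]/[SEP]
print(tokenizer.model_max_length)            # 512
```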
vocab.txt ADDED
The diff for this file is too large to render.