davidadamczyk committed on
Commit 079b2d2
1 Parent(s): c4ec8fe

Add SetFit model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
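This config selects mean-token pooling over the backbone's 768-dimensional token embeddings. As a rough illustration of what `pooling_mode_mean_tokens` computes, here is a minimal masked-mean sketch in PyTorch (the function and tensor names are illustrative, not part of this repo):

```python
import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # token_embeddings: (batch, seq_len, 768); attention_mask: (batch, seq_len)
    mask = attention_mask.unsqueeze(-1).type_as(token_embeddings)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)  # zero out padding, sum over tokens
    counts = mask.sum(dim=1).clamp(min=1e-9)       # number of real tokens per example
    return summed / counts                         # (batch, 768) sentence embedding
```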
README.md ADDED
@@ -0,0 +1,271 @@
+ ---
+ base_model: sentence-transformers/all-mpnet-base-v2
+ library_name: setfit
+ metrics:
+ - accuracy
+ pipeline_tag: text-classification
+ tags:
+ - setfit
+ - sentence-transformers
+ - text-classification
+ - generated_from_setfit_trainer
+ widget:
+ - text: 'I have alcoholism, drug abouse and suicide all over my family as far back
+     as three generations. After seeing several friend in college (class of ''76, U
+     of Arkansas, Go Hogs!) get blind drunk and raped at frat parties, I decided I
+     could live without it. And I have -- even through five years active duty in the
+     army. I cook with wine and my husband likes a daily beer in the summer. I haven''t
+     missed a thing, I''m height-weight proportionate and probably a few pesos richer
+     for not having squandered money on booze. I live outside the US and I''ve seen
+     dozens of women battered beyond recognition by drunk husbands, children neglected
+     by their parents almost to a point of starvation, and families ruptured and ruined
+     by alcohol. It ain''t worth it.
+ 
+     '
+ - text: 'The War Between the Catholic Cardinals Two essays make plain the different
+     views often obscured by careful political maneuvering within the church. The death
+     of the pope emeritus, Benedict XVI, was succeeded by a small literary outpouring,
+     a rush of publications that were interpreted as salvos in the Catholic Church’s
+     civil war. The list includes a memoir by Benedict’s longtime secretary that mentioned
+     the former pontiff’s disappointment at his successor’s restriction of the Latin
+     Mass, a posthumous essay collection by Benedict himself that’s being mined for
+     controversial quotes, and an Associated Press interview with Pope Francis that
+     made news for its call to decriminalize homosexuality around the world. Two essays
+     make plain the different views often obscured by careful political maneuvering
+     within the church.
+ 
+     '
+ - text: '"As one of the 100,000 or so Catholics in this country who attend the old
+     Mass each week, I will always be grateful to him for allowing for its widespread
+     celebration despite the promulgation of a new, vernacular liturgy."This really
+     says it all. "Soren Kierkegaard?"? Really? Mr. Walther may be the editor of "The
+     Lamp," but his lamp sheds no light on Ratzinger or the fundamental evils of the
+     continuous and painfully slow downward spiral that has been the trajectory of
+     the papacy and the Roman Catholic Church for a very long time.Vatican II opened
+     up a great hope. Ratzinger and his followers saw in it only a threat to the cult
+     of secrecy, both of the sacraments, and the sins. They have done much to unravel
+     the all of the inherent good of Vatican II -- which actually made Catholicism
+     interesting and meaningful to youths at a time of great cynicism in the world.
+     Walther and his 100,000 should form their own 4th century Catholic schism, "despite
+     the promulgation of a new, vernacular liturgy," and leave what''s left of the
+     Catholic church alone to re-build.
+ 
+     '
+ - text: 'Benedict, the reluctant popeThe former Cardinal Ratzinger had never wanted
+     to be pope, planning at age 78 to spend his final years writing in the “peace
+     and quiet” of his native Bavaria.Instead, he was forced to follow the footsteps
+     of the beloved St. John Paul II and run the church through the fallout of the
+     clerical sex abuse scandal Being elected pope, he once said, felt like a “guillotine”
+     had come down on him. Nevertheless, he set about the job with a single-minded
+     vision to rekindle the faith in a world that, he frequently lamented, seemed to
+     think it could do without God.“In vast areas of the world today, there is a strange
+     forgetfulness of God,” he told one million young people gathered on a vast field
+     for his first foreign trip as pope, to World Youth Day in Cologne, Germany, in
+     2005. “It seems as if everything would be just the same even without him.”With
+     some decisive, .. he tried to remind Europe of its Christian heritage. And he
+     set the Catholic Church on a conservative, tradition-minded path that often alienated
+     progressives. He relaxed the restrictions on celebrating the old Latin Mass. It
+     was a path that in many ways was reversed by his successor, Francis, whose mercy-over-morals
+     priorities alienated the traditionalists Benedict’s style couldn’t have been more
+     different from that of Francis. No globe-trotting media darling or populist, Benedict
+     was a teacher, theologian and academic to the core: quiet and pensive with a fierce
+     mind. El Pais Dec
+ 
+     '
+ - text: 'Willy Stone Wouldn''t it be a pity if all ancient art could only be seen
+     in the location where it was made?
+ 
+     '
+ inference: true
+ model-index:
+ - name: SetFit with sentence-transformers/all-mpnet-base-v2
+   results:
+   - task:
+       type: text-classification
+       name: Text Classification
+     dataset:
+       name: Unknown
+       type: unknown
+       split: test
+     metrics:
+     - type: accuracy
+       value: 1.0
+       name: Accuracy
+ ---
+ 
+ # SetFit with sentence-transformers/all-mpnet-base-v2
+ 
+ This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) as the Sentence Transformer embedding model. A [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance is used for classification.
+ 
+ The model has been trained using an efficient few-shot learning technique that involves:
+ 
+ 1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
+ 2. Training a classification head with features from the fine-tuned Sentence Transformer.
+ 
+ ## Model Details
+ 
+ ### Model Description
+ - **Model Type:** SetFit
+ - **Sentence Transformer body:** [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)
+ - **Classification head:** a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
+ - **Maximum Sequence Length:** 384 tokens
+ - **Number of Classes:** 2 classes
+ <!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+ 
+ ### Model Sources
+ 
+ - **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
+ - **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
+ - **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
+ 
+ ### Model Labels
+ | Label | Examples |
+ |:------|:---------|
+ | yes | <ul><li>'The First Afterlife of Pope Benedict XVI Pope Benedict’s legacy will be felt across decades or even centuries. The first pope to resign was Celestine V, born Pietro Da Morrone, who was living the life of a pious hermit when he was elevated to the papacy in 1294, in his 80s, to break a two-year deadlock in the College of Cardinals. Feeling overmastered by the job, he soon resigned in the expectation that he could return to his monastic existence. Instead, he was imprisoned by his successor, Boniface VIII, who feared that some rival faction might make Celestine an antipope. Pope Benedict’s legacy will be felt across decades or even centuries.\n'</li><li>'Here is the statement on Ratzinger\'s death from SNAP, an organization representing victims of abuse from the Catholic Church: "In our view, the death of Pope Benedict XVI is a reminder that, much like John Paul II, Benedict was more concerned about the church’s deteriorating image and financial flow to the hierarchy versus grasping the concept of true apologies followed by true amends to victims of abuse. The rot of clergy sexual abuse of children and adults, even their own professed religious, runs throughout the Catholic church, to every country, and we now have incontrovertible evidence, all the way to the top.Any celebration that marks the life of abuse enablers like Benedict must end. It is past time for the Vatican to refocus on change: tell the truth about known abusive clergy, protect children and adults, and allow justice to those who have been hurt. Honoring Pope Benedict XVI now is not only wrong. It is shameful.It is almost a year after a report into decades of abuse allegations by a law firm in Germany has shown that Pope Benedict XVI did not take action against abusive priests in four child abuse cases while he was Archbishop (Josef Ratzinger). In our view, Pope Benedict XVI is taking decades of the church’s darkest secrets to his grave with him..."<a href="https://www.snapnetwork.org/snap_reacts_to_the_death_of_pope_benedict_xvi" target="_blank">https://www.snapnetwork.org/snap_reacts_to_the_death_of_pope_benedict_xvi</a>\n'</li><li>'I found the statement "While Benedict felt that celebrations of the new Mass were frequently unedifying and even banal..." to be flawed when compared to the post-Vatican II church in the USA. Where was Benedict when Catholic churches in the US held "folk masses" using guitars and keyboard instruments instead of organs? My confirmation ceremony in 1970 in NJ was held that way and one I remember clearly to this very day.If Benedict was really looking for the "cosmic" dimension of the liturgy, maybe he should have attended Maronite Catholic Masses on a regular basis. At the very least, he would have observed the Maronites\' liturgical rituals which date back into the First Century A.D., not to mention the use of Aramaic, the language of Christ, during the consecration of the Eucharist.\n'</li></ul> |
+ | no | <ul><li>'I’m not a Rogan fan, but this is just a few adventurers adventuring. As long as they are experienced divers with the proper equipment and aware of the dangers, who knows what they might find at the bottom of the East River? Inquiring minds want to know.\n'</li><li>'Mike DiNovi One thing we can agree on is that, depending on our individual backgrounds, there are for each of us, a whole treasure trove of "missing" words. Often but not always, these words may be found in the crosswords, particularly older crosswords. But to ask for all of them to be included here would be asking to change the whole gestalt, as well as immodest. But I think longtime players of the Bee still see some value and delight in posting missing words. It may not be included in the official list but the Hive gives them currency. I personally enjoy all the "missing" word posts, however redundant they often are. I find in them some commonality and I have learned also probably several dozen new words - unofficial and official - including many chemistry words. I was a lousy chemistry student but I absolutely love the vocabulary of it.\n'</li><li>'If "work on what comes next" and "innovation" means expanding the definition of "life" to mean all stages of life from womb to tomb (e.g., paternal leave, pre-school, school choice, basic universal healthcare, baby bonds, etc.) that would be a positive and hopefully inclusive step forward that might bridge the awful divide we see across the country and even in the comments of this article.\n'</li></ul> |
+ 
+ ## Evaluation
+ 
+ ### Metrics
+ | Label | Accuracy |
+ |:--------|:---------|
+ | **all** | 1.0 |
+ 
+ ## Uses
+ 
+ ### Direct Use for Inference
+ 
+ First install the SetFit library:
+ 
+ ```bash
+ pip install setfit
+ ```
+ 
+ Then you can load this model and run inference.
+ 
+ ```python
+ from setfit import SetFitModel
+ 
+ # Download from the 🤗 Hub
+ model = SetFitModel.from_pretrained("davidadamczyk/setfit-model-10")
+ # Run inference
+ preds = model("Willy Stone Wouldn't it be a pity if all ancient art could only be seen in the location where it was made?")
+ ```
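+ 
+ `SetFitModel` also accepts a batch of texts, and the logistic-regression head exposes per-class probabilities. A short sketch, assuming the standard SetFit 1.x API (the example texts are invented):
+ 
+ ```python
+ # Batch prediction; labels come from config_setfit.json ("no" / "yes")
+ preds = model.predict(["First comment.", "Second comment."])
+ # Per-class probabilities from the LogisticRegression head
+ probs = model.predict_proba(["First comment."])
+ ```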
+ 
+ <!--
+ ### Downstream Use
+ 
+ *List how someone could finetune this model on their own dataset.*
+ -->
+ 
+ <!--
+ ### Out-of-Scope Use
+ 
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+ 
+ <!--
+ ## Bias, Risks and Limitations
+ 
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+ 
+ <!--
+ ### Recommendations
+ 
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+ 
+ ## Training Details
+ 
+ ### Training Set Metrics
+ | Training set | Min | Median | Max |
+ |:-------------|:----|:--------|:----|
+ | Word count | 15 | 123.625 | 286 |
+ 
+ | Label | Training Sample Count |
+ |:------|:----------------------|
+ | no | 18 |
+ | yes | 22 |
+ 
+ ### Training Hyperparameters
+ - batch_size: (16, 16)
+ - num_epochs: (1, 1)
+ - max_steps: -1
+ - sampling_strategy: oversampling
+ - num_iterations: 120
+ - body_learning_rate: (2e-05, 2e-05)
+ - head_learning_rate: 2e-05
+ - loss: CosineSimilarityLoss
+ - distance_metric: cosine_distance
+ - margin: 0.25
+ - end_to_end: False
+ - use_amp: False
+ - warmup_proportion: 0.1
+ - l2_weight: 0.01
+ - seed: 42
+ - eval_max_steps: -1
+ - load_best_model_at_end: False
+ 
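+ These hyperparameters map directly onto SetFit's `TrainingArguments`. A minimal training sketch, assuming SetFit 1.x and a 🤗 `Dataset` named `train_dataset` with `text` and `label` columns (the training data itself is not published in this repo):
+ 
+ ```python
+ from setfit import SetFitModel, Trainer, TrainingArguments
+ 
+ model = SetFitModel.from_pretrained(
+     "sentence-transformers/all-mpnet-base-v2", labels=["no", "yes"]
+ )
+ args = TrainingArguments(
+     batch_size=(16, 16),
+     num_epochs=(1, 1),
+     num_iterations=120,
+     body_learning_rate=(2e-05, 2e-05),
+     head_learning_rate=2e-05,
+     sampling_strategy="oversampling",
+     seed=42,
+ )
+ # train_dataset is assumed to exist; see the hyperparameters above for the full setup
+ trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
+ trainer.train()
+ ```
+ 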
+ ### Training Results
+ | Epoch | Step | Training Loss | Validation Loss |
+ |:------:|:----:|:-------------:|:---------------:|
+ | 0.0017 | 1 | 0.365 | - |
+ | 0.0833 | 50 | 0.1213 | - |
+ | 0.1667 | 100 | 0.0018 | - |
+ | 0.25 | 150 | 0.0004 | - |
+ | 0.3333 | 200 | 0.0002 | - |
+ | 0.4167 | 250 | 0.0002 | - |
+ | 0.5 | 300 | 0.0001 | - |
+ | 0.5833 | 350 | 0.0001 | - |
+ | 0.6667 | 400 | 0.0001 | - |
+ | 0.75 | 450 | 0.0001 | - |
+ | 0.8333 | 500 | 0.0001 | - |
+ | 0.9167 | 550 | 0.0001 | - |
+ | 1.0 | 600 | 0.0001 | - |
+ 
+ ### Framework Versions
+ - Python: 3.10.13
+ - SetFit: 1.1.0
+ - Sentence Transformers: 3.0.1
+ - Transformers: 4.45.2
+ - PyTorch: 2.4.0+cu124
+ - Datasets: 2.21.0
+ - Tokenizers: 0.20.0
+ 
+ ## Citation
+ 
+ ### BibTeX
+ ```bibtex
+ @article{https://doi.org/10.48550/arxiv.2209.11055,
+   doi = {10.48550/ARXIV.2209.11055},
+   url = {https://arxiv.org/abs/2209.11055},
+   author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
+   keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
+   title = {Efficient Few-Shot Learning Without Prompts},
+   publisher = {arXiv},
+   year = {2022},
+   copyright = {Creative Commons Attribution 4.0 International}
+ }
+ ```
+ 
+ <!--
+ ## Glossary
+ 
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+ 
+ <!--
+ ## Model Card Authors
+ 
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+ 
+ <!--
+ ## Model Card Contact
+ 
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "_name_or_path": "sentence-transformers/all-mpnet-base-v2",
+   "architectures": [
+     "MPNetModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 0,
+   "eos_token_id": 2,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 514,
+   "model_type": "mpnet",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 1,
+   "relative_attention_num_buckets": 32,
+   "torch_dtype": "float32",
+   "transformers_version": "4.45.2",
+   "vocab_size": 30527
+ }
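These values match the stock MPNet encoder that backs all-mpnet-base-v2. As a quick sanity check, a sketch using the standard `transformers` config API:

```python
from transformers import AutoConfig

# Load the backbone config and confirm the key dimensions listed above
cfg = AutoConfig.from_pretrained("sentence-transformers/all-mpnet-base-v2")
assert cfg.model_type == "mpnet"
assert cfg.hidden_size == 768
assert cfg.num_hidden_layers == 12 and cfg.num_attention_heads == 12
```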
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.0.1",
+     "transformers": "4.45.2",
+     "pytorch": "2.4.0+cu124"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
config_setfit.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "normalize_embeddings": false,
+   "labels": [
+     "no",
+     "yes"
+   ]
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:acf0e4c3ff98b99796ff63f901b966b7f2c1bab776f52290682de1b0ed24d6b3
+ size 437967672
model_head.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8f138402e3ef1afe4f71f1b5c77a69a3a7d93f5972cb66a77dfb0f38ac5e5925
+ size 7023
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
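modules.json declares the three-stage encoder pipeline: Transformer → Pooling → Normalize. Loading the repo with `SentenceTransformer` assembles this automatically; purely as illustration, an equivalent pipeline can be built by hand with `sentence_transformers.models` (a sketch, not how this repo must be loaded):

```python
from sentence_transformers import SentenceTransformer, models

word = models.Transformer("sentence-transformers/all-mpnet-base-v2", max_seq_length=384)
pool = models.Pooling(word.get_word_embedding_dimension(), pooling_mode="mean")
norm = models.Normalize()  # unit-length embeddings, as module idx 2 specifies
encoder = SentenceTransformer(modules=[word, pool, norm])
```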
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 384,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer_config.json ADDED
@@ -0,0 +1,72 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "104": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "30526": {
+       "content": "<mask>",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "cls_token": "<s>",
+   "do_lower_case": true,
+   "eos_token": "</s>",
+   "mask_token": "<mask>",
+   "max_length": 128,
+   "model_max_length": 384,
+   "pad_to_multiple_of": null,
+   "pad_token": "<pad>",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "</s>",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "MPNetTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
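The tokenizer truncates inputs to the 384-token `model_max_length` noted above, matching the encoder's maximum sequence length. A minimal sketch with the standard `transformers` tokenizer API (the sample text is invented):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("sentence-transformers/all-mpnet-base-v2")
enc = tok("An example comment.", truncation=True, max_length=384, return_tensors="pt")
print(enc["input_ids"].shape)  # (1, num_tokens), capped at 384
```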
vocab.txt ADDED
The diff for this file is too large to render. See raw diff