TomatenMarc committed
Commit 0cee83e
1 Parent(s): 62fd985

Delete WRAPresentations
WRAPresentations/1_Pooling/config.json DELETED
@@ -1,7 +0,0 @@
- {
-   "word_embedding_dimension": 768,
-   "pooling_mode_cls_token": false,
-   "pooling_mode_mean_tokens": true,
-   "pooling_mode_max_tokens": false,
-   "pooling_mode_mean_sqrt_len_tokens": false
- }
 
WRAPresentations/README.md DELETED
@@ -1,258 +0,0 @@
- ---
- pipeline_tag: sentence-similarity
- tags:
- - sentence-transformers
- - feature-extraction
- - sentence-similarity
- - transformers
- - argument-mining
- - Twitter
- metrics:
- - macro-F1
- license: cc-by-sa-4.0
- language:
- - en
- widget:
- - source_sentence: "The formula: Not everyone who voted Leave is racist. But everyone who's racist voted Leave. Not everyone who voted Leave is thick. But everyone who's thick voted Leave. The thick racists therefore called the shots, whatever the thoughts of the minority of others. #thick #Brexit"
-   sentences:
-   - "Men shouldn’t be making laws about women’s bodies #abortion #Texas"
-   - "Opinion: As the draconian (and then some) abortion law takes effecting #Texas, this is not an idle question for millions of Americans. A slippery slope towards more like-minded Republican state-legislatures to try to follow suit. #abortion #F24 HTTPURL"
-   - "’Bitter truth’: EU chief pours cold water on idea of Brits keeping EU citizenship after #Brexit HTTPURL via @USER"
-   - "@USER Blah blah blah blah blah blah"
-   example_title: "Reason"
-
- - source_sentence: "This is NOT good for children."
-   sentences:
-   - "Men shouldn’t be making laws about women’s bodies #abortion #Texas"
-   - "Opinion: As the draconian (and then some) abortion law takes effecting #Texas, this is not an idle question for millions of Americans. A slippery slope towards more like-minded Republican state-legislatures to try to follow suit. #abortion #F24 HTTPURL"
-   - "’Bitter truth’: EU chief pours cold water on idea of Brits keeping EU citizenship after #Brexit HTTPURL via @USER"
-   - "@USER Blah blah blah blah blah blah"
-   example_title: "Statement"
-
- - source_sentence: "Elon Musk ready with 'Plan B' if Twitter rejects his offer Read @USER Story | HTTPURL #ElonMusk #ElonMuskTwitter #TwitterTakeover HTTPURL"
-   sentences:
-   - "Men shouldn’t be making laws about women’s bodies #abortion #Texas"
-   - "Opinion: As the draconian (and then some) abortion law takes effecting #Texas, this is not an idle question for millions of Americans. A slippery slope towards more like-minded Republican state-legislatures to try to follow suit. #abortion #F24 HTTPURL"
-   - "’Bitter truth’: EU chief pours cold water on idea of Brits keeping EU citizenship after #Brexit HTTPURL via @USER"
-   - "@USER Blah blah blah blah blah blah"
-   example_title: "Notification"
-
- - source_sentence: "@USER 👅is the Key 😂"
-   sentences:
-   - "Men shouldn’t be making laws about women’s bodies #abortion #Texas"
-   - "Opinion: As the draconian (and then some) abortion law takes effecting #Texas, this is not an idle question for millions of Americans. A slippery slope towards more like-minded Republican state-legislatures to try to follow suit. #abortion #F24 HTTPURL"
-   - "’Bitter truth’: EU chief pours cold water on idea of Brits keeping EU citizenship after #Brexit HTTPURL via @USER"
-   - "@USER Blah blah blah blah blah blah"
-   example_title: "None"
- ---
-
- # WRAPresentations -- A TACO-based Embedder For Inference and Information-Driven Argument Mining on Twitter
-
- Introducing WRAPresentations, a [sentence-transformers](https://www.SBERT.net) model that maps tweets into a 768-dimensional dense
- vector space according to the four classes Reason, Statement, Notification, and None. The model is tailored for
- argument mining on Twitter and is derived from the [BERTweet-base](https://huggingface.co/vinai/bertweet-base) architecture, which was initially pre-trained on
- Twitter data. Through fine-tuning with the [TACO](https://doi.org/10.5281/zenodo.8030026) dataset, WRAPresentations is effective at encoding
- inference and information in tweets.
-
- ## Class Semantics
- The TACO framework revolves around the two key elements of an argument, as defined by the [Cambridge Dictionary](https://dictionary.cambridge.org).
- It encodes *inference* as *a guess that you make or an opinion that you form based on the information that you have*, and it leverages the
- definition of *information* as *facts or details about a person, company, product, etc.*.
-
- WRAPresentations, to some degree, captures the semantics of these two components in its embedding space.
-
- Consequently, it has also learned the semantics of the four classes, each defined by which of these components it contains:
-
- * *Statement*, which refers to unique cases where only the *inference* is presented as *something that someone says or writes officially, or an action
- done to express an opinion*.
- * *Reason*, which represents a full argument where the *inference* is based on direct *information* mentioned in the tweet, such as a source-reference
- or quotation, and thus reveals the author’s motivation *to try to understand and to make judgments based on practical facts*.
- * *Notification*, which refers to a tweet that limits itself to providing *information*, such as media channels promoting their latest articles.
- * *None*, a tweet that provides neither *inference* nor *information*.
-
- In its entirety, WRAPresentations encodes the following hierarchy for tweets:
-
- <div align="center">
- <img src="https://github.com/TomatenMarc/public-images/raw/main/Argument_Tree.svg" alt="Argument Tree" width="100%">
- </div>
-
- ## Class Semantic Transfer to Embeddings
-
- Observing the distribution of tweets by their `CLS` tokens (later used for classification) in the embedding space of WRAPresentations, we noted that
- pre-classification fine-tuning via contrastive learning produced denser, more clearly separated class sectors than the embeddings of BERTweet,
- as shown in the following figure.
- <div align="center">
- <img src="https://github.com/TomatenMarc/public-images/raw/main/sector_purity_coordinates.svg" alt="Sector Purity" width="100%">
- </div>
-
- ## Usage (Sentence-Transformers)
-
- Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:
-
- ```
- pip install -U sentence-transformers
- ```
-
- Then you can use the model to generate tweet representations like this:
-
- ```python
- from sentence_transformers import SentenceTransformer
-
- tweets = ["This is an example #tweet", "Each tweet is converted"]
-
- model = SentenceTransformer("TomatenMarc/WRAPresentations")
- embeddings = model.encode(tweets)
- print(embeddings)
- ```
-
- <a href="https://github.com/VinAIResearch/BERTweet/blob/master/TweetNormalizer.py">
- <blockquote style="border-left: 5px solid grey; background-color: #f0f5ff; padding: 10px;">
- Notice: The tweets need to undergo preprocessing following the specifications for BERTweet-base.
- </blockquote>
- </a>
-
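- As a minimal sketch of such preprocessing (a rough approximation of the linked TweetNormalizer: it only masks user mentions and URLs with the @USER and HTTPURL placeholders seen in the widget examples above, while the original script also normalizes emojis and contractions):
-
- ```python
- import re
-
- def normalize_tweet(text: str) -> str:
-     """Roughly mimic BERTweet's preprocessing: mask mentions and URLs."""
-     text = re.sub(r"@\w+", "@USER", text)             # user mentions -> @USER
-     text = re.sub(r"https?://\S+", "HTTPURL", text)   # links -> HTTPURL
-     return " ".join(text.split())                     # collapse whitespace
-
- print(normalize_tweet("Great point by @alice https://t.co/xyz #Brexit"))
- # Great point by @USER HTTPURL #Brexit
- ```
-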
- ## Usage (HuggingFace Transformers)
-
- Without [sentence-transformers](https://www.SBERT.net), you can use the model like this: first, you pass your input through the transformer model,
- then you have to apply the right pooling operation on top of the contextualized word embeddings.
-
- ```python
- from transformers import AutoTokenizer, AutoModel
- import torch
-
-
- # Mean Pooling - Take attention mask into account for correct averaging
- def mean_pooling(model_output, attention_mask):
-     token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
-     input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
-     return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
-
-
- # Tweets we want embeddings for
- tweets = ["This is an example #tweet", "Each tweet is converted"]
-
- # Load model from HuggingFace Hub
- tokenizer = AutoTokenizer.from_pretrained("TomatenMarc/WRAPresentations")
- model = AutoModel.from_pretrained("TomatenMarc/WRAPresentations")
-
- # Tokenize sentences
- encoded_input = tokenizer(tweets, padding=True, truncation=True, return_tensors="pt")
-
- # Compute token embeddings
- with torch.no_grad():
-     model_output = model(**encoded_input)
-
- # Perform pooling. In this case, mean pooling.
- sentence_embeddings = mean_pooling(model_output, encoded_input["attention_mask"])
-
- print("Sentence embeddings:")
- print(sentence_embeddings)
- ```
-
- Furthermore, the WRAPresentations model is a suitable embedding component for `AutoModelForSequenceClassification`, enabling
- further fine-tuning on tweet classification tasks targeting the four classes Reason, Statement, Notification, and None.
- The categorization of Reason and Statement as argument classes and of Notification and None as non-argument classes is implicitly learned during
- the fine-tuning process. This setup facilitates efficient identification of argumentative and non-argumentative content in tweets, as sketched below.
-
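- As a minimal sketch of this setup (the label order is a hypothetical choice, and the checkpoint ships without a classification head, so the added head is randomly initialized and must be fine-tuned before use):
-
- ```python
- from transformers import AutoTokenizer, AutoModelForSequenceClassification
-
- labels = ["Reason", "Statement", "Notification", "None"]  # assumed ordering
-
- tokenizer = AutoTokenizer.from_pretrained("TomatenMarc/WRAPresentations")
- model = AutoModelForSequenceClassification.from_pretrained(
-     "TomatenMarc/WRAPresentations",
-     num_labels=len(labels),
-     id2label=dict(enumerate(labels)),
-     label2id={label: i for i, label in enumerate(labels)},
- )
- # model is now ready for standard fine-tuning, e.g. with transformers.Trainer
- ```
-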
- ## Training
-
- The WRAPresentations model was fine-tuned on 1,219 golden tweets from the [TACO](https://doi.org/10.5281/zenodo.8030026) dataset, covering six topics.
- Five topics were chosen for optimization, representing 925 tweets (75.88%) covering #brexit (33.3%), #got (17%), #lotrrop (18.8%), #squidgame (17.1%),
- and #twittertakeover (13.8%). The model used a stratified 60/40 split for training/testing on the optimization data.
- Additionally, 294 golden tweets (24.12%) related to the topic of #abortion were chosen as the holdout set for final evaluation.
-
- Before fine-tuning, we built a copy of the dataset by creating an augmentation of each tweet. The augmentation consisted of replacing all
- topic words and entities in a tweet and then randomly masking 10% of its words, which were subsequently filled in using
- [BERTweet-base](https://huggingface.co/vinai/bertweet-base) as a `fill-mask` model. We chose to mask 10% of the words because this resulted in the
- smallest average cosine distance between the tweets and their augmentations, namely 0.02, meaning the augmentations stay semantically close to the
- originals while still introducing variation, making augmentation during pre-classification fine-tuning itself a regularizing factor against
- overfitting on the later test data.
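-
- A minimal sketch of the masking step (topic-word and entity replacement is omitted, and the use of the `fill-mask` pipeline here is an assumption about tooling, not the exact script used for training):
-
- ```python
- import random
- from transformers import pipeline
-
- fill = pipeline("fill-mask", model="vinai/bertweet-base")
-
- def augment(tweet: str, ratio: float = 0.10) -> str:
-     """Mask ~10% of the words and let BERTweet fill them back in."""
-     words = tweet.split()
-     n_mask = max(1, round(len(words) * ratio))
-     for i in random.sample(range(len(words)), n_mask):
-         words[i] = fill.tokenizer.mask_token
-     masked = " ".join(words)
-     preds = fill(masked)
-     if isinstance(preds[0], dict):  # a single mask returns a flat list
-         preds = [preds]
-     for candidates in preds:        # replace each mask with its top guess
-         masked = masked.replace(fill.tokenizer.mask_token, candidates[0]["token_str"], 1)
-     return masked
-
- print(augment("Each tweet is converted into an augmented twin"))
- ```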
-
- During fine-tuning, we formed pairs by matching each tweet with all remaining tweets in the same data split (training, testing, holdout),
- labeling a pair as similar when both tweets share a class label and as dissimilar otherwise. For the training and test sets, we used the
- augmentations during fine-tuning, while for the holdout tweets we used their original text to test the fine-tuning process and how well the
- augmentations transfer to real tweets.
- For all pairs, we chose the largest possible set in which similar and dissimilar pairs are equally represented while all tweets
- of the respective data split are covered.
- This process created 307,470 pairs for training and 136,530 pairs for testing. An additional 86,142 pairs were used for final evaluation on the
- holdout data. Moreover, we used `MEAN` pooling of the token embeddings during fine-tuning, as it enhances sentence representations.
-
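- A minimal sketch of the pair construction, with hypothetical toy entries standing in for the actual TACO data:
-
- ```python
- from itertools import combinations
- from sentence_transformers import InputExample
-
- # Hypothetical (tweet, class) entries of one data split
- split = [("tweet a", "Reason"), ("tweet b", "Reason"), ("tweet c", "None")]
-
- pairs = [
-     InputExample(texts=[t1, t2], label=1 if c1 == c2 else 0)
-     for (t1, c1), (t2, c2) in combinations(split, 2)
- ]  # label 1 = similar (same class), label 0 = dissimilar
- ```
-
- In practice, the balanced subset described above would then be sampled from these exhaustive pairs.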
-
- The model was trained with the following parameters:
-
- **DataLoader**:
-
- `torch.utils.data.dataloader.DataLoader` of length 5065 with parameters:
-
- ```
- {'batch_size': 32, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}
- ```
-
- **Loss**:
-
- `sentence_transformers.losses.ContrastiveLoss.ContrastiveLoss` with parameters:
-
- ```
- {'distance_metric': 'SiameseDistanceMetric.COSINE_DISTANCE', 'margin': 0.5, 'size_average': True}
- ```
196
-
197
- Parameters of the fit()-Method:
198
-
199
- ```
200
- {
201
- "epochs": 5,
202
- "evaluation_steps": 1000,
203
- "evaluator": "sentence_transformers.evaluation.BinaryClassificationEvaluator.BinaryClassificationEvaluator",
204
- "max_grad_norm": 1,
205
- "optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
206
- "optimizer_params": {
207
- "lr": 4e-05
208
- },
209
- "scheduler": "WarmupLinear",
210
- "steps_per_epoch": null,
211
- "warmup_steps": 2533,
212
- "weight_decay": 0.01
213
- }
214
- ```
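-
- Assembled into code, a hedged sketch of the corresponding training call (the starting checkpoint and the `pairs`/`test_pairs` lists from the sketches above are assumptions):
-
- ```python
- from torch.utils.data import DataLoader
- from sentence_transformers import SentenceTransformer, losses
- from sentence_transformers.evaluation import BinaryClassificationEvaluator
-
- model = SentenceTransformer("vinai/bertweet-base")  # Transformer + MEAN pooling
- train_loader = DataLoader(pairs, shuffle=True, batch_size=32)
- loss = losses.ContrastiveLoss(model=model, margin=0.5)
-
- model.fit(
-     train_objectives=[(train_loader, loss)],
-     evaluator=BinaryClassificationEvaluator.from_input_examples(test_pairs),
-     epochs=5,
-     evaluation_steps=1000,
-     warmup_steps=2533,
-     optimizer_params={"lr": 4e-05},
-     weight_decay=0.01,
-     max_grad_norm=1,
-     scheduler="WarmupLinear",
- )
- ```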
-
- ## Evaluation Results
-
- Following the [standard protocol](https://aclanthology.org/D17-1218.pdf) for cross-topic evaluation in argument mining, we evaluated the
- WRAPresentations model using SBERT's `BinaryClassificationEvaluator`, with the following results:
-
- | Model                                   | Precision | Recall  | F1     | Support |
- |-----------------------------------------|-----------|---------|--------|---------|
- | Vanilla BERTweet-`CLS`                  | 50.00%    | 100.00% | 66.67% | 86,142  |
- | Augmented BERTweet-`CLS`                | 66.75%    | 84.78%  | 74.69% | 86,142  |
- | WRAPresentations-`CLS`                  | 66.00%    | 84.32%  | 74.04% | 86,142  |
- | WRAPresentations-`MEAN` (current model) | 63.05%    | 88.91%  | 73.78% | 86,142  |
-
- The evaluation was conducted on previously unseen data from the holdout topic #abortion, with the model achieving a macro-F1
- score of 74.04% when evaluated with `CLS` tokens and 73.78% when evaluated with `MEAN` pooling as used during fine-tuning.
- The recall of 88.91% indicates the model's ability to capture subtle tweet patterns and class-specific features for
- Reason, Statement, Notification, and None. For reference, we report the results for Vanilla BERTweet-`CLS`, which is a plain BERTweet-base model;
- Augmented BERTweet-`CLS`, which was trained on the same augmentations as WRAPresentations-`MEAN` but optimized directly on the classification task; and
- WRAPresentations-`CLS`, which is the same model as the one presented here but with `CLS` pooling during fine-tuning.
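-
- A minimal sketch of reproducing such an evaluation (assuming `holdout_pairs` is a list of `InputExample` pairs built as in the Training section):
-
- ```python
- from sentence_transformers import SentenceTransformer
- from sentence_transformers.evaluation import BinaryClassificationEvaluator
-
- model = SentenceTransformer("TomatenMarc/WRAPresentations")
- evaluator = BinaryClassificationEvaluator.from_input_examples(holdout_pairs, name="holdout")
- print(evaluator(model))  # best binary-classification score over the pairs
- ```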
235
-
236
- ## Full Model Architecture
237
- <div align="center">
238
- <img src="https://github.com/TomatenMarc/public-images/raw/main/contrastive_siamese_network.svg" alt="Argument Tree" width="100%">
239
- </div>
240
-
241
- ```
242
- SentenceTransformer(
243
- (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: RobertaModel
244
- (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
245
- )
246
- ```
247
-
248
- # Environmental Impact
249
-
250
- - **Hardware Type:** A100 PCIe 40GB
251
- - **Hours used:** 2h
252
- - **Cloud Provider:** [Google Cloud Platform](https://colab.research.google.com)
253
- - **Compute Region:** [asia-southeast1](https://cloud.google.com/compute/docs/gpus/gpu-regions-zones?hl=en) (Singapore)
254
- - **Carbon Emitted:** 0.21kg CO2
255
-
256
- ## Licensing
257
-
258
- [WRAPresentations](https://huggingface.co/TomatenMarc/WRAPresentations) © 2023 is licensed under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/?ref=chooser-v1).
 
WRAPresentations/added_tokens.json DELETED
@@ -1,3 +0,0 @@
- {
-   "<mask>": 64000
- }
 
WRAPresentations/binary_classification_evaluation_bertweet-cls_results.csv DELETED
@@ -1,2 +0,0 @@
- epoch,steps,cossim_accuracy,cossim_accuracy_threshold,cossim_f1,cossim_precision,cossim_recall,cossim_f1_threshold,cossim_ap,manhattan_accuracy,manhattan_accuracy_threshold,manhattan_f1,manhattan_precision,manhattan_recall,manhattan_f1_threshold,manhattan_ap,euclidean_accuracy,euclidean_accuracy_threshold,euclidean_f1,euclidean_precision,euclidean_recall,euclidean_f1_threshold,euclidean_ap,dot_accuracy,dot_accuracy_threshold,dot_f1,dot_precision,dot_recall,dot_f1_threshold,dot_ap
- -1,-1,0.5120985810306199,0.9266897439956665,0.6666749648365052,0.5000093354991691,1.0,0.8145368099212646,0.5230522554221206,0.5129574309185959,77.43867492675781,0.6666832632129254,0.500018671346951,1.0,117.4425048828125,0.5246559193500177,0.5135362210604929,3.549988031387329,0.6666749648365052,0.5000093354991691,1.0,5.485724925994873,0.5249914065612434,0.5062359970126961,79.05482482910156,0.6666832632129254,0.500018671346951,1.0,66.01345825195312,0.5020077201558696
 
WRAPresentations/binary_classification_evaluation_bertweet-mean_results.csv DELETED
@@ -1,2 +0,0 @@
- epoch,steps,cossim_accuracy,cossim_accuracy_threshold,cossim_f1,cossim_precision,cossim_recall,cossim_f1_threshold,cossim_ap,manhattan_accuracy,manhattan_accuracy_threshold,manhattan_f1,manhattan_precision,manhattan_recall,manhattan_f1_threshold,manhattan_ap,euclidean_accuracy,euclidean_accuracy_threshold,euclidean_f1,euclidean_precision,euclidean_recall,euclidean_f1_threshold,euclidean_ap,dot_accuracy,dot_accuracy_threshold,dot_f1,dot_precision,dot_recall,dot_f1_threshold,dot_ap
- -1,-1,0.606235997012696,0.7620757818222046,0.6671155668611901,0.5007862812640408,0.9988797610156833,0.2958064675331116,0.6323702200745711,0.6153286034353995,72.18702697753906,0.6850335895837661,0.5444897060442585,0.9233756534727409,87.33160400390625,0.6237689464821592,0.6188946975354742,3.8260865211486816,0.6802268387246109,0.5552144341583223,0.8778939507094847,4.539368629455566,0.6216050236505599,0.5073562359970127,15.957331657409668,0.6667997405060133,0.5006744604316546,0.9979088872292756,9.347114562988281,0.5036378198987528
 
WRAPresentations/binary_classification_evaluation_taco_plus-cls_results.csv DELETED
@@ -1,2 +0,0 @@
- epoch,steps,cossim_accuracy,cossim_accuracy_threshold,cossim_f1,cossim_precision,cossim_recall,cossim_f1_threshold,cossim_ap,manhattan_accuracy,manhattan_accuracy_threshold,manhattan_f1,manhattan_precision,manhattan_recall,manhattan_f1_threshold,manhattan_ap,euclidean_accuracy,euclidean_accuracy_threshold,euclidean_f1,euclidean_precision,euclidean_recall,euclidean_f1_threshold,euclidean_ap,dot_accuracy,dot_accuracy_threshold,dot_f1,dot_precision,dot_recall,dot_f1_threshold,dot_ap
- -1,-1,0.7291262135922331,0.48100075125694275,0.7469403868930123,0.6675291073738681,0.8477968633308439,0.26185911893844604,0.7784446756664181,0.72903286034354,189.03599548339844,0.7476147433966054,0.6775074328014077,0.8339058999253174,215.74383544921875,0.7756006874908153,0.7296863330843913,8.856854438781738,0.749117558073083,0.6811762548144525,0.8321135175504107,9.92266845703125,0.7757014416282921,0.7282300224047796,33.68499755859375,0.7459866859239338,0.6715485548614639,0.8389843166542196,19.206905364990234,0.7907547509665888
 
WRAPresentations/binary_classification_evaluation_taco_plus-mean_results.csv DELETED
@@ -1,2 +0,0 @@
- epoch,steps,cossim_accuracy,cossim_accuracy_threshold,cossim_f1,cossim_precision,cossim_recall,cossim_f1_threshold,cossim_ap,manhattan_accuracy,manhattan_accuracy_threshold,manhattan_f1,manhattan_precision,manhattan_recall,manhattan_f1_threshold,manhattan_ap,euclidean_accuracy,euclidean_accuracy_threshold,euclidean_f1,euclidean_precision,euclidean_recall,euclidean_f1_threshold,euclidean_ap,dot_accuracy,dot_accuracy_threshold,dot_f1,dot_precision,dot_recall,dot_f1_threshold,dot_ap
- -1,-1,0.7151045556385363,0.4821374714374542,0.7366632234847221,0.6650622706215029,0.8255414488424198,0.4047052264213562,0.7599870249929216,0.7044809559372666,151.0953826904297,0.7335388153376878,0.6408460529620225,0.8575802837938761,173.28468322753906,0.7527305216792975,0.7038088125466766,7.033679008483887,0.7271594207652066,0.6483024826739188,0.8278566094100075,7.890071868896484,0.7500219385034546,0.7194174757281553,25.014659881591797,0.7426876068238252,0.6535558204211362,0.8599701269604182,18.364337921142578,0.763325340473535
 
WRAPresentations/binary_classification_evaluation_wrapresentations-cls_results.csv DELETED
@@ -1,2 +0,0 @@
- epoch,steps,cossim_accuracy,cossim_accuracy_threshold,cossim_f1,cossim_precision,cossim_recall,cossim_f1_threshold,cossim_ap,manhattan_accuracy,manhattan_accuracy_threshold,manhattan_f1,manhattan_precision,manhattan_recall,manhattan_f1_threshold,manhattan_ap,euclidean_accuracy,euclidean_accuracy_threshold,euclidean_f1,euclidean_precision,euclidean_recall,euclidean_f1_threshold,euclidean_ap,dot_accuracy,dot_accuracy_threshold,dot_f1,dot_precision,dot_recall,dot_f1_threshold,dot_ap
- -1,-1,0.706572068707991,0.8067178726196289,0.737768413224677,0.6304612614520998,0.8890963405526512,0.6855535507202148,0.7590287089584331,0.7081404032860343,123.10991668701172,0.7419977352833289,0.647587762033351,0.8686333084391337,144.03335571289062,0.7576467853896455,0.704798356982823,5.606583595275879,0.732452934392411,0.6481902078601616,0.8418969380134429,6.797209739685059,0.7537064663424923,0.7098020911127707,64.41665649414062,0.7416985389428539,0.6489680472709961,0.8653472740851381,61.430320739746094,0.7435509234541764
 
WRAPresentations/binary_classification_evaluation_wrapresentations-mean_results.csv DELETED
@@ -1,2 +0,0 @@
- epoch,steps,cossim_accuracy,cossim_accuracy_threshold,cossim_f1,cossim_precision,cossim_recall,cossim_f1_threshold,cossim_ap,manhattan_accuracy,manhattan_accuracy_threshold,manhattan_f1,manhattan_precision,manhattan_recall,manhattan_f1_threshold,manhattan_ap,euclidean_accuracy,euclidean_accuracy_threshold,euclidean_f1,euclidean_precision,euclidean_recall,euclidean_f1_threshold,euclidean_ap,dot_accuracy,dot_accuracy_threshold,dot_f1,dot_precision,dot_recall,dot_f1_threshold,dot_ap
- -1,-1,0.7131814787154593,0.7389853596687317,0.7407480541705748,0.6621752816922126,0.8404779686333085,0.6548119783401489,0.764091778672824,0.7105675877520538,107.62962341308594,0.7439052101931505,0.6579385587137525,0.8557132188200149,159.9492950439453,0.7620238128848309,0.7092233009708738,7.132084846496582,0.7394660662592474,0.6494350282485876,0.8584764749813294,8.149255752563477,0.7595757373512098,0.7085324869305452,58.264991760253906,0.7354351731298515,0.6582293418296604,0.8331590739357729,52.61748123168945,0.7348401157602552
 
WRAPresentations/bpe.codes DELETED
The diff for this file is too large to render. See raw diff
 
WRAPresentations/config.json DELETED
@@ -1,29 +0,0 @@
- {
-   "_name_or_path": "./models/WRAPresentations/",
-   "architectures": [
-     "RobertaModel"
-   ],
-   "attention_probs_dropout_prob": 0.1,
-   "bos_token_id": 0,
-   "classifier_dropout": null,
-   "eos_token_id": 2,
-   "gradient_checkpointing": false,
-   "hidden_act": "gelu",
-   "hidden_dropout_prob": 0.1,
-   "hidden_size": 768,
-   "initializer_range": 0.02,
-   "intermediate_size": 3072,
-   "layer_norm_eps": 1e-05,
-   "max_position_embeddings": 130,
-   "model_type": "roberta",
-   "num_attention_heads": 12,
-   "num_hidden_layers": 12,
-   "pad_token_id": 1,
-   "position_embedding_type": "absolute",
-   "tokenizer_class": "BertweetTokenizer",
-   "torch_dtype": "float32",
-   "transformers_version": "4.32.1",
-   "type_vocab_size": 1,
-   "use_cache": true,
-   "vocab_size": 64001
- }
 
WRAPresentations/config_sentence_transformers.json DELETED
@@ -1,7 +0,0 @@
- {
-   "__version__": {
-     "sentence_transformers": "2.2.2",
-     "transformers": "4.32.1",
-     "pytorch": "2.0.1+cu118"
-   }
- }
 
WRAPresentations/eval/binary_classification_evaluation_fine-tune-test_results.csv DELETED
@@ -1,31 +0,0 @@
- epoch,steps,cossim_accuracy,cossim_accuracy_threshold,cossim_f1,cossim_precision,cossim_recall,cossim_f1_threshold,cossim_ap,manhattan_accuracy,manhattan_accuracy_threshold,manhattan_f1,manhattan_precision,manhattan_recall,manhattan_f1_threshold,manhattan_ap,euclidean_accuracy,euclidean_accuracy_threshold,euclidean_f1,euclidean_precision,euclidean_recall,euclidean_f1_threshold,euclidean_ap,dot_accuracy,dot_accuracy_threshold,dot_f1,dot_precision,dot_recall,dot_f1_threshold,dot_ap
- 0,1000,0.740154848771793,0.585610032081604,0.7512134477787623,0.7152751542626873,0.7909541580794296,0.5636146068572998,0.7862586453830283,0.7320364284520693,162.85134887695312,0.7506087738358372,0.6803703619858285,0.8370188826379992,178.11680603027344,0.7825977157621367,0.7328440929092631,8.462957382202148,0.7480418089832814,0.6939864671685886,0.811229321004846,9.021198272705078,0.7786076253783956,0.7421461594162535,51.69240951538086,0.7458258031451461,0.7230648535564853,0.7700662841864869,48.888954162597656,0.7438784524600425
- 0,2000,0.7328301676600011,0.6296051740646362,0.739300023880914,0.7059389885476842,0.7759705898735587,0.5345308780670166,0.7887225712892582,0.728109508160196,163.58560180664062,0.7392852632908196,0.6762776083541262,0.815239792792291,187.81973266601562,0.7859110126313795,0.7315768952264246,7.947263717651367,0.7440591830170473,0.6914560367297161,0.8053250153177742,9.377279281616211,0.7863726621383399,0.729223528101153,56.419029235839844,0.7399514107714141,0.6511854360711261,0.8567370355929371,36.074989318847656,0.7545647266175324
- 0,3000,0.7333593271319556,0.5852109789848328,0.7366814937415951,0.6875693941588221,0.793349300952487,0.4697726368904114,0.7814915169482928,0.7342923188325071,167.56234741210938,0.7405934896616665,0.6981285598047193,0.7885590152063722,185.62295532226562,0.7844438220700205,0.7312009134963516,8.143726348876953,0.7367706050469709,0.6824149474134042,0.8005347295716594,9.543863296508789,0.7819653120118226,0.730685679273659,51.036773681640625,0.729976677184799,0.7193380921479559,0.7409346627304628,43.77399444580078,0.7682029379615493
- 0,4000,0.7319528769564975,0.6113885641098022,0.7304305217166424,0.7143199424966057,0.7472845763939174,0.5032544136047363,0.7796168548865824,0.7295438088341781,183.77114868164062,0.7423264834222205,0.6704241798217052,0.8315044839302623,200.60665893554688,0.7859687567213028,0.7313540912382331,7.762547492980957,0.7386498803843737,0.7100752884343603,0.7696206762101042,9.466588973999023,0.7804174238485684,0.7320364284520693,53.00003433227539,0.7226950455378582,0.7376939811457578,0.7082938784604245,49.99421691894531,0.7545644434872574
- 0,5000,0.7008717206037988,0.6893185377120972,0.7238735865468251,0.6202031041952344,0.869158357934607,0.505645751953125,0.7573581149708617,0.697181529549379,156.41075134277344,0.7252543940795559,0.5998032715217312,0.9170612153957556,199.6271209716797,0.7576633856761417,0.699715924915056,7.274563789367676,0.721679105785066,0.6214721087887992,0.860413301398095,9.24429702758789,0.7572170773271093,0.6952319946527042,63.3360710144043,0.7178262039278989,0.5849278310738526,0.9288698267698992,39.34496307373047,0.7060530230032297
- 0,-1,0.7016933103102545,0.5540434122085571,0.7219917012448133,0.6517000089710236,0.8092797861081713,0.49931254982948303,0.7624222182491369,0.7033086392246422,163.49075317382812,0.7302173477786924,0.607896881250463,0.9141647635492676,203.9537811279297,0.7638903054769688,0.6997855511613658,8.051408767700195,0.7222059254913465,0.6070019723865878,0.8913830557566981,9.920435905456543,0.7609098207552563,0.6990753634490058,51.86427688598633,0.722648392485766,0.6104018277113291,0.8854787500696263,38.65266418457031,0.7213601486505429
- 1,1000,0.6843563749791122,0.622389554977417,0.7175578929688649,0.5758137340791158,0.9518743385506601,0.4198741316795349,0.728622577138776,0.6873503035704339,183.5062713623047,0.7161725373565888,0.5691027555759793,0.965743886815574,214.1572265625,0.7309284856131453,0.6852058151840918,8.175722122192383,0.712980311293007,0.5711458804983985,0.9485322787277892,9.99151611328125,0.7289892547982773,0.6832284297888932,53.4877815246582,0.7173183420036919,0.5753221844611192,0.9523756475240907,35.72834777832031,0.6721446578082373
- 1,2000,0.7341948420876734,0.6994354724884033,0.7351262826582051,0.679920477137177,0.8000891215952766,0.46387240290641785,0.7749941661774506,0.7303653985406339,155.98574829101562,0.736097204252686,0.6174899488476566,0.9111012087116359,210.04220581054688,0.7764608189443374,0.7341530663398875,8.792206764221191,0.7397617882199381,0.6360320641282565,0.8839191221522865,9.918052673339844,0.7768380980149778,0.7378293321450454,54.189537048339844,0.729577001152137,0.7187719582725258,0.7407118587422715,45.27055358886719,0.7493830297182804
- 1,3000,0.7296552108282738,0.5739662647247314,0.7263482156068819,0.7024692321702704,0.7519077591488887,0.502313494682312,0.7743028481845646,0.7257422157856626,172.76417541503906,0.7305597908952298,0.6459385431824017,0.8406951484431572,192.86375427246094,0.7786140728743274,0.7280120314153623,9.018473625183105,0.7278951323369792,0.7273890310379353,0.7284019383946972,9.034345626831055,0.7743923653074898,0.7301147440539185,52.54202651977539,0.7242862080884893,0.7191850407754194,0.7294602573386063,45.99448013305664,0.7453817335824302
- 1,4000,0.7402523255166268,0.6452406048774719,0.7435705436277247,0.7273405247036889,0.7605414136913051,0.6225917339324951,0.7840243237284034,0.7407814849885813,165.77940368652344,0.7478685131339017,0.6705996729859914,0.8452626302010806,186.43260192871094,0.7861603683958884,0.7381913886258564,6.82960319519043,0.7469631569803958,0.7179961464354528,0.7783657327466161,8.420555114746094,0.7836666708347142,0.7375508271598061,56.555137634277344,0.7407787993510005,0.719871761181479,0.7629365565643625,54.43528747558594,0.7451984157246527
- 1,5000,0.7230546426781039,0.536429762840271,0.7236885335669697,0.7116471918582736,0.736144376984348,0.5119767189025879,0.7659816565081021,0.7185707124157522,174.91607666015625,0.7234270345079709,0.6809389209782475,0.7715702111067788,188.4012451171875,0.7674257316915875,0.7213557622681446,8.596382141113281,0.718048780487805,0.7186697913179333,0.7174288419762713,8.912203788757324,0.7641920973694406,0.7250598785718264,48.78872299194336,0.7264762346514116,0.6759536768408181,0.7851612543864536,41.008968353271484,0.7442765183688369
- 1,-1,0.7278031526764329,0.5447660684585571,0.7325829851518163,0.7163503766202657,0.7495683172728792,0.5393775701522827,0.7817619381380104,0.7300451178076087,173.03179931640625,0.7377627420898732,0.6970785737294646,0.7834902244750181,185.53854370117188,0.7820923898251559,0.7308527822648025,8.732171058654785,0.7319959272407056,0.7233764821059502,0.7408232607363672,8.958128929138184,0.7811526259842299,0.7298780148164652,55.8602180480957,0.7309920305204152,0.6303764254717933,0.8698267698991812,39.615943908691406,0.7576507133048633
- 2,1000,0.7313123154904473,0.6461310386657715,0.7261163882519925,0.7119306321254617,0.740878961733415,0.5153070688247681,0.7716201179420012,0.7292096028518911,155.5853271484375,0.7341913213649892,0.7169360617883589,0.7522976661282237,181.00506591796875,0.7742673399169445,0.7307692307692307,7.668848991394043,0.7267108526816574,0.7078393578538801,0.7466161644293433,9.155574798583984,0.7719036011298094,0.7290982008577953,62.3431396484375,0.7222334888122761,0.7018050920938543,0.7438868155739988,43.685516357421875,0.7622692399615848
- 2,2000,0.7227065114465548,0.5908321142196655,0.7150135263179577,0.7052417716375458,0.7250598785718264,0.4846171736717224,0.7719227147151037,0.7227204366958169,180.50222778320312,0.7273562748902277,0.6713794241010416,0.7935164039436305,198.63790893554688,0.773027577465341,0.7243914666072523,9.024002075195312,0.7205255336583198,0.7186192758400975,0.7224419317105776,9.362994194030762,0.7726321451566962,0.7201581908316159,52.466888427734375,0.7121150473518064,0.7175209929599367,0.7067899515401326,45.56163024902344,0.7381867489399621
- 2,3000,0.7205063220631649,0.5153660178184509,0.7233309094826426,0.7002294295546981,0.7480086893555394,0.47842174768447876,0.7726502799886763,0.7227343619450788,185.091064453125,0.7239453237771278,0.65895848090853,0.8031526764329081,198.99757385253906,0.7730456413359823,0.719768283852281,8.75425910949707,0.7234372908579841,0.6964230491701886,0.7526318721105107,9.477849006652832,0.7717109752260815,0.7238483818860357,46.414215087890625,0.7165663812940387,0.7359736942544259,0.6981562969977163,46.414215087890625,0.7366198661418574
- 2,4000,0.7096724781373587,0.6618224382400513,0.7143908279709428,0.6873825399685916,0.7436083105887595,0.5073532462120056,0.7553558726272488,0.7123322007463934,183.67529296875,0.7185965280878219,0.6798350228582787,0.7620453406115969,192.9994354248047,0.7607150504711715,0.7098256558792403,7.929637908935547,0.7171552087726697,0.6807286193264275,0.7577006628418649,9.43260383605957,0.7558975464780855,0.7107447223305298,57.237831115722656,0.7079392592034365,0.6447347349637391,0.7848827494012143,43.33351135253906,0.7251238179883864
- 2,5000,0.7255611875452571,0.5437341928482056,0.715160713317795,0.696437054631829,0.7349189550492954,0.47667479515075684,0.7567678482268225,0.7220102489834568,185.57192993164062,0.711844839528977,0.7386642707397424,0.6869046955940511,186.80551147460938,0.7609719241458824,0.723388848660391,8.853292465209961,0.715018031586365,0.7095181945320427,0.7206037988079986,9.505727767944336,0.7535932561250493,0.7251295048181362,48.92876434326172,0.7170105967651981,0.7179315351538504,0.7160920180471231,44.10511779785156,0.7190622136628879
- 2,-1,0.7221216509775525,0.5830212235450745,0.7139959432048681,0.7028981596416428,0.7254497855511614,0.49401402473449707,0.7577789632915699,0.725213056313708,183.59326171875,0.7167360251940165,0.7236954176367043,0.709909207374812,191.8802490234375,0.7617706717510708,0.7214671642622403,8.792418479919434,0.7097350745193001,0.7355401529636711,0.6856792736589985,9.025744438171387,0.7570492454895487,0.7195594051133515,55.14214324951172,0.7116433948009067,0.7150247413405308,0.7082938784604245,46.18491744995117,0.7221163569754776
- 3,1000,0.7138779034144711,0.667630672454834,0.7080084370960059,0.6922483301844115,0.7245028686013479,0.49316561222076416,0.7572260109745176,0.71571603631705,187.70257568359375,0.715323837233353,0.7083981974600574,0.7223862307135298,191.52281188964844,0.7602624755127021,0.7163844482816243,7.673393249511719,0.7175615788931259,0.6675775444652189,0.7756363838912717,9.71194076538086,0.7584603065381312,0.7116359382832953,57.26898193359375,0.7012660350465331,0.703724478348665,0.6988247089622904,46.01648712158203,0.7149009507877254
- 3,2000,0.7034061159694759,0.645792543888092,0.6999796596379416,0.6821004783424509,0.7188213669024676,0.5049311518669128,0.7487459523533433,0.7043112571715033,171.255126953125,0.7112607557052002,0.643979766958721,0.7942405169052527,200.30978393554688,0.7513133974056816,0.7039492006906923,8.17646598815918,0.7082396663071586,0.6638571532971183,0.7589817857739654,9.540943145751953,0.7502180801022875,0.705912660836629,57.01820373535156,0.6982961421830594,0.6822735365237956,0.7150894001002618,45.76020050048828,0.692171491602241
- 3,3000,0.7149362223583802,0.6349054574966431,0.6997366856366263,0.7115628239087873,0.6882972205202473,0.5243378281593323,0.74569492687835,0.7127638834735142,167.7770233154297,0.7039425533383967,0.6981292284094333,0.7098535063777641,189.87074279785156,0.7520055862079771,0.7148387456135464,7.772121429443359,0.702222978337303,0.7151602656656079,0.6897454464434913,9.073543548583984,0.7461157073864344,0.7128613602183479,55.63344192504883,0.6957652088508868,0.7226522652265227,0.6708071074472233,49.71984100341797,0.7184657851538883
- 3,4000,0.7158552888096696,0.5480663180351257,0.703342939481268,0.728667821102287,0.6797192669748788,0.5273720026016235,0.7453813351098914,0.7165376260235058,183.96432495117188,0.7057195970775151,0.7312091259294652,0.6819473068567927,189.3191680908203,0.7524368587933294,0.7164540745279341,8.950374603271484,0.7072083052524413,0.7288741761002572,0.6867932935999554,9.20567798614502,0.7461612088563488,0.7138918286637331,56.630958557128906,0.6992818928275112,0.7188823529411764,0.6807218849217401,44.9284553527832,0.7240756559330876
- 3,5000,0.7083356542082103,0.5160317420959473,0.7045579274728059,0.7120630315442159,0.6972093800479029,0.504380464553833,0.7456199629999113,0.7085863086949257,189.6381072998047,0.7156636611448942,0.6758750433189762,0.7604300116972094,200.63095092773438,0.7508556795035048,0.7090458419205704,8.91309928894043,0.710638182655078,0.6914120126448894,0.7309641842588982,9.557706832885742,0.7475660668550255,0.7067203252938228,47.12469482421875,0.6979566775662902,0.7161485041169748,0.6806661839246922,46.63441848754883,0.7068809977995912
- 3,-1,0.7080432239737091,0.49873578548431396,0.7075335626317543,0.7046798165644511,0.7104105163482426,0.4846397340297699,0.7455931627042371,0.7091711691639281,188.5823974609375,0.7179420282292781,0.6752552021986651,0.766390018381329,202.57958984375,0.7507781252042124,0.7068735030357043,9.246362686157227,0.7097935189751928,0.6806943982819011,0.7414916727009413,9.675313949584961,0.7474166262261993,0.7069570545312761,45.76421356201172,0.6987580299785867,0.7167877225866917,0.6816131008745057,45.31532669067383,0.7064645183331485
- 4,1000,0.6943825544477246,0.4930197298526764,0.6938075334025949,0.6126472094214029,0.7997549156129895,0.3941698670387268,0.7405568577999365,0.6975296607809279,195.56838989257812,0.7046135092879667,0.6720076829761423,0.740544755751128,202.88018798828125,0.7438774771563635,0.6977663900183814,9.705812454223633,0.7013844544379986,0.6847229222484548,0.7188770678995154,9.76524543762207,0.7417766817151012,0.6942293767058431,58.167266845703125,0.69301861413633,0.5675704488113429,0.8896563248482148,29.745956420898438,0.6942912327930107
- 4,2000,0.7030579847379268,0.5139841437339783,0.7035393976019448,0.6940357152535024,0.7133069681947307,0.4846697151660919,0.7441474052149798,0.7052024731242689,189.02398681640625,0.7089247774345337,0.6663073040452797,0.7573664568595778,202.80690002441406,0.7470481948205908,0.7012198518353479,9.050704002380371,0.7068246921625329,0.689534606521336,0.7250041775747786,9.516852378845215,0.7449539817473834,0.6984905029800034,54.68560028076172,0.6931623513288631,0.6181849927975904,0.7888375201916115,36.2747802734375,0.7026699316622068
- 4,3000,0.706622848548989,0.5015596151351929,0.7011823414329865,0.7140794640794641,0.68874282849663,0.4988675117492676,0.7445471790877778,0.7076393917451123,193.09793090820312,0.7143663450810527,0.6755877752675091,0.7578677658330084,201.8223876953125,0.7489774100379399,0.7079735977273993,9.55426025390625,0.7111704010236585,0.6955479816806902,0.7275107224419317,9.62564754486084,0.7461480480834116,0.7028212555004735,45.977439880371094,0.6971972287773052,0.6374229872856912,0.7693421712248649,38.79158020019531,0.7053584501526011
- 4,4000,0.6990475129504818,0.5282589197158813,0.694938652285541,0.6473571617431435,0.7500696262463098,0.4708825945854187,0.7377003766587753,0.7024313485211385,190.80166625976562,0.7063629222309505,0.6665019518703366,0.7512950481813625,200.47503662109375,0.7419852358176189,0.6996602239180081,9.462997436523438,0.70463525628798,0.689767392232181,0.7201581908316159,9.520181655883789,0.7391906915771685,0.6963738650921851,47.430747985839844,0.6992821194083234,0.6458706937234545,0.7623238455968362,40.27260971069336,0.7056118166622416
- 4,5000,0.7005653651200356,0.5047475695610046,0.6955575138665699,0.6504435503417519,0.7473959783880132,0.46212443709373474,0.7389561149537538,0.7068317272879184,192.73989868164062,0.7078305018551064,0.6645481434403188,0.7571436528713864,202.312255859375,0.7436432743841523,0.7028212555004735,8.905183792114258,0.7042218982875359,0.6892818092114036,0.7198239848493289,9.600275039672852,0.7404873009797531,0.6990753634490058,46.9439697265625,0.699538423955668,0.6497098230278713,0.757644961844817,39.5091667175293,0.706242117687969
- 4,-1,0.7005375146215117,0.504567563533783,0.6955755195167166,0.6506439017291975,0.7471731743998218,0.4619932770729065,0.7389557923053753,0.7069292040327522,192.78004455566406,0.7079010063400727,0.6646295255090816,0.7571993538684343,202.333984375,0.7436463307181385,0.7027655545034256,8.898887634277344,0.7044252076630003,0.6817814609230448,0.7286247423828887,9.639416694641113,0.7404959118025523,0.6990614381997438,46.920677185058594,0.6993571612239651,0.6495199885370397,0.7574778588536735,39.490272521972656,0.7062346801358627
 
WRAPresentations/modules.json DELETED
@@ -1,14 +0,0 @@
- [
-   {
-     "idx": 0,
-     "name": "0",
-     "path": "",
-     "type": "sentence_transformers.models.Transformer"
-   },
-   {
-     "idx": 1,
-     "name": "1",
-     "path": "1_Pooling",
-     "type": "sentence_transformers.models.Pooling"
-   }
- ]
 
WRAPresentations/pytorch_model.bin DELETED
@@ -1,3 +0,0 @@
- version https://git-lfs.github.com/spec/v1
- oid sha256:fb34d868f70c3c727c58e27b9f30c5db25f89cbbfb2ab5734e550f3fe4645895
- size 539666601
 
WRAPresentations/sentence_bert_config.json DELETED
@@ -1,4 +0,0 @@
- {
-   "max_seq_length": 128,
-   "do_lower_case": false
- }
 
WRAPresentations/special_tokens_map.json DELETED
@@ -1,9 +0,0 @@
- {
-   "bos_token": "<s>",
-   "cls_token": "<s>",
-   "eos_token": "</s>",
-   "mask_token": "<mask>",
-   "pad_token": "<pad>",
-   "sep_token": "</s>",
-   "unk_token": "<unk>"
- }
 
WRAPresentations/tokenizer_config.json DELETED
@@ -1,13 +0,0 @@
- {
-   "bos_token": "<s>",
-   "clean_up_tokenization_spaces": true,
-   "cls_token": "<s>",
-   "eos_token": "</s>",
-   "mask_token": "<mask>",
-   "model_max_length": 128,
-   "normalization": false,
-   "pad_token": "<pad>",
-   "sep_token": "</s>",
-   "tokenizer_class": "BertweetTokenizer",
-   "unk_token": "<unk>"
- }
 
WRAPresentations/vocab.txt DELETED
The diff for this file is too large to render. See raw diff