TomatenMarc committed · Commit d22fb13 · Parent(s): aa7cac0
Upload 19 files
README.md CHANGED
@@ -166,14 +166,14 @@ Additionally, 294 golden tweets (24.12%) related to the topic of #abortion were
 Before fine-tuning, we built a copy of the dataset by creating an augmentation of each tweet. The augmentation consisted of replacing all the
 topic words and entities in a tweet, and then randomly masking 10% of the words in a tweet, which were then matched using
 [BERTweet-base](https://huggingface.co/vinai/bertweet-base) as a `fill-mask` model. We chose to mask 10% of the words because this resulted in the
-smallest possible average cosine distance between the tweets and their augmentations of 0.
-
+smallest possible average cosine distance between the tweets and their augmentations of ~0.08, making augmentation during pre-classification
+fine-tuning itself a regulating factor prior to any overfitting on the later test data.
 During fine-tuning, we formed pairs by matching each tweet with all remaining tweets in the same data split (training, testing, holdout)
 with similar or dissimilar class labels. For the training and testing set during the fine-tuning process, we utilized the augmentations, and for the
 holdout tweets, we used their original text to test the fine-tuning process and the usefulness of the augmentations for real tweets.
 For all pairs, we chose the largest possible set so that both similar and dissimilar pairs are equally represented while covering all tweets
 of the respective data split.
-This process created
+This process created 162,064 pairs for training and 71,812 pairs for testing. An additional 53,560 pairs were used for final evaluation with the
 holdout data. Moreover, we utilized `MEAN` pooling, enhancing sentence representations, for fine-tuning.
 
 The model was trained with the parameters:
@@ -215,23 +215,20 @@ Parameters of the fit()-Method:
 
 ## Evaluation Results
 
-
-
+We optimized several BERTweet models with `CLS` or `MEAN` pooling and evaluated them using the `BinaryClassificationEvaluator` of SBERT with
+standard `CLS` tokens for classification, showing:
 
 
 | Model                                   | Precision | Recall  | F1     | Support |
 |-----------------------------------------|-----------|---------|--------|---------|
-| Vanilla BERTweet-`CLS`                  | 50.00%    | 100.00% | 66.67% |
-| Augmented BERTweet-`CLS`                |
-| WRAPresentations-`CLS`                  | 66.00%    | 84.32%  | 74.04% |
-| WRAPresentations-`MEAN` (current model) | 63.05%    | 88.91%  | 73.78% |
-
-
-
-
-Reason, Statement, Notification, and None. As reference, we report the results for Vanilla BERTweet-`CLS`, which a plain BERTweet-base model, for
-Augmented BERTweet-`CLS`, which was trained on the same augmentations as WRAPresentations-`MEAN` but directly optimizing on the classification task, and
-WRAPresentations-`MEAN`, which is the same model as the presented model but with `CLS` pooling during fine-tuning.
+| Vanilla BERTweet-`CLS`                  | 50.00%    | 100.00% | 66.67% | 53,560  |
+| Augmented BERTweet-`CLS`                | 65.69%    | 86.66%  | 74.73% | 53,560  |
+| WRAPresentations-`CLS`                  | 66.00%    | 84.32%  | 74.04% | 53,560  |
+| WRAPresentations-`MEAN` (current model) | 63.05%    | 88.91%  | 73.78% | 53,560  |
+
+The outcomes for WRAPresentations-`MEAN` are influenced by the utilization of `CLS` pooling during testing, while `MEAN` pooling was employed during
+fine-tuning. Despite this, employing `MEAN` pooling during the fine-tuning process still improved the `CLS` representation, particularly in terms
+of recall. When WRAPresentations-`MEAN` is tested with `MEAN` pooling, the resulting F1 score stands at 74.07%.
 
 ## Full Model Architecture
 <div align="center">
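The pair-based fine-tuning with `MEAN` pooling could look roughly like the sketch below, using sentence-transformers on top of BERTweet-base. The two toy pairs only illustrate the format of the 162,064 training pairs, and `ContrastiveLoss` is a placeholder assumption; the actual loss and hyperparameters are those listed in the README's training-parameter section, which this excerpt does not show.

```python
from sentence_transformers import SentenceTransformer, InputExample, losses, models
from torch.utils.data import DataLoader

# BERTweet-base encoder with MEAN pooling over token embeddings,
# matching the pooling choice described for fine-tuning.
word_embedding_model = models.Transformer("vinai/bertweet-base", max_seq_length=128)
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),
    pooling_mode="mean",
)
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# Pairs of (augmented) tweets: label 1.0 for the same argument class, 0.0 otherwise.
train_examples = [
    InputExample(texts=["tweet a", "tweet b with the same class"], label=1.0),
    InputExample(texts=["tweet a", "tweet c with a different class"], label=0.0),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)

# Placeholder loss for similar/dissimilar pairs; swap in the loss named in the README.
train_loss = losses.ContrastiveLoss(model=model)
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
```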
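The holdout evaluation with SBERT's `BinaryClassificationEvaluator` could be reproduced along these lines. The model id `TomatenMarc/WRAPresentations` and the two toy holdout pairs are assumptions standing in for the published checkpoint and the 53,560 evaluation pairs; the sketch uses the model's configured pooling rather than forcing `CLS` tokens as in the reported numbers.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import BinaryClassificationEvaluator

# Assumed model id for the checkpoint this commit updates.
model = SentenceTransformer("TomatenMarc/WRAPresentations")

# Illustrative holdout pairs in their original (non-augmented) form.
sentences1 = ["holdout tweet a", "holdout tweet b"]
sentences2 = ["holdout tweet c", "holdout tweet d"]
labels = [1, 0]  # 1 = same argument class, 0 = different class

evaluator = BinaryClassificationEvaluator(sentences1, sentences2, labels, name="holdout")
score = evaluator(model)  # main score; a float in older SBERT versions, a metrics dict in newer ones
print(score)
```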