---
license: cc-by-nc-4.0
language:
- en
metrics:
- f1
pipeline_tag: text-classification
tags:
- transformers
- argument-mining
- opinion-mining
- information-extraction
- inference-extraction
- Twitter
widget:
- text: "Men shouldn’t be making laws about women’s bodies #abortion #Texas"
  example_title: "Statement"
- text: "’Bitter truth’: EU chief pours cold water on idea of Brits keeping EU citizenship after #Brexit HTTPURL via @USER"
  example_title: "Notification"
- text: "Opinion: As the draconian (and then some) abortion law takes effect in #Texas, this is not an idle question for millions of Americans. A slippery slope towards more like-minded Republican state legislatures to try to follow suit. #abortion #F24 HTTPURL"
  example_title: "Reason"
- text: "@USER Blah blah blah blah blah blah"
  example_title: "None"
- text: "republican men and karens make me sick"
  example_title: "Unlabeled 1"
- text: "No empire lives forever! Historical fact! GodWins! 🙏💪🇺🇲"
  example_title: "Unlabeled 2"
- text: "Further author information regarding registration and visa support letters will be sent to the authors soon. #CIKM2023"
  example_title: "Unlabeled 3"
- text: "Ummmmmm"
  example_title: "Unlabeled 4"
- text: "whoever says that The Last Jedi is a good movie is lying or trolling everyone"
  example_title: "Unlabeled 5"
- text: "I don’t think people realize how big this story is GBI Strategies, the group paid $11M+ by Biden PACs to harvest fraudulent voter registrations in *20 states*, may be the root source of Democrat election rigging @USER may have just exposed their entire fraud machine HTTPURL"
  example_title: "Unlabeled 6"
---
# WRAP -- A TACO-based Classifier For Inference and Information-Driven Argument Mining on Twitter
WRAP is a classification model built on `AutoModelForSequenceClassification` that assigns tweets to one of the four classes of the
[TACO dataset](https://anonymous.4open.science/r/TACO): Reason, Statement, Notification, and None.
It is designed for extracting information and inferences from Twitter data and builds on
[WRAPresentations](https://huggingface.co/TomatenMarc/WRAPresentations), from which WRAP takes its name.
WRAPresentations extends the [BERTweet-base](https://huggingface.co/vinai/bertweet-base) architecture: its embeddings were
extended on augmented tweets using contrastive learning to better encode inference and information in tweets.
## Class Semantics
The TACO framework revolves around the two key elements of an argument, as defined by the [Cambridge Dictionary](https://dictionary.cambridge.org).
It defines *inference* as *a guess that you make or an opinion that you form based on the information that you have*, and
*information* as *facts or details about a person, company, product, etc.*
Depending on which of these two components a tweet contains, WRAP assigns it to one of four classes:
* *Statement*, which refers to unique cases where only the *inference* is presented as *something that someone says or writes officially, or an action
done to express an opinion*.
* *Reason*, which represents a full argument where the *inference* is based on direct *information* mentioned in the tweet, such as a source-reference
or quotation, and thus reveals the author’s motivation *to try to understand and to make judgments based on practical facts*.
* *Notification*, which refers to a tweet that limits itself to providing *information*, such as media channels promoting their latest articles.
* *None*, a tweet that provides neither *inference* nor *information*.
Overall, WRAP classifies tweets according to the following hierarchy:
<div align="center">
<img src="https://github.com/TomatenMarc/public-images/raw/main/Argument_Tree.svg" alt="Component Space" width="100%">
</div>
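The class semantics above reduce to two yes/no questions: does the tweet contain an inference, and does it contain information? The following minimal sketch encodes that mapping. The names `Components`, `CLASS_COMPONENTS`, and `label_for` are hypothetical helpers introduced for illustration and are not part of WRAP or TACO.

```python
from typing import NamedTuple


class Components(NamedTuple):
    """Which argument components a tweet contains (illustrative helper)."""
    inference: bool
    information: bool


# Mapping taken from the class descriptions in this section.
CLASS_COMPONENTS = {
    "Reason": Components(inference=True, information=True),
    "Statement": Components(inference=True, information=False),
    "Notification": Components(inference=False, information=True),
    "None": Components(inference=False, information=False),
}


def label_for(inference: bool, information: bool) -> str:
    """Return the class whose components match the given flags."""
    wanted = Components(inference, information)
    for label, components in CLASS_COMPONENTS.items():
        if components == wanted:
            return label
    raise ValueError(f"no class for {wanted}")


print(label_for(inference=True, information=False))
```

Reading the mapping in the other direction, `CLASS_COMPONENTS[label]` tells you whether a predicted class counts towards the Inference or Information category.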
## Usage
The model is straightforward to use once `transformers` is installed:
```bash
pip install -U transformers
```
Then you can use the model to generate tweet classifications like this:
```python
from transformers import pipeline
pipe = pipeline("text-classification", model="TomatenMarc/WRAP")
prediction = pipe("Huggingface is awesome")
print(prediction)
```
<a href="https://anonymous.4open.science/r/TACO/notebooks/classifier_cv.ipynb">
<blockquote style="border-left: 5px solid grey; background-color: #f0f5ff; padding: 10px;">
Notice: The tweets need to undergo preprocessing before classification.
</blockquote>
</a>
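The authoritative preprocessing is in the notebook linked from the notice above. As a rough illustration only, the widget examples in this card use the placeholders `@USER` and `HTTPURL`, which matches the BERTweet normalization convention for user mentions and links. The sketch below applies just these two substitutions; the function name `normalize_tweet` and the regular expressions are assumptions, and the real pipeline may do more (for example, emoji handling).

```python
import re

# Assumed patterns; the linked notebook may use different rules.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
MENTION_PATTERN = re.compile(r"(?<!\w)@\w+")


def normalize_tweet(text: str) -> str:
    """Replace links and user mentions with BERTweet-style placeholders."""
    text = URL_PATTERN.sub("HTTPURL", text)
    text = MENTION_PATTERN.sub("@USER", text)
    # Collapse repeated whitespace left over from the substitutions.
    return " ".join(text.split())


print(normalize_tweet("@bob  read this https://example.com/a #Brexit"))
```

The normalized string can then be passed to the `pipe(...)` call shown earlier in this section.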
## Training
The final model was trained on the entire shuffled ground-truth dataset, TACO, which comprises 1734 tweets.
The topics are distributed as follows: #abortion (25.9%), #brexit (29.0%), #got (11.0%), #lotrrop (12.1%), #squidgame (12.7%), and
#twittertakeover (9.3%). For training, we used [SimpleTransformers](https://simpletransformers.ai).
The category and class distribution of TACO is as follows:

| Inference    | No-Inference |
|--------------|--------------|
| 865 (49.88%) | 869 (50.12%) |

| Information   | No-Information |
|---------------|----------------|
| 1081 (62.34%) | 653 (37.66%)   |

| Reason       | Statement    | Notification | None         |
|--------------|--------------|--------------|--------------|
| 581 (33.50%) | 284 (16.38%) | 500 (28.84%) | 369 (21.28%) |
<p>
<blockquote style="border-left: 5px solid grey; background-color: #f0f5ff; padding: 10px;">
Notice: WRAP was trained to predict the four classes; the categories (information/inference) are aggregations of these classes
based on the inference or information component.
</blockquote>
</p>
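The category counts in the tables above follow directly from the class counts: Inference aggregates Reason and Statement, and Information aggregates Reason and Notification. The short check below reproduces that arithmetic from the numbers reported in this section; the helper name `aggregate` is hypothetical.

```python
# Class counts as reported in the table above.
CLASS_COUNTS = {"Reason": 581, "Statement": 284, "Notification": 500, "None": 369}

# Classes that contain each component, per the Class Semantics section.
INFERENCE_CLASSES = {"Reason", "Statement"}
INFORMATION_CLASSES = {"Reason", "Notification"}


def aggregate(counts, positive_classes):
    """Return (positive, negative) tweet counts for one binary category."""
    positive = sum(n for label, n in counts.items() if label in positive_classes)
    return positive, sum(counts.values()) - positive


total = sum(CLASS_COUNTS.values())
for name, classes in [("Inference", INFERENCE_CLASSES), ("Information", INFORMATION_CLASSES)]:
    pos, neg = aggregate(CLASS_COUNTS, classes)
    print(f"{name}: {pos} ({pos / total:.2%}) vs. {neg} ({neg / total:.2%})")
```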
### Dataloader
```
"data_loader": {
  "type": "torch.utils.data.dataloader.DataLoader",
  "args": {
    "batch_size": 8,
    "sampler": "torch.utils.data.sampler.RandomSampler"
  }
}
```
Parameters of the `fit()` method:
```
{
  "epochs": 5,
  "max_grad_norm": 1,
  "optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
  "optimizer_params": {
    "lr": 4e-05
  },
  "scheduler": "WarmupLinear",
  "warmup_steps": 66,
  "weight_decay": 0.06
}
```
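To put these numbers in context, the sketch below works out the step counts implied by the configuration and illustrates a linear warmup followed by linear decay, which is the usual meaning of a `WarmupLinear` schedule. It assumes that all 1734 tweets are used, that the last partial batch is kept, and that the schedule decays to zero at the final step. None of these are confirmed by the card, and `learning_rate` is a hypothetical helper, not the training code.

```python
import math

NUM_TWEETS = 1734    # size of TACO, from the Training section
BATCH_SIZE = 8       # from the data loader configuration
EPOCHS = 5
WARMUP_STEPS = 66
PEAK_LR = 4e-05

# Assumption: the last, smaller batch of each epoch is not dropped.
steps_per_epoch = math.ceil(NUM_TWEETS / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS


def learning_rate(step: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero (assumed shape)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / (total_steps - WARMUP_STEPS)


print(steps_per_epoch, total_steps)
print(f"warmup share: {WARMUP_STEPS / total_steps:.1%}")
```

Under these assumptions the warmup covers roughly the first 6% of the optimizer steps.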
## Evaluation
We evaluated WRAP with 6-fold (In-Topic) cross-validation, using the same dataset and parameters
described in the *Training* section: the model was trained on k-1 splits and made predictions on the k-th split.
Additionally, we assessed its ability to generalize across the 6 topics of TACO (Cross-Topic): each of the k topics was used once for testing, while
the remaining k-1 topics were used for training.
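The Cross-Topic protocol is a leave-one-topic-out split. The following minimal sketch shows how such splits can be generated in plain Python. The record layout (dictionaries with `text` and `topic` keys) and the function name `cross_topic_splits` are assumptions for illustration and do not reflect the actual TACO file format or the authors' evaluation code.

```python
from typing import Dict, Iterator, List, Tuple

Tweet = Dict[str, str]


def cross_topic_splits(tweets: List[Tweet]) -> Iterator[Tuple[str, List[Tweet], List[Tweet]]]:
    """Yield (held_out_topic, train, test), holding out one topic at a time."""
    topics = sorted({tweet["topic"] for tweet in tweets})
    for held_out in topics:
        train = [t for t in tweets if t["topic"] != held_out]
        test = [t for t in tweets if t["topic"] == held_out]
        yield held_out, train, test


# Toy data standing in for TACO; the real dataset has six topics.
toy = [
    {"text": "a", "topic": "#abortion"},
    {"text": "b", "topic": "#brexit"},
    {"text": "c", "topic": "#brexit"},
    {"text": "d", "topic": "#got"},
]
for topic, train, test in cross_topic_splits(toy):
    print(topic, len(train), len(test))
```

Each tweet appears in exactly one test set, so the per-topic predictions can be pooled before computing the scores.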
In total, the WRAP classifier performs as follows:
### Content Management
| Macro-F1 | Inference | Information | Multiclass |
|-------------|-----------|-------------|------------|
| In-Topic | 87.71% | 85.34% | 75.80% |
| Cross-Topic | 86.71% | 84.58% | 73.92% |
### Classification
| Micro-F1 | Reason | Statement | Notification | None |
|-------------|--------|-----------|--------------|--------|
| In-Topic | 77.82% | 61.10% | 80.56% | 83.71% |
| Cross-Topic | 76.52% | 58.99% | 78.43% | 81.73% |
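The two tables are consistent with each other: the Multiclass macro-F1 equals the unweighted mean of the four per-class F1 scores. The check below uses only the figures reported above; `macro_f1` is a hypothetical helper.

```python
# Per-class F1 scores (in percent) as reported in the table above.
PER_CLASS_F1 = {
    "In-Topic": {"Reason": 77.82, "Statement": 61.10, "Notification": 80.56, "None": 83.71},
    "Cross-Topic": {"Reason": 76.52, "Statement": 58.99, "Notification": 78.43, "None": 81.73},
}


def macro_f1(per_class: dict) -> float:
    """Unweighted mean of the per-class F1 scores."""
    return sum(per_class.values()) / len(per_class)


for setting, scores in PER_CLASS_F1.items():
    print(f"{setting}: {macro_f1(scores):.2f}%")
```

Because the mean is unweighted, the weaker Statement class pulls the multiclass score down as much as any other class, even though it is the smallest class in TACO.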
## Environmental Impact
- **Hardware Type:** A100 PCIe 40GB
- **Hours used:** 10 minutes
- **Cloud Provider:** [Google Cloud Platform](https://colab.research.google.com)
- **Compute Region:** [asia-southeast1](https://cloud.google.com/compute/docs/gpus/gpu-regions-zones?hl=en) (Singapore)
- **Carbon Emitted:** 0.02 kg CO2
## Licensing
[WRAP](https://huggingface.co/TomatenMarc/WRAP) © 2023 is licensed under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/?ref=chooser-v1)