---
license: cc-by-nc-sa-4.0
language:
- en
metrics:
- f1
pipeline_tag: text-classification
tags:
  - transformers
  - argument-mining
  - opinion-mining
  - information-extraction
  - inference-extraction
  - Twitter
widget:
- text: 'Men shouldn’t be making laws about women’s bodies #abortion #Texas'
  example_title: Statement
- text: >-
    ’Bitter truth’: EU chief pours cold water on idea of Brits keeping EU
    citizenship after #Brexit HTTPURL via @USER
  example_title: Notification
- text: >-
    Opinion: As the draconian (and then some) abortion law takes effect in
    #Texas, this is not an idle question for millions of Americans. A slippery
    slope towards more like-minded Republican state legislatures to try to
    follow suit. #abortion #F24 HTTPURL
  example_title: Reason
- text: '@USER Blah blah blah blah blah blah'
  example_title: None
- text: republican men and karens make me sick
  example_title: Unlabeled 1
- text: No empire lives forever! Historical fact! GodWins! 🙏💪🇺🇲
  example_title: Unlabeled 2
- text: >-
    Further author information regarding registration and visa support letters
    will be sent to the authors soon. #CIKM2023
  example_title: Unlabeled 3
- text: Ummmmmm
  example_title: Unlabeled 4
- text: >-
    whoever says that The Last Jedi is a good movie is lying or trolling
    everyone
  example_title: Unlabeled 5
- text: >-
    I don’t think people realize how big this story is GBI Strategies, the group
    paid $11M+ by Biden PACs to harvest fraudulent voter registrations in *20
    states*, may be the root source of Democrat election rigging @USER may have
    just exposed their entire fraud machine HTTPURL
  example_title: Unlabeled 6
base_model:
- vinai/bertweet-base
---

# TACO -- Twitter Arguments from COnversations

TACO is a baseline classification model built upon `AutoModelForSequenceClassification`, designed to assign tweets to one of four distinct
classes: Reason, Statement, Notification, and None. Tailored specifically for argument mining on Twitter, the model builds on the
[BERTweet-base](https://huggingface.co/vinai/bertweet-base) architecture, which was originally pre-trained on Twitter data.
Through fine-tuning on the [TACO dataset](https://github.com/TomatenMarc/TACO), the baseline model acquires its name and excels at the
extraction of *Twitter Arguments from COnversations*.

## Class Semantics

The TACO framework revolves around the two key elements of an argument, as defined by the [Cambridge Dictionary](https://dictionary.cambridge.org).
It encodes *inference* as *a guess that you make or an opinion that you form based on the information that you have*, and it also leverages the
definition of *information* as *facts or details about a person, company, product, etc.*.

Taken together, the following classes of tweets can be identified by TACO:

* *Statement*, which refers to unique cases where only the *inference* is presented as *something that someone says or writes officially, or an action
done to express an opinion*.
* *Reason*, which represents a full argument where the *inference* is based on direct *information* mentioned in the tweet, such as a source-reference
or quotation, and thus reveals the author’s motivation *to try to understand and to make judgments based on practical facts*.
* *Notification*, which refers to a tweet that limits itself to providing *information*, such as media channels promoting their latest articles.
* *None*, which refers to a tweet that provides neither *inference* nor *information*.


In its entirety, TACO can classify the following hierarchy for tweets:

<div align="center">
  <img src="https://github.com/TomatenMarc/public-images/raw/main/Argument_Tree.svg" alt="Argument Tree" width="100%">
</div>

## Usage

Using this model becomes easy when you have `transformers` installed:

```bash
pip install -U transformers
```

Then you can use the model to generate tweet classifications like this:

```python
from transformers import pipeline

pipe = pipeline("text-classification", model="TomatenMarc/TACO")
prediction = pipe("Huggingface is awesome")

print(prediction)
```

<a href="https://github.com/TomatenMarc/TACO/blob/main/notebooks/classifier_cv.ipynb">
    <blockquote style="border-left: 5px solid grey; background-color: #f0f5ff; padding: 10px;">
        Notice: The tweets need to undergo preprocessing before classification.
    </blockquote>
</a>
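
The exact preprocessing pipeline is defined in the linked notebook. As a rough sketch, masking mentions and URLs in line with the `@USER`/`HTTPURL` conventions visible in the widget examples might look like this (the regexes below are illustrative assumptions, not the notebook's exact rules):

```python
import re

def normalize_tweet(text: str) -> str:
    """Illustrative normalization: mask user mentions and URLs with the
    @USER / HTTPURL placeholders used in this model card's examples."""
    text = re.sub(r"@\w+", "@USER", text)            # mask user mentions
    text = re.sub(r"https?://\S+", "HTTPURL", text)  # mask URLs
    return text

print(normalize_tweet("Check the latest article via @someone https://example.com"))
# → Check the latest article via @USER HTTPURL
```

The normalized text can then be passed to the pipeline shown above.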

## Training

The final model was trained on the entire shuffled ground-truth dataset TACO, comprising 1,734 tweets in total.
The dataset covers the following topic distribution: #abortion (25.9%), #brexit (29.0%), #got (11.0%), #lotrrop (12.1%), #squidgame (12.7%), and
#twittertakeover (9.3%). For training, we utilized [SimpleTransformers](https://simpletransformers.ai).

Additionally, the category and class distribution of the dataset TACO is as follows:

| Argument     | No-Argument      |
|--------------|------------------|
| 865 (49.88%) | 869 (50.12%)     |

| Reason       | Statement    | Notification | None         |
|--------------|--------------|--------------|--------------|
| 581 (33.50%) | 284 (16.38%) | 500 (28.84%) | 369 (21.28%) |

<p>
    <blockquote style="border-left: 5px solid grey; background-color: #f0f5ff; padding: 10px;">
        Notice: TACO was trained to predict the four classes; the categories (Argument/No-Argument) are aggregations of those classes
based on their inference component.
    </blockquote>
</p>
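
The class-to-category aggregation can be written out explicitly. The mapping below follows the class definitions above (Reason and Statement carry an inference, Notification and None do not), and the summed counts reproduce the category table:

```python
# Category = aggregation of the four classes by their inference component.
CATEGORY = {
    "Reason": "Argument",           # inference + information
    "Statement": "Argument",        # inference only
    "Notification": "No-Argument",  # information only
    "None": "No-Argument",          # neither
}
COUNTS = {"Reason": 581, "Statement": 284, "Notification": 500, "None": 369}

argument = sum(n for c, n in COUNTS.items() if CATEGORY[c] == "Argument")
no_argument = sum(n for c, n in COUNTS.items() if CATEGORY[c] == "No-Argument")
print(argument, no_argument)  # → 865 869, matching the category table
```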

### Dataloader

```
"data_loader": {
    "type": "torch.utils.data.dataloader.DataLoader",
    "args": {
        "batch_size": 8,
        "sampler": "torch.utils.data.sampler.RandomSampler"
    }
}
```

Parameters of the `fit()` method:

```
{
    "epochs": 5,
    "max_grad_norm": 1,
    "optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
    "optimizer_params": {
        "lr": 4e-05
    },
    "scheduler": "WarmupLinear",
    "warmup_steps": 66,
    "weight_decay": 0.06
}
```

## Evaluation

TACO's performance was evaluated with stratified 10-fold cross-validation, using the same data and parameters as outlined in the *Training*
section: the model is trained on k-1 splits and predicts on the held-out k-th split.
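
As a sketch of that protocol, stratified folds can be built with scikit-learn's `StratifiedKFold`; the data below is a placeholder standing in for the preprocessed TACO tweets and labels:

```python
from sklearn.model_selection import StratifiedKFold

# Placeholder data: label proportions loosely mirror the TACO class distribution.
labels = (["Reason"] * 58 + ["Statement"] * 28
          + ["Notification"] * 50 + ["None"] * 37)
texts = [f"tweet {i}" for i in range(len(labels))]

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
folds = list(skf.split(texts, labels))  # 10 (train_idx, test_idx) pairs

for train_idx, test_idx in folds:
    # Train on the k-1 splits in train_idx, predict on the held-out test_idx.
    pass
```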

In total, the TACO classifier performs as follows:

### Classification

|              | Precision | Recall | F1-Score | Support |
|--------------|-----------|--------|----------|---------|
| Reason       | 73.69%    | 75.22% | 74.45%   | 581     |
| Statement    | 54.37%    | 59.15% | 56.66%   | 284     |
| Notification | 79.02%    | 77.60% | 78.30%   | 500     |
| None         | 83.87%    | 77.51% | 80.56%   | 369     |
| Accuracy     |           |        | 73.76%   | 1734    |
| Macro Avg    | 72.74%    | 72.37% | 72.49%   | 1734    |
| Weighted Avg | 74.23%    | 73.76% | 73.95%   | 1734    |

### Categorization

|              | Precision | Recall | F1-Score | Support |
|--------------|-----------|--------|----------|---------|
| No-Argument  | 86.66%    | 82.97% | 84.77%   | 869     |
| Argument     | 83.59%    | 87.17% | 85.34%   | 865     |
| Accuracy     |           |        | 85.06%   | 1734    |
| Macro Avg    | 85.13%    | 85.07% | 85.06%   | 1734    |
| Weighted Avg | 85.13%    | 85.06% | 85.06%   | 1734    |

# Environmental Impact

- **Hardware Type:** A100 PCIe 40GB
- **Hours used:** 10 minutes
- **Cloud Provider:** [Google Cloud Platform](https://colab.research.google.com)
- **Compute Region:** [asia-southeast1](https://cloud.google.com/compute/docs/gpus/gpu-regions-zones?hl=en) (Singapore)
- **Carbon Emitted:** 0.02 kg CO2

# Licensing

[TACO](https://huggingface.co/TomatenMarc/TACO) © 2023 is licensed under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/?ref=chooser-v1)


# Citation

```
@inproceedings{feger-dietze-2024-taco,
    title = "{TACO} {--} {T}witter Arguments from {CO}nversations",
    author = "Feger, Marc  and
              Dietze, Stefan",
    editor = "Calzolari, Nicoletta  and
              Kan, Min-Yen  and
              Hoste, Veronique  and
              Lenci, Alessandro  and
              Sakti, Sakriani  and
              Xue, Nianwen",
    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
    month = may,
    year = "2024",
    address = "Torino, Italia",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.lrec-main.1349",
    pages = "15522--15529"
}
```