license: cc-by-nc-4.0
language:
  - en
metrics:
  - f1
pipeline_tag: text-classification
widget:
  - text: 'Men shouldn’t be making laws about women’s bodies #abortion #Texas'
    example_title: Statement
  - text: >-
      ’Bitter truth’: EU chief pours cold water on idea of Brits keeping EU
      citizenship after #Brexit HTTPURL via @USER
    example_title: Notification
  - text: >-
      Opinion: As the draconian (and then some) abortion law takes effect in
      #Texas, this is not an idle question for millions of Americans. A slippery
      slope towards more like-minded Republican state legislatures to try to
      follow suit. #abortion #F24 HTTPURL
    example_title: Reason
  - text: '@USER Blah blah blah blah blah blah'
    example_title: None
  - text: republican men and karens make me sick
    example_title: Unlabeled 1
  - text: No empire lives forever! Historical fact! GodWins! 🙏💪🇺🇲
    example_title: Unlabeled 2
  - text: >-
      Further author information regarding registration and visa support letters
      will be sent to the authors soon. #CIKM2023
    example_title: Unlabeled 3
  - text: Ummmmmm
    example_title: Unlabeled 4
  - text: >-
      whoever says that The Last Jedi is a good movie is lying or trolling
      everyone
    example_title: Unlabeled 5
TACO -- Twitter Arguments from COnversations
TACO is a baseline classification model built on AutoModelForSequenceClassification and designed to assign each tweet to one of four
classes: Reason, Statement, Notification, and None. Tailored to argument mining on Twitter, it is a fine-tuned version of the
BERTweet-base architecture, which was pre-trained on Twitter data.
The model takes its name from the TACO dataset on which it was fine-tuned, and it targets the
extraction of Twitter Arguments from COnversations.
Class Semantics
The TACO framework revolves around the two key elements of an argument, as defined by the Cambridge Dictionary. It treats inference as "a guess that you make or an opinion that you form based on the information that you have", and information as "facts or details about a person, company, product, etc."
Taken together, the following classes of tweets can be identified by TACO:
- Statement, which refers to unique cases where only the inference is presented as something that someone says or writes officially, or an action done to express an opinion (see ex. 1).
- Reason, which represents a full argument where the inference is based on direct information mentioned in the tweet, such as a source-reference or quotation, and thus reveals the author’s motivation to try to understand and to make judgments based on practical facts (see ex. 3).
- Notification, which refers to a tweet that limits itself to providing information, such as media channels promoting their latest articles (see ex. 2).
- None, a tweet that provides neither inference nor information (see ex. 4).
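Overall, TACO classifies tweets in a two-level hierarchy: Reason and Statement contain an inference and form the category Argument, while Notification and None contain no inference and form the category No-Argument.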
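The class definitions above can be read as a two-by-two decision over whether a tweet contains an inference and whether it contains information. The following is a minimal sketch of that reading only. The function name `taco_class` and its boolean flags are hypothetical and stand in for human annotation judgments; the model itself predicts the class directly from the tweet text.

```python
def taco_class(has_inference: bool, has_information: bool) -> str:
    """Map the two argument components to a TACO class name.

    Hypothetical helper: the flags represent annotation judgments,
    not inputs of the actual classifier.
    """
    if has_inference and has_information:
        return "Reason"        # full argument: inference backed by information
    if has_inference:
        return "Statement"     # inference only
    if has_information:
        return "Notification"  # information only
    return "None"              # neither component present


if __name__ == "__main__":
    for inference in (True, False):
        for information in (True, False):
            print(inference, information, taco_class(inference, information))
```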
Usage
Using this model is straightforward once transformers is installed:
pip install -U transformers
Then you can use the model to generate tweet classifications like this:
from transformers import pipeline
pipe = pipeline("text-classification", model="TomatenMarc/TACO")
prediction = pipe("Huggingface is awesome")
print(prediction)
Notice: The tweets need to undergo preprocessing before classification.
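The exact preprocessing is not specified here. The widget examples contain the placeholders @USER and HTTPURL, which matches the normalization convention BERTweet was pre-trained with, so a plausible minimal sketch is to replace user mentions and links with those tokens. The helper `normalize_tweet` and its regular expressions are assumptions for illustration and may differ from the preprocessing actually used for TACO.

```python
import re

# Assumed patterns: links starting with http(s):// or www., and @-mentions
# that are not part of a longer word (so e-mail addresses stay untouched).
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")
MENTION_PATTERN = re.compile(r"(?<!\w)@\w+")


def normalize_tweet(text: str) -> str:
    """Replace links with HTTPURL and mentions with @USER, then tidy whitespace."""
    text = URL_PATTERN.sub("HTTPURL", text)
    text = MENTION_PATTERN.sub("@USER", text)
    return " ".join(text.split())


if __name__ == "__main__":
    raw = "@marc  have a look at https://example.org/post #abortion"
    print(normalize_tweet(raw))
    # The normalized text can then be passed to the pipeline, for example:
    # prediction = pipe(normalize_tweet(raw))
```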
Training
The final model was trained on the entire shuffled ground-truth dataset, TACO, which contains 1734 tweets. The topics are distributed as follows: #abortion (25.9%), #brexit (29.0%), #got (11.0%), #lotrrop (12.1%), #squidgame (12.7%), and #twittertakeover (9.3%). Training was done with SimpleTransformers.
Additionally, the category and class distribution of the dataset TACO is as follows:
Argument | No-Argument |
---|---|
865 (49.88%) | 869 (50.12%) |
Reason | Statement | Notification | None |
---|---|---|---|
581 (33.50%) | 284 (16.38%) | 500 (28.84%) | 369 (21.28%) |
Notice: The model was trained to predict the four classes. The categories (Argument/No-Argument) are aggregations of these classes based on the inference component.
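The relation between the two tables can be checked with a few lines of code. The class-to-category mapping below is inferred from the counts above and from the note that categories aggregate classes by their inference component; the names `CLASS_TO_CATEGORY` and `aggregate_counts` are introduced here for illustration only.

```python
from collections import Counter

# Class counts as reported in the table above.
CLASS_COUNTS = {"Reason": 581, "Statement": 284, "Notification": 500, "None": 369}

# Assumed mapping: classes with an inference count as Argument.
CLASS_TO_CATEGORY = {
    "Reason": "Argument",
    "Statement": "Argument",
    "Notification": "No-Argument",
    "None": "No-Argument",
}


def aggregate_counts(class_counts: dict) -> dict:
    """Sum class counts into category counts."""
    totals = Counter()
    for name, count in class_counts.items():
        totals[CLASS_TO_CATEGORY[name]] += count
    return dict(totals)


category_counts = aggregate_counts(CLASS_COUNTS)
total = sum(CLASS_COUNTS.values())

if __name__ == "__main__":
    for category, count in category_counts.items():
        print(f"{category}: {count} ({100 * count / total:.2f}%)")
```

The same mapping can be applied to predicted class labels to obtain category predictions.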
Dataloader
"data_loader": {
"type": "torch.utils.data.dataloader.DataLoader",
"args": {
"batch_size": 8,
"sampler": "torch.utils.data.sampler.RandomSampler"
}
}
Parameters of the fit()-Method:
{
"epochs": 5,
"max_grad_norm": 1,
"optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
"optimizer_params": {
"lr": 4e-05
},
"scheduler": "WarmupLinear",
"warmup_steps": 66,
"weight_decay": 0.06
}
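To make the scheduler settings concrete, the sketch below computes the learning rate per step, assuming that "WarmupLinear" means a linear increase from zero to the peak rate over the warmup steps followed by a linear decay to zero. The total number of steps is not stated above; the value 1085 is an assumption derived from 1734 tweets, a batch size of 8 (217 batches per epoch when the last partial batch is kept), and 5 epochs. The function name `warmup_linear_lr` is hypothetical.

```python
import math

BASE_LR = 4e-05        # "lr" from the optimizer parameters
WARMUP_STEPS = 66      # "warmup_steps" from the fit parameters
# Assumption: total steps = batches per epoch * epochs.
TOTAL_STEPS = math.ceil(1734 / 8) * 5


def warmup_linear_lr(step: int,
                     base_lr: float = BASE_LR,
                     warmup_steps: int = WARMUP_STEPS,
                     total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at a given step under linear warmup and linear decay."""
    if step < warmup_steps:
        return base_lr * (step / warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


if __name__ == "__main__":
    for step in (0, 33, 66, 500, TOTAL_STEPS):
        print(step, warmup_linear_lr(step))
```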
Evaluation
We evaluated TACO with stratified 10-fold cross-validation, using the same data and parameters as described in the Training section. In each fold, the model was trained on k-1 splits and used to make predictions on the held-out kth split.
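The splitting procedure can be sketched in plain Python as follows. This is an illustration of stratified k-fold splitting in general, not the code used for the reported results: the function `stratified_folds`, the seed, and the synthetic label list (built only from the class counts in the Training section) are assumptions.

```python
import random
from collections import Counter, defaultdict

CLASS_COUNTS = {"Reason": 581, "Statement": 284, "Notification": 500, "None": 369}


def stratified_folds(labels, k=10, seed=0):
    """Split indices into k folds that preserve the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin into the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Synthetic labels standing in for the annotated tweets.
labels = [name for name, count in CLASS_COUNTS.items() for _ in range(count)]
folds = stratified_folds(labels, k=10)

# One (train, test) pair per fold: train on k-1 folds, predict on the kth.
splits = []
for held_out, test_idx in enumerate(folds):
    train_idx = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
    splits.append((train_idx, test_idx))

if __name__ == "__main__":
    for number, (train_idx, test_idx) in enumerate(splits, start=1):
        print(number, len(train_idx), len(test_idx))
```

Pooling the predictions of all ten held-out folds yields one prediction per tweet, which is consistent with the support of 1734 in the tables that follow.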
In total, the TACO classifier performs as follows:
Classification
Class | Precision | Recall | F1-Score | Support |
---|---|---|---|---|
Reason | 73.69% | 75.22% | 74.45% | 581 |
Statement | 54.37% | 59.15% | 56.66% | 284 |
Notification | 79.02% | 77.60% | 78.30% | 500 |
None | 83.87% | 77.51% | 80.56% | 369 |
Accuracy | | | 73.76% | 1734 |
Macro Avg | 72.74% | 72.37% | 72.49% | 1734 |
Weighted Avg | 74.23% | 73.76% | 73.95% | 1734 |
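For readers who want to see how the summary rows relate to the per-class rows, the sketch below recomputes the F1-scores and their macro and weighted averages from the precision, recall, and support values in the table above. Because it starts from rounded percentages, it reproduces the reported figures only up to rounding; the variable names are illustrative.

```python
# (precision %, recall %, support) per class, copied from the table above.
ROWS = {
    "Reason": (73.69, 75.22, 581),
    "Statement": (54.37, 59.15, 284),
    "Notification": (79.02, 77.60, 500),
    "None": (83.87, 77.51, 369),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


per_class_f1 = {name: f1_score(p, r) for name, (p, r, _) in ROWS.items()}
total_support = sum(support for _, _, support in ROWS.values())

# Macro average: every class counts equally.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
# Weighted average: every class counts in proportion to its support.
weighted_f1 = sum(per_class_f1[name] * ROWS[name][2] for name in ROWS) / total_support

if __name__ == "__main__":
    for name, value in per_class_f1.items():
        print(f"{name}: {value:.2f}%")
    print(f"Macro Avg F1: {macro_f1:.2f}%")
    print(f"Weighted Avg F1: {weighted_f1:.2f}%")
```

The gap between the macro and weighted averages reflects that Statement, the weakest class, is also the smallest.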
Categorization
Category | Precision | Recall | F1-Score | Support |
---|---|---|---|---|
No-Argument | 86.66% | 82.97% | 84.77% | 869 |
Argument | 83.59% | 87.17% | 85.34% | 865 |
Accuracy | | | 85.06% | 1734 |
Macro Avg | 85.13% | 85.07% | 85.06% | 1734 |
Weighted Avg | 85.13% | 85.06% | 85.06% | 1734 |
Environmental Impact
- Hardware Type: A100 PCIe 40GB
- Hours used: 0.17 (10 min)
- Cloud Provider: Google Cloud Platform
- Compute Region: asia-southeast1 (Singapore)
- Carbon Emitted: 0.02 kg CO2
Licensing
TACO © 2023 by Marc Feger is licensed under CC BY-NC-SA 4.0
Contact
If you have any questions, please feel free to reach out to marc.feger@uni-duesseldorf.de.