---
language: english
widget:
- text: >-
    It has been determined that the amount of greenhouse gases have decreased
    by almost half because of the prevalence in the utilization of nuclear
    power.
---
# Welcome to RoBERTArg!

## 🤖 Model description
This model was trained on ~25k heterogeneous, manually annotated sentences (📚 Stab et al. 2018) on controversial topics to classify text into one of two labels: 🏷 NON-ARGUMENT (0) and ARGUMENT (1).
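For quick experimentation, the model can be loaded with the `transformers` pipeline API. A minimal sketch, assuming the model is published on the Hub under the ID `chkla/roberta-argument` (the ID and the exact label names depend on the uploaded config):

```python
from transformers import pipeline

# Model ID assumed from this card; replace with the actual Hub ID if it differs.
classifier = pipeline("text-classification", model="chkla/roberta-argument")

result = classifier(
    "Nuclear energy should be expanded because it emits far less CO2 than coal."
)
print(result)  # e.g. [{'label': 'ARGUMENT', 'score': 0.98}] -- labels may also
               # appear as LABEL_0 / LABEL_1 depending on the model config
```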
## 🗃 Dataset
The dataset (📚 Stab et al. 2018) labels a sentence as an ARGUMENT (~11k sentences) if it supports or opposes the topic and includes a relevant reason for that stance, and as a NON-ARGUMENT (~14k sentences) if it includes no reason. The authors focus on controversial topics, i.e., topics with "an obvious polarity to the possible outcomes", and compile a final set of eight: abortion, school uniforms, death penalty, marijuana legalization, nuclear energy, cloning, gun control, and minimum wage.
| TOPIC | ARGUMENT | NON-ARGUMENT |
|---|---|---|
| abortion | 2,213 | 2,427 |
| school uniforms | 325 | 1,734 |
| death penalty | 325 | 2,083 |
| marijuana legalization | 325 | 1,262 |
| nuclear energy | 325 | 2,118 |
| cloning | 325 | 1,494 |
| gun control | 325 | 1,889 |
| minimum wage | 325 | 1,346 |
## 🏃🏼‍♂️ Model training
RoBERTArg was fine-tuned from the pre-trained RoBERTa (base) model on the Hugging Face Hub, using the Hugging Face `Trainer` with the following hyperparameters:
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="output",  # required by TrainingArguments; this path is illustrative
    num_train_epochs=2,
    learning_rate=2.3102e-06,
    seed=8,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
)
```
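To put the hyperparameters in context, the sketch below continues from `training_args` and shows a fine-tuning loop of the kind described here. The tiny inline dataset is a stand-in for the Stab et al. (2018) corpus, which is not bundled with this card:

```python
from datasets import Dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer

# Stand-in data; the real training set is the ~25k-sentence Stab et al. (2018) corpus.
raw = Dataset.from_dict({
    "text": [
        "Nuclear power cuts emissions because it displaces coal-fired plants.",
        "I visited a nuclear power plant last summer.",
    ],
    "label": [1, 0],  # 1 = ARGUMENT, 0 = NON-ARGUMENT
})

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
dataset = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=128),
    batched=True,
)

model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

trainer = Trainer(
    model=model,
    args=training_args,      # the TrainingArguments defined above
    train_dataset=dataset,
    eval_dataset=dataset,    # placeholder; use a held-out 20% split in practice
)
trainer.train()
```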
## 🏆 Evaluation
The model was evaluated on a held-out evaluation set (20% of the data):
| Model | Accuracy | F1 (ARG) | Recall (ARG) | Recall (NON-ARG) | Precision (ARG) | Precision (NON-ARG) |
|---|---|---|---|---|---|---|
| RoBERTArg | 0.8193 | 0.8021 | 0.8463 | 0.7986 | 0.7623 | 0.8719 |
Confusion matrix on the same evaluation set:
| | ARGUMENT | NON-ARGUMENT |
|---|---|---|
| ARGUMENT | 2,213 | 558 |
| NON-ARGUMENT | 325 | 1,790 |
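As a reference for reproducing tables like the ones above, here is a sketch of how such metrics can be computed with scikit-learn; `y_true` and `y_pred` are placeholders for the gold labels and model predictions on the evaluation split, and this is not necessarily the exact evaluation code used by the author:

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score, confusion_matrix, f1_score, precision_score, recall_score
)

# Placeholder labels and predictions (1 = ARGUMENT, 0 = NON-ARGUMENT).
y_true = np.array([1, 1, 1, 0, 0, 0])
y_pred = np.array([1, 1, 0, 0, 0, 1])

print("Accuracy :", accuracy_score(y_true, y_pred))
print("F1 (ARG) :", f1_score(y_true, y_pred, pos_label=1))
print("Recall   :", recall_score(y_true, y_pred, pos_label=1),
      recall_score(y_true, y_pred, pos_label=0))
print("Precision:", precision_score(y_true, y_pred, pos_label=1),
      precision_score(y_true, y_pred, pos_label=0))
print(confusion_matrix(y_true, y_pred, labels=[1, 0]))  # rows = true class
```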
## ⚠️ Intended Uses & Potential Limitations
The model should be treated as a starting point for exploring argument mining rather than a finished tool. Be aware that an argument is a complex structure with many dependencies, so the model may perform noticeably worse on topics and text types that were not part of the training set.
Enjoy and stay tuned! 🚀

🐦 Twitter: @chklamm