metadata

pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - feature-extraction
  - sentence-similarity
  - transformers
  - argument-mining
  - Twitter
metrics:
  - macro-F1
license: cc-by-sa-4.0
language:
  - en
widget:
  - source_sentence: >-
      The formula: Not everyone who voted Leave is racist. But everyone who's
      racist voted Leave. Not everyone who voted Leave is thick. But everyone
      who's thick voted Leave. The thick racists therefore called the shots,
      whatever the thoughts of the minority of others. #thick #Brexit
    sentences:
      - 'Men shouldn’t be making laws about women’s bodies #abortion #Texas'
      - >-
        Opinion: As the draconian (and then some) abortion law takes effecting
        #Texas, this is not an idle question for millions of Americans. A
        slippery slope towards more like-minded Republican state-legislatures to
        try to follow suit. #abortion #F24 HTTPURL
      - >-
        ’Bitter truth’: EU chief pours cold water on idea of Brits keeping EU
        citizenship after #Brexit HTTPURL via @USER
      - '@USER Blah blah blah blah blah blah'
    example_title: Reason
  - source_sentence: This is NOT good for children.
    sentences:
      - 'Men shouldn’t be making laws about women’s bodies #abortion #Texas'
      - >-
        Opinion: As the draconian (and then some) abortion law takes effecting
        #Texas, this is not an idle question for millions of Americans. A
        slippery slope towards more like-minded Republican state-legislatures to
        try to follow suit. #abortion #F24 HTTPURL
      - >-
        ’Bitter truth’: EU chief pours cold water on idea of Brits keeping EU
        citizenship after #Brexit HTTPURL via @USER
      - '@USER Blah blah blah blah blah blah'
    example_title: Statement
  - source_sentence: >-
      Elon Musk ready with 'Plan B' if Twitter rejects his offer  Read @USER
      Story | HTTPURL #ElonMusk #ElonMuskTwitter #TwitterTakeover HTTPURL
    sentences:
      - 'Men shouldn’t be making laws about women’s bodies #abortion #Texas'
      - >-
        Opinion: As the draconian (and then some) abortion law takes effecting
        #Texas, this is not an idle question for millions of Americans. A
        slippery slope towards more like-minded Republican state-legislatures to
        try to follow suit. #abortion #F24 HTTPURL
      - >-
        ’Bitter truth’: EU chief pours cold water on idea of Brits keeping EU
        citizenship after #Brexit HTTPURL via @USER
      - '@USER Blah blah blah blah blah blah'
    example_title: Notification
  - source_sentence: '@USER 👅is the Key 😂'
    sentences:
      - 'Men shouldn’t be making laws about women’s bodies #abortion #Texas'
      - >-
        Opinion: As the draconian (and then some) abortion law takes effecting
        #Texas, this is not an idle question for millions of Americans. A
        slippery slope towards more like-minded Republican state-legislatures to
        try to follow suit. #abortion #F24 HTTPURL
      - >-
        ’Bitter truth’: EU chief pours cold water on idea of Brits keeping EU
        citizenship after #Brexit HTTPURL via @USER
      - '@USER Blah blah blah blah blah blah'
    example_title: None

WRAPresentations

WRAPresentations is a sentence-transformers model that maps tweets to a 768-dimensional dense vector space according to the four classes Reason, Statement, Notification and None. The model is tailored for argument mining on Twitter and is derived from BERTweet-base, which was pre-trained on Twitter data. Through fine-tuning with the TACO dataset, WRAPresentations is effective in Weaving Relevant Argument Properties (WRAP) into the embedding space.

Class Semantics

WRAPresentations, to some degree, captures the semantics of the critical components of an argument (inference and information), as defined by the Cambridge Dictionary. It encodes inference as a guess that you make or an opinion that you form based on the information that you have, and information as facts or details about a person, company, product, etc.

Consequently, it has also learned the semantics of:

Statement, which refers to unique cases where only the inference is presented as something that someone says or writes officially, or an action done to express an opinion.
Reason, which represents a full argument where the inference is based on direct information mentioned in the tweet, such as a source-reference or quotation, and thus reveals the author’s motivation to try to understand and to make judgments based on practical facts.
Notification, which refers to a tweet that limits itself to providing information, such as media channels promoting their latest articles.
None, a tweet that provides neither inference nor information.

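In its entirety, WRAPresentations encodes the following hierarchy for tweets: Reason and Statement form the argument classes, while Notification and None form the non-argument classes.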
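The class definitions above can be read as a decision over two properties of a tweet: whether it contains an inference and whether it contains information. The following is a minimal sketch of that reading; the function name `tweet_class` and its boolean arguments are illustrative assumptions and are not part of the model or the TACO dataset.

```python
# Minimal sketch: the four classes as a decision over two properties.
# `tweet_class` and its arguments are hypothetical names used for illustration.

def tweet_class(has_inference: bool, has_information: bool) -> str:
    """Return the class implied by the definitions in this section."""
    if has_inference and has_information:
        return "Reason"        # full argument: inference backed by information
    if has_inference:
        return "Statement"     # inference only
    if has_information:
        return "Notification"  # information only
    return "None"              # neither inference nor information


if __name__ == "__main__":
    for inference in (True, False):
        for information in (True, False):
            print(inference, information, tweet_class(inference, information))
```

In practice these two properties are not given explicitly; the model is meant to capture them implicitly in the embedding space.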

Usage (Sentence-Transformers)

Using this model becomes easy when you have sentence-transformers installed:

pip install -U sentence-transformers

Then you can use the model to generate tweet representations like this:

from sentence_transformers import SentenceTransformer

tweets = ["This is an example #tweet", "Each tweet is converted"]

model = SentenceTransformer("TomatenMarc/WRAPresentations")
embeddings = model.encode(tweets)
print(embeddings)

Notice: The tweets need to undergo preprocessing following the specifications for BERTweet-base.
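The widget examples in the metadata show user mentions as @USER and links as HTTPURL. As a rough sketch of that part of the preprocessing, the snippet below replaces mentions and URLs with these placeholders using regular expressions. The patterns and the function name `normalize_tweet` are simplifying assumptions and do not reproduce the full BERTweet normalization.

```python
import re

# Simplified patterns (assumptions): real tweets may need more careful handling.
URL_PATTERN = re.compile(r"https?://\S+|www\.\S+")
MENTION_PATTERN = re.compile(r"@\w+")


def normalize_tweet(text: str) -> str:
    """Replace URLs with HTTPURL and user mentions with @USER."""
    text = URL_PATTERN.sub("HTTPURL", text)
    text = MENTION_PATTERN.sub("@USER", text)
    return text


if __name__ == "__main__":
    print(normalize_tweet("@bob have you seen https://example.com/story? #Brexit"))
```

Text that is already normalized passes through unchanged, so the function can be applied to mixed input.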

Usage (HuggingFace Transformers)

Without sentence-transformers, you can use the model like this: First, you pass your input through the transformer model, then you have to apply the right pooling-operation on top of the contextualized word embeddings.

from transformers import AutoTokenizer, AutoModel
import torch


# Mean Pooling - Take attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)


# Tweets we want embeddings for
tweets = ["This is an example #tweet", "Each tweet is converted"]

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained("TomatenMarc/WRAPresentations")
model = AutoModel.from_pretrained("TomatenMarc/WRAPresentations")

# Tokenize sentences
encoded_input = tokenizer(tweets, padding=True, truncation=True, return_tensors="pt")

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling. In this case, mean pooling.
sentence_embeddings = mean_pooling(model_output, encoded_input["attention_mask"])

print("Sentence embeddings:")
print(sentence_embeddings)

Furthermore, the WRAPresentations model is a highly suitable embedding component for AutoModelForSequenceClassification, enabling fine-tuning of tweet classification tasks specifically for the four classes: Reason, Statement, Notification, and None. The categorization of Reason and Statement as argument classes and Notification and None as non-argument classes is implicitly learned during the fine-tuning process. This setup facilitates efficient identification and analysis of argumentative content and non-argumentative content in tweets.
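To illustrate how four-class predictions relate to the argument and non-argument grouping, the sketch below turns a vector of four logits into a predicted class and an overall argument probability. The label order in `LABELS` and the helper names are assumptions for illustration; a fine-tuned classifier defines its own label order in its configuration.

```python
import math

# Assumed label order (hypothetical); check the id2label mapping of your classifier.
LABELS = ["Reason", "Statement", "Notification", "None"]
ARGUMENT_CLASSES = {"Reason", "Statement"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def summarize(logits):
    """Return the predicted class and the probability mass on argument classes."""
    probs = softmax(logits)
    best = max(range(len(LABELS)), key=probs.__getitem__)
    p_argument = sum(p for label, p in zip(LABELS, probs) if label in ARGUMENT_CLASSES)
    return {
        "label": LABELS[best],
        "is_argument": LABELS[best] in ARGUMENT_CLASSES,
        "argument_probability": p_argument,
    }


if __name__ == "__main__":
    print(summarize([2.1, 0.3, -0.5, -1.2]))
```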

Training

The WRAPresentations model underwent fine-tuning with 1,219 golden tweets from the TACO dataset, covering six topics. Five topics were chosen for optimization, representing 925 tweets (75.88%) covering #brexit (33.3%), #got (17%), #lotrrop (18.8%), #squidgame (17.1%), and #twittertakeover (13.8%). The optimization data was divided with a stratified 60/40 split for training/testing. Additionally, 294 golden tweets (24.12%) related to the topic of #abortion were chosen as the holdout set for final evaluation.
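The shares above can be checked with a few lines of arithmetic. The train and test sizes computed below assume that the 60/40 split of the 925 optimization tweets is exact, which is an assumption rather than something stated here.

```python
TOTAL_TWEETS = 1219
OPTIMIZATION_TWEETS = 925
HOLDOUT_TWEETS = 294

# Topic shares within the optimization data, in percent, as reported.
TOPIC_SHARES = {
    "#brexit": 33.3,
    "#got": 17.0,
    "#lotrrop": 18.8,
    "#squidgame": 17.1,
    "#twittertakeover": 13.8,
}


def share(part: int, whole: int) -> float:
    """Percentage of `whole` covered by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Assumption: the stratified 60/40 split is exact.
train_size = OPTIMIZATION_TWEETS * 60 // 100
test_size = OPTIMIZATION_TWEETS - train_size

if __name__ == "__main__":
    print("optimization share:", share(OPTIMIZATION_TWEETS, TOTAL_TWEETS))
    print("holdout share:", share(HOLDOUT_TWEETS, TOTAL_TWEETS))
    print("train/test sizes:", train_size, test_size)
    print("sum of topic shares:", round(sum(TOPIC_SHARES.values()), 1))
```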

Before fine-tuning, we built a copy of the dataset by creating an augmentation of each tweet. The augmentation consisted of replacing all the topic words and entities in a tweet and then randomly masking 10% of the words, which were then filled in using BERTweet-base as a fill-mask model. We chose to mask 10% of the words because this resulted in the smallest possible average cosine distance between the tweets and their augmentations of 0.08, which is close to total similarity, making augmentation during extended pre-training itself a regulating factor prior to any overfitting with the later test data.

During fine-tuning, we formed pairs by matching each tweet with all remaining tweets in the same data split (training, testing, holdout) with similar or dissimilar class labels. For the training and testing sets, we used the augmentations; for the holdout tweets, we used their original text to test the fine-tuning process and the usefulness of the augmentations towards real tweets. For all pairs, we chose the largest possible set so that both similar and dissimilar pairs are equally represented while covering all tweets of the respective data split. This process created 307,470 pairs for training and 136,530 pairs for testing. An additional 86,142 pairs were used for final evaluation with the holdout data.
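As a sanity check on the reported pair counts, note that each of them equals n(n-1) for the number of tweets n in the respective split. The sketch below verifies this; the split sizes 555 and 370 assume an exact 60/40 split of the 925 optimization tweets, and the match is an observation about the numbers, not a description of the actual pairing code.

```python
def ordered_pair_count(n: int) -> int:
    """Number of ordered pairs of distinct items among n items."""
    return n * (n - 1)


# (assumed split size, reported number of pairs)
REPORTED = {
    "training": (555, 307_470),  # 555 assumes an exact 60% of 925
    "testing": (370, 136_530),   # 370 assumes an exact 40% of 925
    "holdout": (294, 86_142),    # 294 is stated for the #abortion topic
}

if __name__ == "__main__":
    for split, (size, pairs) in REPORTED.items():
        print(split, size, ordered_pair_count(size), pairs)
```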

The model was trained with the parameters:

DataLoader:

torch.utils.data.dataloader.DataLoader of length 5065 with parameters:

{'batch_size': 32, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}

Loss:

sentence_transformers.losses.ContrastiveLoss.ContrastiveLoss with parameters:

{'distance_metric': 'SiameseDistanceMetric.COSINE_DISTANCE', 'margin': 0.5, 'size_average': True}
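As a sketch of what these loss parameters mean: for one pair with cosine distance d and label y (1 for similar, 0 for dissimilar), the contrastive loss is 0.5 * (y * d^2 + (1 - y) * max(0, margin - d)^2), averaged over the batch when size_average is True. The plain-Python version below is an illustration under these definitions, not the library code.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def contrastive_loss(u, v, label, margin=0.5):
    """Loss for one pair; label is 1 (similar) or 0 (dissimilar)."""
    d = cosine_distance(u, v)
    similar_term = label * d ** 2
    dissimilar_term = (1 - label) * max(0.0, margin - d) ** 2
    return 0.5 * (similar_term + dissimilar_term)


def batch_loss(pairs, margin=0.5):
    """Mean loss over (u, v, label) triples, mirroring size_average=True."""
    losses = [contrastive_loss(u, v, y, margin) for u, v, y in pairs]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    same = ([1.0, 0.0], [1.0, 0.0])
    orthogonal = ([1.0, 0.0], [0.0, 1.0])
    print(contrastive_loss(*same, label=1))        # similar and identical
    print(contrastive_loss(*same, label=0))        # dissimilar but identical
    print(contrastive_loss(*orthogonal, label=0))  # dissimilar and beyond the margin
```

With a margin of 0.5, dissimilar pairs stop contributing to the loss once their cosine distance reaches 0.5.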

Parameters of the fit()-Method:

{
    "epochs": 5,
    "evaluation_steps": 1000,
    "evaluator": "sentence_transformers.evaluation.BinaryClassificationEvaluator.BinaryClassificationEvaluator",
    "max_grad_norm": 1,
    "optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
    "optimizer_params": {
        "lr": 4e-05
    },
    "scheduler": "WarmupLinear",
    "steps_per_epoch": null,
    "warmup_steps": 2533,
    "weight_decay": 0.01
}
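The warmup setting can be related to the other parameters with some arithmetic: 5 epochs over a DataLoader of length 5065 give 25,325 steps, and 2533 is 10% of that, rounded up. The sketch below checks this and illustrates a linear warmup followed by a linear decay, which is how the WarmupLinear scheduler is commonly described. It assumes that a null steps_per_epoch means one full pass over the DataLoader per epoch; the function `lr_at` is a hypothetical helper, not a library call.

```python
import math

STEPS_PER_EPOCH = 5065  # DataLoader length as reported
EPOCHS = 5
WARMUP_STEPS = 2533
BASE_LR = 4e-05

# Assumption: steps_per_epoch=None means one full pass over the DataLoader.
TOTAL_STEPS = STEPS_PER_EPOCH * EPOCHS


def lr_at(step: int) -> float:
    """Learning rate at a given step under linear warmup and linear decay."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    print("total steps:", TOTAL_STEPS)
    print("10% of total, rounded up:", math.ceil(0.1 * TOTAL_STEPS))
    for step in (0, WARMUP_STEPS // 2, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(step, lr_at(step))
```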

Evaluation Results

Following the standard protocol for cross-topic evaluation for argument mining, we evaluated the WRAPresentations model using the BinaryClassificationEvaluator, with the following results:

Model	Accuracy	Precision	Recall	F1	Support
vinai/bertweet-base	60.62%	50.08%	99.89%	66.71%	86,142
WRAPresentations	71.32%	66.22%	84.05%	74.08%	86,142

An evaluation was conducted on previously unseen data from the holdout topic #abortion, on which the model achieved a macro-F1 score of 74.08%. The recall of 84.05% indicates the model's ability to capture subtle tweet patterns and class-specific features for Reason, Statement, Notification, and None. Although precision is lower at 66.22%, the model prioritizes recall to capture relevant instances; precision can be improved in a subsequent classification phase when using this model with AutoModelForSequenceClassification. In contrast, the baseline model (vinai/bertweet-base) achieved a recall of 99.89% but a precision of only 50.08%. Because similar and dissimilar pairs are equally represented, this suggests that the baseline treats almost every pair as similar rather than separating the classes. WRAPresentations, however, distinguishes between the tweet classes of the argument framework, capturing intra-class semantics while discerning inter-class semantics. This is reflected in its higher F1 score of 74.08% (versus 66.71%), which shows a better balance between recall and precision. As a result, WRAPresentations is more suitable for argument mining on Twitter, as it identifies relevant instances in the data more reliably.
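The F1 values in the table are consistent with the harmonic mean of the precision and recall columns, which can be verified with a short calculation. The helper name `f1_score` is an illustrative choice; the numbers are taken from the table.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) in percent, from the evaluation table
RESULTS = {
    "vinai/bertweet-base": (50.08, 99.89, 66.71),
    "WRAPresentations": (66.22, 84.05, 74.08),
}

if __name__ == "__main__":
    for model, (precision, recall, reported) in RESULTS.items():
        print(model, round(f1_score(precision, recall), 2), reported)
```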

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: RobertaModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
)

Environmental Impact

Hardware Type: A100 PCIe 40GB
Hours used: 2h
Cloud Provider: Google Cloud Platform
Compute Region: asia-southeast1 (Singapore)
Carbon Emitted: 0.21kg CO2
