Pre-trained BERT on Twitter US Political Election 2020
Pre-trained weights for PoliBERTweet: A Pre-trained Language Model for Analyzing Political Content on Twitter, LREC 2022.
Please see the official repository for more details.
The model is initialized from the BERTweet weights (vinai/bertweet-base).
Training Data
This model is pre-trained on over 83 million English tweets about the 2020 US Presidential Election.
Training Objective
This model is initialized with BERTweet and trained with an MLM objective.
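To illustrate what an MLM objective does to an input, here is a minimal sketch of BERT-style token masking in plain Python. The 15% selection rate and the 80/10/10 replacement split are the commonly used BERT defaults and are an assumption here; this card does not state the exact masking settings used for PoliBERTweet. The function name `mask_tokens` and the toy vocabulary are hypothetical and serve only as illustration.

```python
import random

MASK_TOKEN = "<mask>"  # mask token string, as used in the usage example of this card


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Return (corrupted_tokens, labels) for one MLM training example.

    labels[i] holds the original token at positions the model must predict
    and None elsewhere. The rates follow the usual BERT recipe (an assumption,
    not a setting confirmed by this model card).
    """
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # this position contributes to the loss
            roll = rng.random()
            if roll < 0.8:
                corrupted.append(MASK_TOKEN)         # 80%: replace with the mask token
            elif roll < 0.9:
                corrupted.append(rng.choice(vocab))  # 10%: replace with a random token
            else:
                corrupted.append(tok)                # 10%: keep the original token
        else:
            labels.append(None)  # position is ignored by the loss
            corrupted.append(tok)
    return corrupted, labels


tokens = "the election results are in".split()
toy_vocab = ["vote", "ballot", "poll", "debate"]  # hypothetical toy vocabulary
corrupted, labels = mask_tokens(tokens, toy_vocab, mask_prob=0.5, seed=0)
print(corrupted)
print(labels)
```

During training, the model sees the corrupted sequence and is scored only on the positions where a label is present.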
Usage
This pre-trained language model can be fine-tuned for any downstream task (e.g., classification).
from transformers import AutoModel, AutoTokenizer, pipeline
import torch
# choose GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# select model path here
pretrained_LM_path = "kornosk/polibertweet-mlm"
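# load the tokenizer and the base encoder, then move the model to the chosen device
tokenizer = AutoTokenizer.from_pretrained(pretrained_LM_path)
model = AutoModel.from_pretrained(pretrained_LM_path).to(device)
# fill mask (the pipeline loads the MLM head from the same checkpoint)
example = "Trump is the <mask> of USA"
fill_mask = pipeline('fill-mask', model=pretrained_LM_path, tokenizer=tokenizer)
outputs = fill_mask(example)
print(outputs)
# see embeddings (no gradients are needed for inference)
inputs = tokenizer(example, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)
print(outputs)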
# OR you can use this model to train on your downstream task!
# please consider citing our paper if you feel this is useful :)
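The fill-mask pipeline returns a list of dictionaries, each with `score`, `token`, `token_str`, and `sequence` keys. The sketch below shows one way to turn that list into a short ranked summary. The helper name `summarize_predictions` is hypothetical, and the sample list holds made-up placeholder values for illustration only; it is not real output from this model.

```python
def summarize_predictions(predictions, top_k=3):
    """Return the top_k (token, score) pairs, highest score first."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"].strip(), round(p["score"], 4)) for p in ranked[:top_k]]


# Placeholder values for illustration only; NOT real output from the model.
sample = [
    {"score": 0.1, "token": 1, "token_str": " candidate_a", "sequence": "placeholder a"},
    {"score": 0.6, "token": 2, "token_str": " candidate_b", "sequence": "placeholder b"},
    {"score": 0.3, "token": 3, "token_str": " candidate_c", "sequence": "placeholder c"},
]

for token, score in summarize_predictions(sample, top_k=2):
    print(f"{token}\t{score}")
```

With the usage code above, the same helper could be called as `summarize_predictions(fill_mask(example))`.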
Reference
Citation
@inproceedings{kawintiranon2022polibertweet,
title = {PoliBERTweet: A Pre-trained Language Model for Analyzing Political Content on Twitter},
author = {Kawintiranon, Kornraphop and Singh, Lisa},
booktitle = {Proceedings of the Language Resources and Evaluation Conference},
year = {2022},
publisher = {European Language Resources Association}
}