tweet-topic-large-multilingual
This model is based on the cardiffnlp/twitter-xlm-roberta-large-2022 language model and is fine-tuned for multi-label topic classification in English, Spanish, Japanese, and Greek.
The model is trained on the TweetTopic and X-Topic datasets (see the main EMNLP 2024 reference paper).
Labels:
0: arts_&_culture | 5: fashion_&_style | 10: learning_&_educational | 15: science_&_technology |
---|---|---|---|
1: business_&_entrepreneurs | 6: film_tv_&_video | 11: music | 16: sports |
2: celebrity_&_pop_culture | 7: fitness_&_health | 12: news_&_social_concern | 17: travel_&_adventure |
3: diaries_&_daily_life | 8: food_&_dining | 13: other_hobbies | 18: youth_&_student_life |
4: family | 9: gaming | 14: relationships |
Full classification example
from transformers import AutoModelForSequenceClassification, TFAutoModelForSequenceClassification
from transformers import AutoTokenizer
import numpy as np
from scipy.special import expit
MODEL = "cardiffnlp/tweet-topic-large-multilingual"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
# PT
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
class_mapping = model.config.id2label
text = "It is great to see athletes promoting awareness for climate change."
tokens = tokenizer(text, return_tensors='pt')
output = model(**tokens)
scores = output[0][0].detach().numpy()
scores = expit(scores)
predictions = (scores >= 0.5) * 1
# TF
#tf_model = TFAutoModelForSequenceClassification.from_pretrained(MODEL)
#class_mapping = tf_model.config.id2label
#text = "It is great to see athletes promoting awareness for climate change."
#tokens = tokenizer(text, return_tensors='tf')
#output = tf_model(**tokens)
#scores = output[0][0]
#scores = expit(scores)
#predictions = (scores >= 0.5) * 1
# Map to classes
for i in range(len(predictions)):
    if predictions[i]:
        print(class_mapping[i])
Output:
news_&_social_concern
sports
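The sigmoid and thresholding step in the example above can be factored into a small helper, which makes it easier to try a cutoff other than 0.5 or to inspect the per-label probabilities. The following is a minimal sketch, not part of the model's API: the names `sigmoid`, `decode_topics`, and `example_id2label` are hypothetical, and the logits are made-up values rather than real model output. In practice you would pass `output[0][0].tolist()` and `model.config.id2label` from the example above.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_topics(logits, id2label, threshold=0.5):
    """Return (label, probability) pairs at or above `threshold`,
    sorted from most to least probable.

    Each label is scored independently (multi-label setting), so any
    number of labels, including none, may be returned.
    """
    scored = [(id2label[i], sigmoid(float(v))) for i, v in enumerate(logits)]
    kept = [(label, p) for label, p in scored if p >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical three-label mapping and made-up logits, for illustration only.
example_id2label = {
    0: "arts_&_culture",
    1: "business_&_entrepreneurs",
    2: "celebrity_&_pop_culture",
}
example_logits = [-2.0, 0.0, 3.0]

for label, prob in decode_topics(example_logits, example_id2label):
    print(f"{label}: {prob:.3f}")
```

Raising `threshold` trades recall for precision; 0.5 is simply the cutoff used in the example above, and a different value may suit a given application better.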
Results on X-Topic
Metric | English | Spanish | Japanese | Greek |
---|---|---|---|---|
Macro-F1 | 60.2 | 52.9 | 57.3 | 50.3 |
Micro-F1 | 66.3 | 67.0 | 61.4 | 73.0 |
BibTeX entry and citation info
TBA