Update README.md
README.md (CHANGED)
---
license: apache-2.0
datasets:
- Alienmaster/SB10k
- cardiffnlp/tweet_sentiment_multilingual
- legacy-datasets/wikipedia
- community-datasets/gnad10
language:
- de
base_model: dbmdz/bert-base-german-uncased
pipeline_tag: text-classification
---

## Tweet Style Classifier (German)

This model is a fine-tuned version of dbmdz/bert-base-german-uncased for a binary classification task: determining whether a German text is a tweet or not.

The dataset contained about 20K instances with a 50/50 distribution between the two classes. It was shuffled with a random seed of 42 and split 80/20 into training and test sets.
Training ran for three epochs with a batch size of 8 on an NVIDIA RTX A6000 GPU; all other hyperparameters were left at the Hugging Face Trainer defaults.
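
The exact training script is not published with this card; the following is only a minimal sketch of a comparable setup with the Hugging Face `Trainer`, using the epoch count, batch size, seed, and 80/20 split stated above. The toy `Dataset` construction and the `text`/`label` column names are illustrative assumptions, not the actual training data.

```python
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

base_model = "dbmdz/bert-base-german-uncased"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=2)

# Illustrative stand-in for the ~20K-instance corpus (label 1 = tweet, 0 = formal text).
data = Dataset.from_dict({
    "text": ["Der Bundestag hat heute über den Haushalt beraten."] * 10
            + ["so ein schöner tag heute, endlich sonne! #wochenende"] * 10,
    "label": [0] * 10 + [1] * 10,
})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=512)

# Shuffle with seed 42 and split 80/20, as described above.
splits = data.train_test_split(test_size=0.2, seed=42)
train_ds = splits["train"].map(tokenize, batched=True)
test_ds = splits["test"].map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="tweet-style-classifier-de",
    num_train_epochs=3,             # from the card
    per_device_train_batch_size=8,  # from the card
)

trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=test_ds)
trainer.train()
```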

The model was trained in order to evaluate a text style transfer task: converting formal-language texts into tweets.

### How to use

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

model_name = "rabuahmad/tweet-style-classifier-de"

model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name, max_len=512)

classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer, truncation=True, max_length=512)

text = "Gestern war ein schöner Tag!"

result = classifier(text)
```

Label 1 indicates that the text is predicted to be a tweet.

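The pipeline returns a list of dictionaries containing a label and a score. The exact label string depends on the model's `id2label` mapping; the snippet below assumes the default `LABEL_0`/`LABEL_1` naming and uses an illustrative score, so check the model config if in doubt.

```python
print(result)
# Expected shape of the output (score is illustrative, not real model output):
# [{'label': 'LABEL_1', 'score': 0.99}]   # LABEL_1 -> predicted to be a tweet
```
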
### Evaluation

Evaluation results on the test set:

| Metric    | Score   |
|-----------|---------|
| Accuracy  | 0.99988 |
| Precision | 0.99901 |
| Recall    | 0.99901 |
| F1        | 0.99901 |
|