Karim-Gamal/BERT-base-finetuned-emojis-IID-Fed

Text Classification


Karim-Gamal committed on Mar 26, 2023

Commit f968140 · 1 parent: c7cadcc

Update README.md

Files changed (1)

README.md +23 -1

README.md CHANGED

@@ -35,9 +35,14 @@ pip install transformers
 ```
 > Then, you can load the model and tokenizer using the following code:
-```pyhton
+```python
 from transformers import AutoModelForSequenceClassification, AutoTokenizer
 import numpy as np
+import urllib.request
+import csv
+```
+```python
 MODEL = "Karim-Gamal/BERT-base-finetuned-emojis-IID-Fed"
 tokenizer = AutoTokenizer.from_pretrained(MODEL)
 model = AutoModelForSequenceClassification.from_pretrained(MODEL)
@@ -46,6 +51,15 @@ model = AutoModelForSequenceClassification.from_pretrained(MODEL)
 > Once you have the tokenizer and model, you can preprocess your text and pass it to the model for prediction:
 ```python
+# Preprocess text (username and link placeholders)
+def preprocess(text):
+    new_text = []
+    for t in text.split(" "):
+        t = '@user' if t.startswith('@') and len(t) > 1 else t
+        t = 'http' if t.startswith('http') else t
+        new_text.append(t)
+    return " ".join(new_text)
 text = "Hello world"
 text = preprocess(text)
 encoded_input = tokenizer(text, return_tensors='pt')
@@ -56,6 +70,14 @@ scores = output[0][0].detach().numpy()
 > The scores variable contains the probabilities for each of the possible emoji labels. To get the top k predictions, you can use the following code:
 ```python
+# download label mapping
+labels=[]
+mapping_link = "https://raw.githubusercontent.com/cardiffnlp/tweeteval/main/datasets/emoji/mapping.txt"
+with urllib.request.urlopen(mapping_link) as f:
+    html = f.read().decode('utf-8').split("\n")
+    csvreader = csv.reader(html, delimiter='\t')
+labels = [row[1] for row in csvreader if len(row) > 1]
 k = 3 # number of top predictions to show
 ranking = np.argsort(scores)
 ranking = ranking[::-1]
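The `preprocess` helper added in the second hunk masks user mentions and links before tokenization. The sketch below reproduces it unchanged, with comments added, so its behaviour can be checked without downloading the model. The sample strings are illustrative only.

```python
def preprocess(text):
    """Replace @mentions with '@user' and links with 'http' (as in the commit)."""
    new_text = []
    for t in text.split(" "):
        # A token starting with '@' plus at least one more character is a mention
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        # Any token starting with 'http' (this also covers 'https') is a link
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)


# Illustrative inputs, not taken from the model card
print(preprocess("@karim see https://example.com now"))  # @user see http now
print(preprocess("@ alone stays"))                       # @ alone stays
```

Because the checks look only at a token's prefix, punctuation attached to a mention is swallowed as well (`"@karim,"` becomes `"@user"`). Splitting is on single spaces only, so repeated spaces survive the round trip, and a mention that follows a tab or newline is not detected.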
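The third hunk downloads TweetEval's `mapping.txt` and keeps the second tab-separated field of each row as the label. The parsing step can be tried offline, as in this sketch. The helper name `parse_mapping` and the sample text are hypothetical. The assumed layout (index, label, further fields, separated by tabs) is inferred only from the commit's use of `row[1]` and has not been verified against the actual file.

```python
import csv


def parse_mapping(text):
    """Return the second tab-separated field of every row with at least two fields.

    Mirrors the commit's logic: split on newlines, read the lines with a
    tab-delimited csv.reader, and skip short rows such as the empty trailing line.
    """
    lines = text.split("\n")
    reader = csv.reader(lines, delimiter="\t")
    return [row[1] for row in reader if len(row) > 1]


# Hypothetical sample in the assumed "index<TAB>label<TAB>..." layout
sample = "0\tlabel_a\tdesc_a\n1\tlabel_b\tdesc_b\n"
print(parse_mapping(sample))  # ['label_a', 'label_b']
```

With the real file, the same helper could be fed from `urllib.request.urlopen(mapping_link)` via `f.read().decode("utf-8")`. In the commit itself, `csvreader` is consumed after the `with` block has closed the response. That works because `html` is already an in-memory list. The initial `labels=[]` is simply overwritten.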
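The README calls `scores` probabilities, but `output[0][0]` from `AutoModelForSequenceClassification` is the logits vector for the first input. The diff hides the lines between the hunks, so it cannot tell us whether a softmax is applied there. Below is a minimal NumPy sketch of a softmax followed by the same `argsort`-based top-k ranking. The helper names `softmax` and `top_k` and the example logits and labels are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)  # subtracting the max avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()


def top_k(scores, labels, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    probs = softmax(np.asarray(scores, dtype=float))
    ranking = np.argsort(probs)[::-1]  # indices from highest to lowest probability
    return [(labels[i], float(probs[i])) for i in ranking[:k]]


# Hypothetical logits for a 4-label model
logits = np.array([0.5, 2.0, -1.0, 1.0])
labels = ["label_a", "label_b", "label_c", "label_d"]
# Prints label_b, label_d, label_a in that order
for rank, (label, p) in enumerate(top_k(logits, labels, k=3), start=1):
    print(f"{rank}) {label} {p:.4f}")
```

Softmax is monotonic, so the ranking is identical whether it is computed on logits or on probabilities. The softmax only matters if the printed numbers are meant to be read as probabilities.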