Silly-Machine
/

TuPy-Bert-Base-Binary-Classifier

+---
+license: mit
+datasets:
+- Silly-Machine/TuPyE-Dataset
+language:
+- pt
+pipeline_tag: text-classification
+base_model: neuralmind/bert-base-portuguese-cased
+widget:
+- text: 'Bom dia, flor do dia!!'
+model-index:
+  - name: TuPy-Bert-Base-Binary-Classifier
+    results:
+      - task:
+          type: text-classification
+        dataset:
+          name: TuPyE-Dataset
+          type: Silly-Machine/TuPyE-Dataset
+        metrics:
+          - type: f1
+            value: 0.84
+            name: F1-score
+            verified: true
+          - type: precision
+            value: 0.85
+            name: Precision
+            verified: true
+          - type: recall
+            value: 0.84
+            name: Recall
+            verified: true
+---
+## Introduction
+TuPy-Bert-Base-Multilabel is a BERT model fine-tuned for multilabel classification of hate speech in Portuguese.
+It is derived from [BERTimbau base](https://huggingface.co/neuralmind/bert-base-portuguese-cased)
+and targets ten hate speech categories: ageism, aporophobia, body shaming, capacitism (ableism), LGBTphobia, political,
+racism, religious intolerance, misogyny, and xenophobia.
+For details on the base model, see the [BERTimbau repository](https://github.com/neuralmind-ai/portuguese-bert/).
+The performance of language models can vary notably when the training and test data come from different domains.
+To build a Portuguese language model specialized for hate speech classification,
+the original BERTimbau model was fine-tuned on
+the [TuPy Hate Speech DataSet](https://huggingface.co/datasets/Silly-Machine/TuPyE-Dataset), which was collected from several social networks.
+## Available models
+| Model                                    | Arch.      | #Layers | #Params |
+| ---------------------------------------- | ---------- | ------- | ------- |
+| `Silly-Machine/TuPy-Bert-Base-Binary-Classifier` | BERT-Base | 12      | 109M    |
+| `Silly-Machine/TuPy-Bert-Large-Binary-Classifier` | BERT-Large | 24      | 334M    |
+| `Silly-Machine/TuPy-Bert-Base-Multilabel` | BERT-Base | 12      | 109M    |
+| `Silly-Machine/TuPy-Bert-Large-Multilabel` | BERT-Large | 24      | 334M    |
+## Example usage
+```python
+from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoConfig
+import torch
+import numpy as np
+from scipy.special import softmax
+def classify_hate_speech(model_name, text):
+    model = AutoModelForSequenceClassification.from_pretrained(model_name)
+    tokenizer = AutoTokenizer.from_pretrained(model_name)
+    config = AutoConfig.from_pretrained(model_name)
+    # Tokenize the input text; truncate anything beyond the model's maximum sequence length
+    model_input = tokenizer(text, padding=True, truncation=True, return_tensors="pt")
+    # Get model output scores
+    with torch.no_grad():
+        output = model(**model_input)
+        scores = softmax(output.logits.numpy(), axis=1)
+        ranking = np.argsort(scores[0])[::-1]
+    # Print the results
+    for i, rank in enumerate(ranking):
+        label = config.id2label[rank]
+        score = scores[0, rank]
+        print(f"{i + 1}) Label: {label} Score: {score:.4f}")
+# Example usage
+model_name = "Silly-Machine/TuPy-Bert-Base-Multilabel"
+text = "Bom dia, flor do dia!!"
+classify_hate_speech(model_name, text)
+```
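
The example above ranks labels with a softmax, so the scores sum to 1 across labels. For the multilabel checkpoints, a per-label sigmoid with a decision threshold may be the more natural reading, although the card does not state how those heads were trained. The following is a minimal sketch under that assumption (one independent logit per category). The helper `select_labels`, the 0.5 threshold, and the sample logits and label names are all illustrative and are not taken from the model's configuration.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def select_labels(logits, id2label, threshold=0.5):
    """Return (label, probability) pairs whose sigmoid score meets the
    threshold, sorted from most to least likely.

    Hypothetical helper: assumes each logit is an independent per-label score.
    """
    scored = [(id2label[i], sigmoid(value)) for i, value in enumerate(logits)]
    picked = [(label, prob) for label, prob in scored if prob >= threshold]
    return sorted(picked, key=lambda pair: pair[1], reverse=True)


# Made-up logits and label map, for illustration only
example_id2label = {0: "ageism", 1: "misogyny", 2: "racism"}
example_logits = [-2.0, 1.5, 0.3]

for label, prob in select_labels(example_logits, example_id2label):
    print(f"Label: {label} Score: {prob:.4f}")
```

With a loaded model, the inputs would come from `output.logits[0].tolist()` and `model.config.id2label`. The threshold is a tunable choice; 0.5 is only a common default.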