metadata
language:
  - vi
tags:
  - classification
widget:
  - text: Xấu vcl
    example_title: Offensive
  - text: Đồ ngu
    example_title: Hate
  - text: Xin chào chúc một ngày tốt lành
    example_title: Normal

PhoBERT fine-tuned for Vietnamese hate speech detection

Datasets

  • VLSP2019: Hate Speech Detection on Social Networks Dataset
  • ViHSD: Vietnamese Hate Speech Detection dataset

Class names

  • LABEL_0: NORMAL
  • LABEL_1: OFFENSIVE
  • LABEL_2: HATE

Usage example with TextClassificationPipeline

from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline


model = AutoModelForSequenceClassification.from_pretrained("tsdocode/phobert-finetune-hatespeech", num_labels=3)
tokenizer = AutoTokenizer.from_pretrained("tsdocode/phobert-finetune-hatespeech")


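# return_all_scores=True returns the score of every class, not only the top one
# (newer transformers releases deprecate this flag in favor of top_k=None)
pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer, return_all_scores=True)
# returns a nested list with one dict per class, of the form
# [[{'label': 'LABEL_0', 'score': ...}, {'label': 'LABEL_1', 'score': ...}, {'label': 'LABEL_2', 'score': ...}]]
pipe("đồ ngu")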
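
The pipeline returns generic LABEL_k identifiers, so a small post-processing step can be useful to turn them into the class names listed above. The following is a minimal sketch, not part of the model repository: the helper name top_class and the ID2NAME mapping are hypothetical, and the scores in the example are made up for illustration rather than produced by the model.

```python
# Hypothetical helper: maps the generic pipeline labels to the class names
# from the "Class names" list and picks the highest-scoring class.
ID2NAME = {"LABEL_0": "NORMAL", "LABEL_1": "OFFENSIVE", "LABEL_2": "HATE"}


def top_class(scores):
    """Return (class_name, score) for the highest-scoring entry.

    `scores` is the list of {'label': ..., 'score': ...} dicts for ONE input
    text, i.e. one element of the nested list returned by the pipeline.
    """
    best = max(scores, key=lambda item: item["score"])
    return ID2NAME[best["label"]], best["score"]


# Made-up scores for illustration only; real values would come from pipe(text)[0].
example_scores = [
    {"label": "LABEL_0", "score": 0.05},
    {"label": "LABEL_1", "score": 0.15},
    {"label": "LABEL_2", "score": 0.80},
]
print(top_class(example_scores))  # ('HATE', 0.8)
```

With the pipeline defined above, this would be called as top_class(pipe("đồ ngu")[0]), assuming the nested-list output format that return_all_scores=True produces.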